Spaces: mv63/BaseChange

Vedant Jigarbhai Mehta committed on Mar 25

Commit b25c087 (0 parents)

Initial scaffolding for military base change detection project

Add complete project structure with 3 model architectures (Siamese CNN,
UNet++, ChangeFormer), dataset pipeline, training/evaluation/inference
scripts, Gradio demo, Colab setup, and config with all hyperparameters.

Files changed (19)

.gitignore +42 -0
README.md +93 -0
app.py +193 -0
configs/config.yaml +143 -0
data/dataset.py +153 -0
data/download.py +132 -0
evaluate.py +135 -0
inference.py +176 -0
models/__init__.py +39 -0
models/changeformer.py +358 -0
models/siamese_cnn.py +85 -0
models/unet_pp.py +78 -0
requirements.txt +16 -0
setup_colab.py +172 -0
train.py +418 -0
utils/__init__.py +0 -0
utils/losses.py +139 -0
utils/metrics.py +226 -0
utils/visualization.py +141 -0

.gitignore ADDED

	@@ -0,0 +1,42 @@

+# Checkpoints & model weights
+checkpoints/
+*.pth
+*.pt
+# Logs
+logs/
+# Outputs
+outputs/
+# Data
+raw_data/
+processed_data/
+# Python
+__pycache__/
+*.pyc
+*.pyo
+*.egg-info/
+dist/
+build/
+.eggs/
+# Environment
+.env
+.venv/
+venv/
+env/
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+# OS
+.DS_Store
+Thumbs.db
+# Jupyter
+.ipynb_checkpoints/

README.md ADDED

	@@ -0,0 +1,93 @@

+# Military Base Construction Monitoring — Change Detection
+Deep learning system for detecting new structures and infrastructure changes between satellite image pairs. Targets defense applications: military base expansion, runway construction, and infrastructure development monitoring.
+## Models
+| Model | Backbone | Role | Paper |
+|---|---|---|---|
+| Siamese CNN | ResNet18 (shared) | Baseline | — |
+| UNet++ | ResNet34 (shared) | Mid-tier | [arXiv:1807.10165](https://arxiv.org/abs/1807.10165) |
+| ChangeFormer | MiT-B1 (shared) | SOTA | [arXiv:2201.01293](https://arxiv.org/abs/2201.01293) |
+## Dataset
+**LEVIR-CD** — 637 image pairs at 1024×1024, cropped to 256×256 non-overlapping patches. Contains building change annotations across urban areas.
+## Quick Start (Google Colab)
+```python
+# 1. Setup
+from setup_colab import setup
+dirs = setup()
+# 2. Train
+!python train.py --config configs/config.yaml --model siamese_cnn
+# 3. Evaluate
+!python evaluate.py --config configs/config.yaml --checkpoint checkpoints/siamese_cnn_best.pth
+# 4. Resume after disconnect
+!python train.py --config configs/config.yaml --model changeformer \
+    --resume /content/drive/MyDrive/change-detection/checkpoints/changeformer_last.pth
+```
+## Local Usage
+```bash
+# Preprocess dataset
+python data/download.py --dataset levir-cd --raw_dir ./raw_data --out_dir ./processed_data
+# Train
+python train.py --config configs/config.yaml --model unet_pp
+# Evaluate
+python evaluate.py --config configs/config.yaml --checkpoint checkpoints/unet_pp_best.pth
+# Inference on new image pair
+python inference.py --before path/to/before.png --after path/to/after.png \
+    --model changeformer --checkpoint checkpoints/changeformer_best.pth
+# Gradio demo
+python app.py
+```
+## GPU Batch Sizes (Auto-Detected)
+| Model | T4 (16GB) | V100 (16GB) | LR |
+|---|---|---|---|
+| Siamese CNN | 16 | 16 | 1e-3 |
+| UNet++ | 8 | 12 | 1e-4 |
+| ChangeFormer | 4 | 6 | 6e-5 |
+## Evaluation Metrics
+- **F1-Score** (primary, used for model selection and early stopping)
+- IoU / Jaccard
+- Precision, Recall
+- Overall Accuracy
+## Project Structure
+```
+military-base-change-detection/
+├── configs/config.yaml         # All hyperparameters and paths
+├── data/
+│   ├── download.py             # Dataset download & patch cropping
+│   └── dataset.py              # PyTorch Dataset with synced augmentations
+├── models/
+│   ├── __init__.py             # get_model() factory
+│   ├── siamese_cnn.py          # Siamese CNN baseline
+│   ├── unet_pp.py              # UNet++ change detection
+│   └── changeformer.py         # ChangeFormer transformer
+├── utils/
+│   ├── metrics.py              # F1, IoU, Precision, Recall, OA
+│   ├── losses.py               # BCEDiceLoss, FocalLoss
+│   └── visualization.py        # Plotting utilities
+├── train.py                    # Training with AMP, early stopping, resume
+├── evaluate.py                 # Test set evaluation
+├── inference.py                # Inference on new image pairs
+├── app.py                      # Gradio demo
+├── setup_colab.py              # Colab environment setup
+└── requirements.txt            # Pinned dependencies
+```
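
As a quick check on the README's Dataset paragraph: cropping a 1024×1024 image into non-overlapping 256×256 patches gives 16 patches per image, so 637 pairs become 10,192 patch triplets. The sketch below works this out. The 445/64/128 train/val/test split is LEVIR-CD's standard split and is an assumption here, since the README only states the 637-pair total.

```python
# Patch-count arithmetic for LEVIR-CD as described in the README.
# The 445/64/128 split is an assumption (the dataset's standard split);
# the README itself only gives the 637-pair total.

IMAGE_SIZE = 1024
PATCH_SIZE = 256
SPLIT_PAIRS = {"train": 445, "val": 64, "test": 128}  # assumed split


def patches_per_image(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches that fit in a square image."""
    per_side = image_size // patch_size
    return per_side * per_side


per_image = patches_per_image(IMAGE_SIZE, PATCH_SIZE)
patch_counts = {split: n * per_image for split, n in SPLIT_PAIRS.items()}
total_patches = sum(patch_counts.values())

print(per_image)      # 16
print(patch_counts)   # {'train': 7120, 'val': 1024, 'test': 2048}
print(total_patches)  # 10192
```

These counts are useful for sanity-checking the output of the preprocessing step once it is implemented.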

app.py ADDED

	@@ -0,0 +1,193 @@

+"""Gradio web demo for change detection inference.
+Provides an interactive interface to upload before/after satellite image pairs
+and visualize predicted change masks with overlays.
+Usage:
+    python app.py
+"""
+import logging
+from pathlib import Path
+from typing import Optional, Tuple
+import cv2
+import gradio as gr
+import numpy as np
+import torch
+import yaml
+from data.dataset import IMAGENET_MEAN, IMAGENET_STD
+from inference import preprocess_image, sliding_window_inference
+from models import get_model
+from utils.visualization import denormalize, overlay_changes
+logger = logging.getLogger(__name__)
+# Global model cache (keyed by model name and checkpoint path)
+_model: Optional[torch.nn.Module] = None
+_model_name: Optional[str] = None
+_checkpoint_path: Optional[str] = None
+_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+_config = None
+def load_config() -> dict:
+    """Load project config from YAML.
+    Returns:
+        Config dictionary.
+    """
+    config_path = Path("configs/config.yaml")
+    with open(config_path, "r") as f:
+        return yaml.safe_load(f)
+def load_model(model_name: str, checkpoint_path: str) -> torch.nn.Module:
+    """Load a change detection model with caching.
+    Args:
+        model_name: Name of the model architecture.
+        checkpoint_path: Path to the model checkpoint.
+    Returns:
+        Loaded model in eval mode.
+    """
+    global _model, _model_name, _checkpoint_path, _config
+    if _config is None:
+        _config = load_config()
+    # Reuse the cached model only if both the architecture and the checkpoint
+    # match; otherwise a new checkpoint path would silently return stale weights.
+    if _model is not None and _model_name == model_name and _checkpoint_path == checkpoint_path:
+        return _model
+    model = get_model(model_name, _config).to(_device)
+    ckpt = torch.load(checkpoint_path, map_location=_device)
+    model.load_state_dict(ckpt["model_state_dict"])
+    model.eval()
+    _model = model
+    _model_name = model_name
+    _checkpoint_path = checkpoint_path
+    logger.info("Loaded model: %s from %s", model_name, checkpoint_path)
+    return model
+def predict(
+    before_image: np.ndarray,
+    after_image: np.ndarray,
+    model_name: str,
+    checkpoint_path: str,
+    threshold: float,
+) -> Tuple[np.ndarray, np.ndarray]:
+    """Run change detection on a pair of images.
+    Args:
+        before_image: Before image as numpy array (RGB, uint8).
+        after_image: After image as numpy array (RGB, uint8).
+        model_name: Model architecture name.
+        checkpoint_path: Path to model weights.
+        threshold: Binarization threshold.
+    Returns:
+        Tuple of (binary change mask, overlay visualization).
+    """
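+    # Fail early with a readable message instead of a shape error deep in the model
+    if before_image is None or after_image is None:
+        raise gr.Error("Please upload both a before and an after image.")
+    if before_image.shape[:2] != after_image.shape[:2]:
+        raise gr.Error("Before and after images must have the same height and width.")
+    model = load_model(model_name, checkpoint_path)
+    # load_model() populates _config; fall back to 256 if the key is absent
+    patch_size = _config.get("dataset", {}).get("patch_size", 256)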
+    # Preprocess both images
+    def _to_tensor(img: np.ndarray) -> torch.Tensor:
+        h, w = img.shape[:2]
+        pad_h = (patch_size - h % patch_size) % patch_size
+        pad_w = (patch_size - w % patch_size) % patch_size
+        if pad_h > 0 or pad_w > 0:
+            img = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), mode="reflect")
+        img_f = img.astype(np.float32) / 255.0
+        mean = np.array(IMAGENET_MEAN, dtype=np.float32)
+        std = np.array(IMAGENET_STD, dtype=np.float32)
+        img_f = (img_f - mean) / std
+        return torch.from_numpy(img_f).permute(2, 0, 1).unsqueeze(0).float()
+    orig_h, orig_w = before_image.shape[:2]
+    tensor_a = _to_tensor(before_image)
+    tensor_b = _to_tensor(after_image)
+    # Run inference
+    prob_map = sliding_window_inference(model, tensor_a, tensor_b, patch_size, _device)
+    prob_map = prob_map[:, :, :orig_h, :orig_w]
+    # Binary mask
+    mask_np = prob_map.squeeze().numpy()
+    binary_mask = (mask_np > threshold).astype(np.uint8) * 255
+    # Overlay on after image
+    overlay = after_image.copy().astype(np.float32) / 255.0
+    change_pixels = mask_np > threshold
+    overlay[change_pixels, 0] = np.clip(overlay[change_pixels, 0] * 0.6 + 0.4, 0, 1)
+    overlay[change_pixels, 1] = overlay[change_pixels, 1] * 0.6
+    overlay[change_pixels, 2] = overlay[change_pixels, 2] * 0.6
+    overlay = (overlay * 255).astype(np.uint8)
+    return binary_mask, overlay
+def build_demo() -> gr.Blocks:
+    """Build the Gradio demo interface.
+    Returns:
+        Gradio Blocks application.
+    """
+    config = load_config()
+    gradio_cfg = config.get("gradio", {})
+    with gr.Blocks(title="Military Base Change Detection") as demo:
+        gr.Markdown("# Military Base Change Detection")
+        gr.Markdown("Upload before/after satellite image pairs to detect construction and infrastructure changes.")
+        with gr.Row():
+            with gr.Column():
+                before_img = gr.Image(label="Before Image", type="numpy")
+                after_img = gr.Image(label="After Image", type="numpy")
+            with gr.Column():
+                change_mask = gr.Image(label="Change Mask")
+                overlay_img = gr.Image(label="Overlay")
+        with gr.Row():
+            model_dropdown = gr.Dropdown(
+                choices=["siamese_cnn", "unet_pp", "changeformer"],
+                value=gradio_cfg.get("default_model", "unet_pp"),
+                label="Model",
+            )
+            checkpoint_input = gr.Textbox(
+                value=gradio_cfg.get("default_checkpoint", "checkpoints/unet_pp_best.pth"),
+                label="Checkpoint Path",
+            )
+            threshold_slider = gr.Slider(
+                minimum=0.1, maximum=0.9, value=0.5, step=0.05,
+                label="Detection Threshold",
+            )
+        detect_btn = gr.Button("Detect Changes", variant="primary")
+        detect_btn.click(
+            fn=predict,
+            inputs=[before_img, after_img, model_dropdown, checkpoint_input, threshold_slider],
+            outputs=[change_mask, overlay_img],
+        )
+    return demo
+def main() -> None:
+    """Launch the Gradio demo."""
+    logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
+    config = load_config()
+    gradio_cfg = config.get("gradio", {})
+    demo = build_demo()
+    demo.launch(
+        server_port=gradio_cfg.get("server_port", 7860),
+        share=gradio_cfg.get("share", False),
+    )
+if __name__ == "__main__":
+    main()
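
The padding expression in `_to_tensor()` (and again in `preprocess_image()` in inference.py) is easy to misread, so here is a minimal sketch of the arithmetic on its own. `pad_amount()` mirrors the expression in the diff; `padded_shape()` is a hypothetical helper added only for illustration.

```python
# Pad-then-crop arithmetic used before sliding-window inference.
# pad_amount() mirrors the expression in app.py; padded_shape() is a
# hypothetical helper that exists only in this sketch.
from typing import Tuple


def pad_amount(size: int, patch_size: int = 256) -> int:
    """Pixels to append so that `size` becomes a multiple of `patch_size`."""
    return (patch_size - size % patch_size) % patch_size


def padded_shape(h: int, w: int, patch_size: int = 256) -> Tuple[int, int]:
    """Image shape after padding the bottom and right edges."""
    return h + pad_amount(h, patch_size), w + pad_amount(w, patch_size)


# An already-aligned side needs no padding; the outer modulo handles that case.
print(pad_amount(1024))         # 0
print(pad_amount(1000))         # 24
print(padded_shape(700, 1300))  # (768, 1536)
```

Without the outer `% patch_size`, an aligned side of 1024 would be padded by a full extra 256 pixels. After inference, `predict()` undoes the padding by slicing the probability map back to `[:orig_h, :orig_w]`.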

configs/config.yaml ADDED

	@@ -0,0 +1,143 @@

+# =============================================================================
+# Military Base Change Detection — Master Configuration
+# =============================================================================
+# --- Project paths ---
+project:
+  name: "military-base-change-detection"
+  seed: 42
+# --- Colab / runtime settings ---
+colab:
+  enabled: true
+  drive_root: "/content/drive/MyDrive/change-detection"
+  checkpoint_dir: "/content/drive/MyDrive/change-detection/checkpoints"
+  log_dir: "/content/drive/MyDrive/change-detection/logs"
+  output_dir: "/content/drive/MyDrive/change-detection/outputs"
+  data_dir: "/content/drive/MyDrive/change-detection/processed_data"
+# --- Local paths (used when colab.enabled is false) ---
+paths:
+  raw_data: "./raw_data"
+  processed_data: "./processed_data"
+  checkpoint_dir: "./checkpoints"
+  log_dir: "./logs"
+  output_dir: "./outputs"
+# --- Dataset ---
+dataset:
+  name: "levir-cd"                 # levir-cd | whu-cd
+  original_size: 1024
+  patch_size: 256
+  num_workers: 4
+  pin_memory: true
+  # ImageNet normalization
+  mean: [0.485, 0.456, 0.406]
+  std: [0.229, 0.224, 0.225]
+# --- Augmentation (train only) ---
+augmentation:
+  enabled: true
+  horizontal_flip: 0.5
+  vertical_flip: 0.5
+  random_rotate_90: 0.5
+  color_jitter:
+    brightness: 0.2
+    contrast: 0.2
+    saturation: 0.1
+    hue: 0.05
+# --- Model selection ---
+model:
+  name: "unet_pp"                  # siamese_cnn | unet_pp | changeformer
+# --- Model-specific configs ---
+siamese_cnn:
+  backbone: "resnet18"
+  pretrained: true
+unet_pp:
+  encoder_name: "resnet34"
+  pretrained: true
+  deep_supervision: false
+changeformer:
+  embed_dims: [64, 128, 320, 512]  # MiT-B1 style
+  num_heads: [1, 2, 5, 8]
+  mlp_ratios: [8, 8, 4, 4]
+  depths: [2, 2, 2, 2]
+  pretrained_backbone: true
+# --- Training ---
+training:
+  epochs: 100                      # 200 for changeformer
+  optimizer: "adamw"
+  learning_rate: 1.0e-4
+  weight_decay: 0.01
+  scheduler: "cosine"
+  warmup_epochs: 5
+  grad_clip_max_norm: 1.0
+  gradient_accumulation_steps: 1   # set to 2 for changeformer on T4
+  amp: true                        # mixed precision
+  early_stopping:
+    enabled: true
+    patience: 15
+    metric: "f1"
+    mode: "max"
+  log_interval: 10                 # log every N batches
+  vis_interval: 5                  # visualize predictions every N epochs
+# --- Loss ---
+loss:
+  name: "bce_dice"                 # bce_dice | focal
+  bce_dice:
+    bce_weight: 0.5
+    dice_weight: 0.5
+  focal:
+    alpha: 0.25
+    gamma: 2.0
+# --- Evaluation ---
+evaluation:
+  threshold: 0.5
+  metrics:
+    - f1
+    - iou
+    - precision
+    - recall
+    - oa
+# --- GPU-specific batch sizes (auto-detected on Colab) ---
+# model_name -> { gpu_type -> batch_size }
+batch_sizes:
+  siamese_cnn:
+    T4: 16
+    V100: 16
+    default: 8
+  unet_pp:
+    T4: 8
+    V100: 12
+    default: 4
+  changeformer:
+    T4: 4
+    V100: 6
+    default: 2
+# --- Per-model learning rates ---
+learning_rates:
+  siamese_cnn: 1.0e-3
+  unet_pp: 1.0e-4
+  changeformer: 6.0e-5
+# --- Per-model epoch counts ---
+epoch_counts:
+  siamese_cnn: 100
+  unet_pp: 100
+  changeformer: 200
+# --- Gradio demo ---
+gradio:
+  server_port: 7860
+  share: false
+  default_model: "unet_pp"
+  default_checkpoint: "checkpoints/unet_pp_best.pth"
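
The config keeps per-model overrides (`batch_sizes`, `learning_rates`, `epoch_counts`) alongside the generic `training` block. One plausible way to combine them is sketched below. The function name `resolve_run_settings` and the substring match on the GPU name are assumptions; train.py's actual lookup is not shown in this section.

```python
# Hypothetical helper showing how the per-model tables could be read.
# The function name and the GPU-name matching rule are assumptions.
from typing import Any, Dict

# Subset of configs/config.yaml, inlined so the sketch runs on its own.
CONFIG: Dict[str, Any] = {
    "training": {"epochs": 100, "learning_rate": 1.0e-4},
    "batch_sizes": {
        "siamese_cnn": {"T4": 16, "V100": 16, "default": 8},
        "unet_pp": {"T4": 8, "V100": 12, "default": 4},
        "changeformer": {"T4": 4, "V100": 6, "default": 2},
    },
    "learning_rates": {"siamese_cnn": 1.0e-3, "unet_pp": 1.0e-4, "changeformer": 6.0e-5},
    "epoch_counts": {"siamese_cnn": 100, "unet_pp": 100, "changeformer": 200},
}


def resolve_run_settings(config: Dict[str, Any], model_name: str, gpu_name: str) -> Dict[str, Any]:
    """Pick batch size, learning rate, and epoch count for one model/GPU pair."""
    table = config["batch_sizes"][model_name]
    # Substring match so a device name such as "Tesla T4" selects the "T4" entry.
    batch_size = next(
        (size for key, size in table.items() if key != "default" and key in gpu_name),
        table["default"],
    )
    training = config["training"]
    return {
        "batch_size": batch_size,
        # Per-model values win; the generic training block is the fallback.
        "learning_rate": config["learning_rates"].get(model_name, training["learning_rate"]),
        "epochs": config["epoch_counts"].get(model_name, training["epochs"]),
    }


print(resolve_run_settings(CONFIG, "changeformer", "Tesla T4"))
print(resolve_run_settings(CONFIG, "unet_pp", "NVIDIA A100-SXM4-40GB"))
```

The second call falls through to the `default` batch size because the device name matches neither `T4` nor `V100`.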

data/dataset.py ADDED

	@@ -0,0 +1,153 @@

+"""PyTorch Dataset for change detection tasks.
+Loads pre-cropped 256x256 image patches (before/after) and binary change masks.
+Supports synchronized augmentations via albumentations.ReplayCompose.
+"""
+import logging
+from pathlib import Path
+from typing import Any, Dict, Optional, Tuple
+import albumentations as A
+import cv2
+import numpy as np
+import torch
+from torch.utils.data import Dataset
+logger = logging.getLogger(__name__)
+# ImageNet normalization constants
+IMAGENET_MEAN = (0.485, 0.456, 0.406)
+IMAGENET_STD = (0.229, 0.224, 0.225)
+def get_train_transforms(config: Dict[str, Any]) -> A.ReplayCompose:
+    """Build training augmentation pipeline with synchronized transforms.
+    Args:
+        config: Full config dict from config.yaml; only the "augmentation"
+            section is read.
+    Returns:
+        ReplayCompose that applies identical spatial transforms to A, B, and mask.
+    """
+    aug_cfg = config.get("augmentation", {})
+    transforms = []
+    # Honor the augmentation.enabled switch; normalization is always applied.
+    # Note: the color_jitter settings in config.yaml are not used here.
+    if aug_cfg.get("enabled", True):
+        if aug_cfg.get("horizontal_flip", 0) > 0:
+            transforms.append(A.HorizontalFlip(p=aug_cfg["horizontal_flip"]))
+        if aug_cfg.get("vertical_flip", 0) > 0:
+            transforms.append(A.VerticalFlip(p=aug_cfg["vertical_flip"]))
+        if aug_cfg.get("random_rotate_90", 0) > 0:
+            transforms.append(A.RandomRotate90(p=aug_cfg["random_rotate_90"]))
+    transforms.append(A.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD))
+    # "mask" is already a built-in target, so only image_b needs registering
+    return A.ReplayCompose(
+        transforms,
+        additional_targets={"image_b": "image"},
+    )
+def get_val_transforms() -> A.Compose:
+    """Build validation/test transform pipeline (normalize only).
+    Returns:
+        Compose with ImageNet normalization only.
+    """
+    return A.Compose(
+        [A.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD)],
+        additional_targets={"image_b": "image"},
+    )
+class ChangeDetectionDataset(Dataset):
+    """Dataset for loading change detection image pairs and masks.
+    Expects directory structure:
+        root/
+        ├── A/        # before images
+        ├── B/        # after images
+        └── label/    # binary change masks (0=no change, 255=change)
+    Args:
+        root: Path to the split directory (e.g., processed_data/train).
+        split: One of 'train', 'val', 'test'.
+        config: Full config dict for augmentation settings.
+        transform: Optional override for the transform pipeline.
+    """
+    def __init__(
+        self,
+        root: Path,
+        split: str = "train",
+        config: Optional[Dict[str, Any]] = None,
+        transform: Optional[Any] = None,
+    ) -> None:
+        self.root = Path(root)
+        self.split = split
+        self.dir_a = self.root / "A"
+        self.dir_b = self.root / "B"
+        self.dir_label = self.root / "label"
+        # Collect sorted file lists
+        self.filenames = sorted([f.name for f in self.dir_a.iterdir() if f.suffix in (".png", ".jpg", ".tif")])
+        logger.info("Loaded %d samples for split '%s' from %s", len(self.filenames), split, root)
+        # Set up transforms
+        if transform is not None:
+            self.transform = transform
+        elif split == "train" and config is not None:
+            self.transform = get_train_transforms(config)
+        else:
+            self.transform = get_val_transforms()
+    def __len__(self) -> int:
+        return len(self.filenames)
+    def __getitem__(self, idx: int) -> Dict[str, Any]:
+        """Load a single sample.
+        Args:
+            idx: Sample index.
+        Returns:
+            Dict with keys 'A', 'B', 'mask', 'filename'.
+              - A: before image tensor [3, H, W]
+              - B: after image tensor [3, H, W]
+              - mask: binary change mask tensor [1, H, W] (float, 0 or 1)
+              - filename: original filename string
+        """
+        fname = self.filenames[idx]
+        # Lazy load — read from disk each time (no RAM caching)
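+        img_a = cv2.imread(str(self.dir_a / fname), cv2.IMREAD_COLOR)
+        img_b = cv2.imread(str(self.dir_b / fname), cv2.IMREAD_COLOR)
+        mask = cv2.imread(str(self.dir_label / fname), cv2.IMREAD_GRAYSCALE)
+        # cv2.imread returns None instead of raising when a file is missing or unreadable
+        if img_a is None or img_b is None or mask is None:
+            raise FileNotFoundError(f"Could not read A/B/label triplet for '{fname}' under {self.root}")
+        img_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2RGB)
+        img_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2RGB)
+        # Binarize 0/255 -> 0/1; thresholding also absorbs compression artifacts
+        mask = (mask > 127).astype(np.float32)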
+        # Apply synchronized augmentations
+        if isinstance(self.transform, A.ReplayCompose):
+            transformed = self.transform(image=img_a, image_b=img_b, mask=mask)
+            img_a = transformed["image"]
+            img_b = transformed["image_b"]
+            mask = transformed["mask"]
+        else:
+            transformed = self.transform(image=img_a, image_b=img_b)
+            img_a = transformed["image"]
+            img_b = transformed["image_b"]
+            # Normalize only applied to images, mask stays as-is
+        # HWC -> CHW for images, add channel dim for mask
+        img_a = torch.from_numpy(img_a).permute(2, 0, 1).float()
+        img_b = torch.from_numpy(img_b).permute(2, 0, 1).float()
+        mask = torch.from_numpy(mask).unsqueeze(0).float()
+        return {"A": img_a, "B": img_b, "mask": mask, "filename": fname}
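
The key property of the training pipeline above is that the random decisions are drawn once per sample and then applied to the before image, the after image, and the mask alike. The sketch below illustrates that idea with nested lists instead of arrays. `sample_flip_params()` and `apply_flips()` are hypothetical stand-ins, not albumentations functions.

```python
# Pure-Python illustration of synchronized spatial augmentation.
# Images are nested lists here; sample_flip_params() and apply_flips() are
# hypothetical stand-ins for what the augmentation pipeline does internally.
import random
from typing import Dict, List

Grid = List[List[int]]


def sample_flip_params(rng: random.Random, p_hflip: float, p_vflip: float) -> Dict[str, bool]:
    """Draw the random decisions once per sample."""
    return {"hflip": rng.random() < p_hflip, "vflip": rng.random() < p_vflip}


def apply_flips(grid: Grid, params: Dict[str, bool]) -> Grid:
    """Apply the already-sampled flips to one target."""
    if params["hflip"]:
        grid = [row[::-1] for row in grid]
    if params["vflip"]:
        grid = grid[::-1]
    return grid


before = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
after = [[9, 0, 0], [0, 0, 0], [0, 0, 0]]  # new structure in the top-left corner
mask = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]   # label marks the same pixel

rng = random.Random(42)
params = sample_flip_params(rng, p_hflip=0.5, p_vflip=0.5)
before_t, after_t, mask_t = (apply_flips(g, params) for g in (before, after, mask))

# Wherever the flips moved the changed pixel, the label moved with it.
aligned = all(
    (after_t[y][x] == 9) == (mask_t[y][x] == 1)
    for y in range(3)
    for x in range(3)
)
print(aligned)  # True
```

Sampling separate parameters for each target would break this alignment and train the model against labels that no longer match the imagery.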

data/download.py ADDED

	@@ -0,0 +1,132 @@

+"""Download and preprocess change detection datasets.
+Supports LEVIR-CD and WHU-CD datasets. Downloads raw data, crops 1024x1024
+images into 256x256 non-overlapping patches, and organizes into train/val/test
+splits.
+Usage:
+    python data/download.py --dataset levir-cd --raw_dir ./raw_data --out_dir ./processed_data
+"""
+import argparse
+import logging
+from pathlib import Path
+from typing import Tuple
+import cv2
+import numpy as np
+logger = logging.getLogger(__name__)
+def download_levir_cd(raw_dir: Path) -> None:
+    """Download the LEVIR-CD dataset.
+    Args:
+        raw_dir: Directory to save the raw downloaded files.
+    """
+    # TODO: Implement download via gdown or direct URL
+    raise NotImplementedError("LEVIR-CD download not yet implemented")
+def download_whu_cd(raw_dir: Path) -> None:
+    """Download the WHU-CD dataset.
+    Args:
+        raw_dir: Directory to save the raw downloaded files.
+    """
+    # TODO: Implement download
+    raise NotImplementedError("WHU-CD download not yet implemented")
+def crop_to_patches(
+    image: np.ndarray,
+    patch_size: int = 256,
+) -> list[np.ndarray]:
+    """Crop an image into non-overlapping patches.
+    Rows and columns beyond the last full patch are discarded.
+    Args:
+        image: Input image of shape (H, W) or (H, W, C).
+        patch_size: Size of each square patch.
+    Returns:
+        List of cropped patches.
+    """
+    h, w = image.shape[:2]
+    patches = []
+    for y in range(0, h - patch_size + 1, patch_size):
+        for x in range(0, w - patch_size + 1, patch_size):
+            patch = image[y : y + patch_size, x : x + patch_size]
+            patches.append(patch)
+    return patches
+def process_split(
+    raw_dir: Path,
+    out_dir: Path,
+    split: str,
+    patch_size: int = 256,
+) -> int:
+    """Process a single dataset split (train/val/test).
+    Reads image pairs and masks from raw_dir, crops into patches, and
+    saves to out_dir.
+    Args:
+        raw_dir: Root directory of the raw dataset.
+        out_dir: Output directory for processed patches.
+        split: One of 'train', 'val', 'test'.
+        patch_size: Size of each square patch.
+    Returns:
+        Number of patch triplets generated.
+    """
+    # TODO: Implement processing pipeline
+    raise NotImplementedError("Split processing not yet implemented")
+def preprocess_dataset(
+    dataset: str,
+    raw_dir: Path,
+    out_dir: Path,
+    patch_size: int = 256,
+) -> None:
+    """Run full preprocessing pipeline for a dataset.
+    Args:
+        dataset: Dataset name ('levir-cd' or 'whu-cd').
+        raw_dir: Directory containing raw downloaded data.
+        out_dir: Output directory for processed patches.
+        patch_size: Size of each square patch.
+    """
+    logger.info("Preprocessing %s: %s -> %s", dataset, raw_dir, out_dir)
+    out_dir.mkdir(parents=True, exist_ok=True)
+    for split in ["train", "val", "test"]:
+        count = process_split(raw_dir, out_dir, split, patch_size)
+        logger.info("  %s: %d patch triplets", split, count)
+def main() -> None:
+    """CLI entry point for dataset download and preprocessing."""
+    parser = argparse.ArgumentParser(description="Download and preprocess change detection datasets")
+    parser.add_argument("--dataset", type=str, default="levir-cd", choices=["levir-cd", "whu-cd"])
+    parser.add_argument("--raw_dir", type=Path, default=Path("./raw_data"))
+    parser.add_argument("--out_dir", type=Path, default=Path("./processed_data"))
+    parser.add_argument("--patch_size", type=int, default=256)
+    parser.add_argument("--skip_download", action="store_true", help="Skip download, only preprocess")
+    args = parser.parse_args()
+    logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
+    if not args.skip_download:
+        if args.dataset == "levir-cd":
+            download_levir_cd(args.raw_dir)
+        elif args.dataset == "whu-cd":
+            download_whu_cd(args.raw_dir)
+    preprocess_dataset(args.dataset, args.raw_dir, args.out_dir, args.patch_size)
+if __name__ == "__main__":
+    main()
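
The loop bounds in `crop_to_patches()` decide which pixels survive preprocessing. The sketch below reproduces only that index arithmetic, without touching image data, to show which patch origins are visited and how much border is dropped when a side is not a multiple of the patch size. `patch_origins()` and `dropped_border()` are hypothetical helper names.

```python
# Patch-origin arithmetic matching the loop bounds in crop_to_patches().
# patch_origins() and dropped_border() are hypothetical helper names.
from typing import List, Tuple


def patch_origins(h: int, w: int, patch_size: int = 256) -> List[Tuple[int, int]]:
    """Top-left (y, x) corner of every full patch, in row-major order."""
    return [
        (y, x)
        for y in range(0, h - patch_size + 1, patch_size)
        for x in range(0, w - patch_size + 1, patch_size)
    ]


def dropped_border(h: int, w: int, patch_size: int = 256) -> Tuple[int, int]:
    """Rows and columns at the bottom/right that no patch covers."""
    return h % patch_size, w % patch_size


print(len(patch_origins(1024, 1024)))  # 16
print(patch_origins(600, 520))         # [(0, 0), (0, 256), (256, 0), (256, 256)]
print(dropped_border(600, 520))        # (88, 8)
```

This differs from the inference path, which pads images up to a multiple of the patch size instead of discarding the remainder.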

evaluate.py ADDED

	@@ -0,0 +1,135 @@

+"""Evaluation script for change detection models.
+Runs a trained model on the test set, computes all metrics, and generates
+visualization outputs.
+Usage:
+    python evaluate.py --config configs/config.yaml --checkpoint checkpoints/unet_pp_best.pth
+"""
+import argparse
+import logging
+from pathlib import Path
+from typing import Any, Dict
+import torch
+import torch.nn as nn
+from torch.utils.data import DataLoader
+from tqdm import tqdm
+import yaml
+from data.dataset import ChangeDetectionDataset
+from models import get_model
+from utils.metrics import ConfusionMatrix
+from utils.visualization import plot_prediction
+logger = logging.getLogger(__name__)
+def evaluate(
+    model: nn.Module,
+    loader: DataLoader,
+    device: torch.device,
+    threshold: float = 0.5,
+    output_dir: Path = Path("./outputs"),
+    max_vis: int = 20,
+) -> Dict[str, float]:
+    """Evaluate model on the full test set.
+    Args:
+        model: Trained change detection model.
+        loader: Test DataLoader.
+        device: Target device.
+        threshold: Binarization threshold for predictions.
+        output_dir: Directory to save visualization outputs.
+        max_vis: Maximum number of sample predictions to save.
+    Returns:
+        Dict of metric name -> value.
+    """
+    model.eval()
+    cm = ConfusionMatrix()
+    vis_dir = output_dir / "visualizations"
+    vis_dir.mkdir(parents=True, exist_ok=True)
+    vis_count = 0
+    with torch.no_grad():
+        for batch in tqdm(loader, desc="Evaluating"):
+            img_a = batch["A"].to(device)
+            img_b = batch["B"].to(device)
+            mask = batch["mask"].to(device)
+            logits = model(img_a, img_b)
+            preds = (torch.sigmoid(logits) > threshold).float()
+            cm.update(preds, mask)
+            # Save sample visualizations
+            if vis_count < max_vis:
+                for i in range(min(img_a.size(0), max_vis - vis_count)):
+                    plot_prediction(
+                        img_a[i], img_b[i], mask[i], preds[i],
+                        save_path=vis_dir / f"sample_{vis_count:04d}.png",
+                    )
+                    vis_count += 1
+    metrics = cm.compute()
+    return metrics
+def main() -> None:
+    """Main evaluation entry point."""
+    parser = argparse.ArgumentParser(description="Evaluate change detection model")
+    parser.add_argument("--config", type=Path, default=Path("configs/config.yaml"))
+    parser.add_argument("--checkpoint", type=Path, required=True)
+    parser.add_argument("--model", type=str, default=None, help="Override model name")
+    parser.add_argument("--threshold", type=float, default=None)
+    args = parser.parse_args()
+    logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
+    with open(args.config, "r") as f:
+        config = yaml.safe_load(f)
+    model_name = args.model or config["model"]["name"]
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
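+    # Use an explicit None check so that --threshold 0.0 is not treated as "unset"
+    threshold = args.threshold if args.threshold is not None else config.get("evaluation", {}).get("threshold", 0.5)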
+    # Resolve paths
+    colab = config.get("colab", {})
+    if colab.get("enabled", False):
+        data_dir = Path(colab["data_dir"])
+        output_dir = Path(colab["output_dir"])
+    else:
+        data_dir = Path(config["paths"]["processed_data"])
+        output_dir = Path(config["paths"]["output_dir"])
+    # Model
+    model = get_model(model_name, config).to(device)
+    ckpt = torch.load(args.checkpoint, map_location=device)
+    model.load_state_dict(ckpt["model_state_dict"])
+    logger.info("Loaded checkpoint: %s (epoch %d, F1 %.4f)",
+                args.checkpoint, ckpt.get("epoch", -1), ckpt.get("best_f1", -1))
+    # Test data
+    ds_cfg = config.get("dataset", {})
+    test_ds = ChangeDetectionDataset(data_dir / "test", split="test", config=config)
+    test_loader = DataLoader(
+        test_ds, batch_size=8, shuffle=False,
+        num_workers=ds_cfg.get("num_workers", 4),
+        pin_memory=ds_cfg.get("pin_memory", True),
+    )
+    # Evaluate
+    metrics = evaluate(model, test_loader, device, threshold, output_dir)
+    # Print results
+    logger.info("=" * 50)
+    logger.info("TEST SET RESULTS — %s", model_name)
+    logger.info("=" * 50)
+    for name, value in metrics.items():
+        logger.info("  %-12s: %.4f", name.upper(), value)
+    logger.info("=" * 50)
+if __name__ == "__main__":
+    main()
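
`evaluate()` delegates the metric arithmetic to `ConfusionMatrix.compute()`, which is defined in utils/metrics.py and not shown in this section. As a reference point, here is a minimal sketch of the standard definitions of the five metrics the README lists, computed from pixel-level confusion counts. `compute_metrics()` is a hypothetical stand-in and may differ from the project's implementation.

```python
# Binary change-detection metrics from confusion-matrix counts.
# These are the textbook definitions; compute_metrics() is a hypothetical
# stand-in for utils/metrics.py, which is not shown in this section.
from typing import Dict


def compute_metrics(tp: int, fp: int, fn: int, tn: int, eps: float = 1e-7) -> Dict[str, float]:
    """Return F1, IoU, precision, recall, and overall accuracy."""
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    iou = tp / (tp + fp + fn + eps)          # Jaccard index of the change class
    oa = (tp + tn) / (tp + fp + fn + tn + eps)
    return {"f1": f1, "iou": iou, "precision": precision, "recall": recall, "oa": oa}


# 1000 pixels, 100 of which truly changed.
metrics = compute_metrics(tp=80, fp=20, fn=20, tn=880)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
# f1: 0.8000, iou: 0.6667, precision: 0.8000, recall: 0.8000, oa: 0.9600
```

In this example overall accuracy is 0.96 even though the IoU of the change class is only 0.67, because unchanged pixels dominate the count. That imbalance is a plausible reason for using F1 rather than accuracy for model selection.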

inference.py ADDED

	@@ -0,0 +1,176 @@

+"""Run inference on arbitrary before/after image pairs.
+Loads a trained change detection model and produces binary change masks
+for new satellite image pairs.
+Usage:
+    python inference.py --before path/to/before.png --after path/to/after.png \
+        --model changeformer --checkpoint checkpoints/changeformer_best.pth
+"""
+import argparse
+import logging
+from pathlib import Path
+from typing import Any, Dict, Tuple
+import cv2
+import numpy as np
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+import yaml
+from data.dataset import IMAGENET_MEAN, IMAGENET_STD
+from models import get_model
+from utils.visualization import overlay_changes, plot_prediction
+logger = logging.getLogger(__name__)
+def preprocess_image(
+    image_path: Path,
+    patch_size: int = 256,
+) -> Tuple[torch.Tensor, Tuple[int, int]]:
+    """Load and preprocess a single image for inference.
+    Reads the image, pads to a multiple of patch_size, and applies
+    ImageNet normalization.
+    Args:
+        image_path: Path to the input image.
+        patch_size: Patch size the model expects.
+    Returns:
+        Tuple of (preprocessed tensor [1, 3, H, W], original (H, W)).
+    """
+    img = cv2.imread(str(image_path), cv2.IMREAD_COLOR)
+    if img is None:
+        raise FileNotFoundError(f"Could not read image: {image_path}")
+    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
+    orig_h, orig_w = img.shape[:2]
+    # Pad to multiple of patch_size
+    pad_h = (patch_size - orig_h % patch_size) % patch_size
+    pad_w = (patch_size - orig_w % patch_size) % patch_size
+    if pad_h > 0 or pad_w > 0:
+        img = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), mode="reflect")
+    # Normalize
+    img = img.astype(np.float32) / 255.0
+    mean = np.array(IMAGENET_MEAN, dtype=np.float32)
+    std = np.array(IMAGENET_STD, dtype=np.float32)
+    img = (img - mean) / std
+    # HWC -> CHW, add batch dim
+    tensor = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0).float()
+    return tensor, (orig_h, orig_w)
+def sliding_window_inference(
+    model: nn.Module,
+    img_a: torch.Tensor,
+    img_b: torch.Tensor,
+    patch_size: int = 256,
+    device: torch.device = torch.device("cpu"),
+) -> torch.Tensor:
+    """Run inference using sliding window for large images.
+    Splits images into non-overlapping patches, runs model on each,
+    and stitches results back together.
+    Args:
+        model: Trained change detection model.
+        img_a: Before image tensor [1, 3, H, W].
+        img_b: After image tensor [1, 3, H, W].
+        patch_size: Size of each patch.
+        device: Inference device.
+    Returns:
+        Probability map [1, 1, H, W] (after sigmoid).
+    """
+    _, _, h, w = img_a.shape
+    output = torch.zeros(1, 1, h, w, device="cpu")
+    model.eval()
+    with torch.no_grad():
+        for y in range(0, h, patch_size):
+            for x in range(0, w, patch_size):
+                patch_a = img_a[:, :, y:y + patch_size, x:x + patch_size].to(device)
+                patch_b = img_b[:, :, y:y + patch_size, x:x + patch_size].to(device)
+                logits = model(patch_a, patch_b)
+                probs = torch.sigmoid(logits).cpu()
+                output[:, :, y:y + patch_size, x:x + patch_size] = probs
+    return output
+def save_change_mask(
+    mask: np.ndarray,
+    save_path: Path,
+    threshold: float = 0.5,
+) -> None:
+    """Save binary change mask as an image.
+    Args:
+        mask: Probability map [H, W] with values in [0, 1].
+        save_path: Output file path.
+        threshold: Binarization threshold.
+    """
+    binary = (mask > threshold).astype(np.uint8) * 255
+    save_path.parent.mkdir(parents=True, exist_ok=True)
+    cv2.imwrite(str(save_path), binary)
+    logger.info("Saved change mask: %s", save_path)
+def main() -> None:
+    """Main inference entry point."""
+    parser = argparse.ArgumentParser(description="Run change detection inference")
+    parser.add_argument("--before", type=Path, required=True, help="Path to before image")
+    parser.add_argument("--after", type=Path, required=True, help="Path to after image")
+    parser.add_argument("--model", type=str, default=None, help="Model name")
+    parser.add_argument("--checkpoint", type=Path, required=True, help="Path to model checkpoint")
+    parser.add_argument("--config", type=Path, default=Path("configs/config.yaml"))
+    parser.add_argument("--output", type=Path, default=Path("outputs/inference"))
+    parser.add_argument("--threshold", type=float, default=0.5)
+    args = parser.parse_args()
+    logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
+    # Load config
+    with open(args.config, "r") as f:
+        config = yaml.safe_load(f)
+    model_name = args.model or config["model"]["name"]
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    patch_size = config.get("dataset", {}).get("patch_size", 256)
+    # Load model
+    model = get_model(model_name, config).to(device)
+    ckpt = torch.load(args.checkpoint, map_location=device)
+    model.load_state_dict(ckpt["model_state_dict"])
+    logger.info("Loaded model '%s' from %s", model_name, args.checkpoint)
+    # Preprocess images
+    img_a, (orig_h, orig_w) = preprocess_image(args.before, patch_size)
+    img_b, _ = preprocess_image(args.after, patch_size)
+    # Run inference
+    prob_map = sliding_window_inference(model, img_a, img_b, patch_size, device)
+    # Crop back to original size and save
+    prob_map = prob_map[:, :, :orig_h, :orig_w]
+    mask_np = prob_map.squeeze().numpy()
+    args.output.mkdir(parents=True, exist_ok=True)
+    save_change_mask(mask_np, args.output / "change_mask.png", args.threshold)
+    # Save overlay visualization
+    overlay = overlay_changes(img_b.squeeze()[:, :orig_h, :orig_w], prob_map.squeeze(0))
+    overlay_uint8 = (overlay * 255).astype(np.uint8)
+    cv2.imwrite(str(args.output / "overlay.png"), cv2.cvtColor(overlay_uint8, cv2.COLOR_RGB2BGR))
+    logger.info("Saved overlay: %s", args.output / "overlay.png")
+if __name__ == "__main__":
+    main()
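
The tiling arithmetic behind `sliding_window_inference` can be sketched in plain Python. This assumes `preprocess_image` pads each image up to a multiple of `patch_size`; the crop back to `orig_h`, `orig_w` in `main` suggests that, but the padding code is not shown in this part of the diff. `padded_size` and `tile_origins` are hypothetical helpers, not part of the repository.

```python
from typing import List, Tuple


def padded_size(length: int, patch_size: int) -> int:
    """Round `length` up to the next multiple of `patch_size`."""
    return ((length + patch_size - 1) // patch_size) * patch_size


def tile_origins(h: int, w: int, patch_size: int) -> List[Tuple[int, int]]:
    """Top-left (y, x) corners visited by a non-overlapping sliding window."""
    return [
        (y, x)
        for y in range(0, h, patch_size)
        for x in range(0, w, patch_size)
    ]


if __name__ == "__main__":
    # A 1000x700 image padded for 256-pixel patches.
    h, w = padded_size(1000, 256), padded_size(700, 256)
    print(h, w)                          # 1024 768
    print(len(tile_origins(h, w, 256)))  # 12 patches (4 rows x 3 columns)
```

With padded inputs every patch is exactly `patch_size` square, which is the case the models' docstrings describe (`[B, 3, 256, 256]`).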

models/__init__.py ADDED

	@@ -0,0 +1,39 @@

+"""Model factory for change detection models.
+Provides a unified interface to instantiate any supported model by name.
+"""
+from typing import Any, Dict
+import torch.nn as nn
+from .changeformer import ChangeFormer
+from .siamese_cnn import SiameseCNN
+from .unet_pp import UNetPPChangeDetection
+_MODEL_REGISTRY: Dict[str, type] = {
+    "siamese_cnn": SiameseCNN,
+    "unet_pp": UNetPPChangeDetection,
+    "changeformer": ChangeFormer,
+}
+def get_model(model_name: str, config: Dict[str, Any]) -> nn.Module:
+    """Instantiate a change detection model by name.
+    Args:
+        model_name: One of 'siamese_cnn', 'unet_pp', 'changeformer'.
+        config: Full config dict; model-specific section is extracted internally.
+    Returns:
+        Initialized model (nn.Module).
+    Raises:
+        ValueError: If model_name is not recognized.
+    """
+    if model_name not in _MODEL_REGISTRY:
+        raise ValueError(f"Unknown model '{model_name}'. Choose from: {list(_MODEL_REGISTRY.keys())}")
+    model_cls = _MODEL_REGISTRY[model_name]
+    model_config = config.get(model_name, {})
+    return model_cls(**model_config)
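
A minimal sketch of the registry pattern used by `get_model`, with a stand-in class so it runs without PyTorch. `TinyModel`, `_REGISTRY` and `build` are hypothetical names for illustration. The point it shows: the config section named after the model is passed straight through as keyword arguments, so its keys have to match the constructor's parameter names.

```python
from typing import Any, Dict


class TinyModel:
    """Hypothetical stand-in for a real model class (not in the repository)."""

    def __init__(self, width: int = 8, pretrained: bool = False) -> None:
        self.width = width
        self.pretrained = pretrained


# Name -> class lookup, same shape as _MODEL_REGISTRY.
_REGISTRY: Dict[str, type] = {"tiny": TinyModel}


def build(model_name: str, config: Dict[str, Any]) -> Any:
    """Look up the class, then pass config[model_name] as keyword arguments."""
    if model_name not in _REGISTRY:
        raise ValueError(f"Unknown model '{model_name}'. Choose from: {list(_REGISTRY)}")
    return _REGISTRY[model_name](**config.get(model_name, {}))


if __name__ == "__main__":
    print(build("tiny", {"tiny": {"width": 16}}).width)  # 16
    print(build("tiny", {}).width)                       # 8, constructor default
    try:
        build("tiny", {"tiny": {"depth": 3}})            # key the constructor lacks
    except TypeError as exc:
        print("rejected:", exc)
```

A missing config section falls back to the constructor defaults, while a misspelled key fails loudly with `TypeError` at construction time.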

models/changeformer.py ADDED

	@@ -0,0 +1,358 @@

+"""ChangeFormer — Transformer-based change detection model.
+Implements a hierarchical vision transformer (MiT-B1 style) with shared-weight
+Siamese encoder and MLP decoder for change detection. Based on:
+"A Transformer-Based Siamese Network for Change Detection" (arXiv:2201.01293).
+"""
+from typing import List, Tuple
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from einops import rearrange
+class OverlapPatchEmbed(nn.Module):
+    """Overlapping patch embedding for hierarchical feature extraction.
+    Args:
+        in_channels: Number of input channels.
+        embed_dim: Embedding dimension.
+        patch_size: Patch size for convolution.
+        stride: Stride for convolution.
+    """
+    def __init__(
+        self,
+        in_channels: int = 3,
+        embed_dim: int = 64,
+        patch_size: int = 7,
+        stride: int = 4,
+    ) -> None:
+        super().__init__()
+        self.proj = nn.Conv2d(
+            in_channels, embed_dim,
+            kernel_size=patch_size, stride=stride,
+            padding=patch_size // 2,
+        )
+        self.norm = nn.LayerNorm(embed_dim)
+    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, int, int]:
+        """Forward pass.
+        Args:
+            x: Input tensor [B, C, H, W].
+        Returns:
+            Tuple of (tokens [B, N, D], height, width).
+        """
+        x = self.proj(x)
+        _, _, h, w = x.shape
+        x = rearrange(x, "b c h w -> b (h w) c")
+        x = self.norm(x)
+        return x, h, w
+class EfficientSelfAttention(nn.Module):
+    """Efficient self-attention with spatial reduction.
+    Args:
+        dim: Input dimension.
+        num_heads: Number of attention heads.
+        sr_ratio: Spatial reduction ratio.
+    """
+    def __init__(self, dim: int, num_heads: int = 1, sr_ratio: int = 8) -> None:
+        super().__init__()
+        self.num_heads = num_heads
+        self.head_dim = dim // num_heads
+        self.scale = self.head_dim ** -0.5
+        self.q = nn.Linear(dim, dim)
+        self.kv = nn.Linear(dim, dim * 2)
+        self.proj = nn.Linear(dim, dim)
+        # Spatial reduction
+        self.sr_ratio = sr_ratio
+        if sr_ratio > 1:
+            self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
+            self.sr_norm = nn.LayerNorm(dim)
+    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
+        """Forward pass.
+        Args:
+            x: Input tokens [B, N, C].
+            h: Feature map height.
+            w: Feature map width.
+        Returns:
+            Output tokens [B, N, C].
+        """
+        b, n, c = x.shape
+        q = self.q(x).reshape(b, n, self.num_heads, self.head_dim).permute(0, 2, 1, 3)
+        if self.sr_ratio > 1:
+            x_ = rearrange(x, "b (h w) c -> b c h w", h=h, w=w)
+            x_ = self.sr(x_)
+            x_ = rearrange(x_, "b c h w -> b (h w) c")
+            x_ = self.sr_norm(x_)
+        else:
+            x_ = x
+        kv = self.kv(x_).reshape(b, -1, 2, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
+        k, v = kv[0], kv[1]
+        attn = (q @ k.transpose(-2, -1)) * self.scale
+        attn = attn.softmax(dim=-1)
+        out = (attn @ v).transpose(1, 2).reshape(b, n, c)
+        out = self.proj(out)
+        return out
+class MixFFN(nn.Module):
+    """Mix Feed-Forward Network with depthwise convolution.
+    Args:
+        dim: Input/output dimension.
+        mlp_ratio: Expansion ratio for hidden dimension.
+    """
+    def __init__(self, dim: int, mlp_ratio: int = 4) -> None:
+        super().__init__()
+        hidden = dim * mlp_ratio
+        self.fc1 = nn.Linear(dim, hidden)
+        self.dwconv = nn.Conv2d(hidden, hidden, 3, 1, 1, groups=hidden)
+        self.fc2 = nn.Linear(hidden, dim)
+        self.act = nn.GELU()
+    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
+        """Forward pass.
+        Args:
+            x: Input tokens [B, N, C].
+            h: Feature map height.
+            w: Feature map width.
+        Returns:
+            Output tokens [B, N, C].
+        """
+        x = self.fc1(x)
+        x = rearrange(x, "b (h w) c -> b c h w", h=h, w=w)
+        x = self.act(self.dwconv(x))
+        x = rearrange(x, "b c h w -> b (h w) c")
+        x = self.fc2(x)
+        return x
+class TransformerBlock(nn.Module):
+    """Single transformer block with efficient attention and MixFFN.
+    Args:
+        dim: Feature dimension.
+        num_heads: Number of attention heads.
+        mlp_ratio: MLP expansion ratio.
+        sr_ratio: Spatial reduction ratio for attention.
+    """
+    def __init__(
+        self,
+        dim: int,
+        num_heads: int = 1,
+        mlp_ratio: int = 4,
+        sr_ratio: int = 8,
+    ) -> None:
+        super().__init__()
+        self.norm1 = nn.LayerNorm(dim)
+        self.attn = EfficientSelfAttention(dim, num_heads, sr_ratio)
+        self.norm2 = nn.LayerNorm(dim)
+        self.ffn = MixFFN(dim, mlp_ratio)
+    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
+        """Forward pass.
+        Args:
+            x: Input tokens [B, N, C].
+            h: Feature map height.
+            w: Feature map width.
+        Returns:
+            Output tokens [B, N, C].
+        """
+        x = x + self.attn(self.norm1(x), h, w)
+        x = x + self.ffn(self.norm2(x), h, w)
+        return x
+class MiTEncoder(nn.Module):
+    """Mix Transformer (MiT) encoder — hierarchical vision transformer.
+    Args:
+        embed_dims: Embedding dimensions at each stage.
+        num_heads: Number of attention heads at each stage.
+        mlp_ratios: MLP expansion ratios at each stage.
+        depths: Number of transformer blocks at each stage.
+    """
+    def __init__(
+        self,
+        embed_dims: List[int] = [64, 128, 320, 512],
+        num_heads: List[int] = [1, 2, 5, 8],
+        mlp_ratios: List[int] = [8, 8, 4, 4],
+        depths: List[int] = [2, 2, 2, 2],
+    ) -> None:
+        super().__init__()
+        self.num_stages = len(embed_dims)
+        sr_ratios = [8, 4, 2, 1]
+        patch_sizes = [7, 3, 3, 3]
+        strides = [4, 2, 2, 2]
+        self.patch_embeds = nn.ModuleList()
+        self.blocks = nn.ModuleList()
+        self.norms = nn.ModuleList()
+        for i in range(self.num_stages):
+            in_ch = 3 if i == 0 else embed_dims[i - 1]
+            self.patch_embeds.append(
+                OverlapPatchEmbed(in_ch, embed_dims[i], patch_sizes[i], strides[i])
+            )
+            self.blocks.append(
+                nn.ModuleList([
+                    TransformerBlock(embed_dims[i], num_heads[i], mlp_ratios[i], sr_ratios[i])
+                    for _ in range(depths[i])
+                ])
+            )
+            self.norms.append(nn.LayerNorm(embed_dims[i]))
+    def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
+        """Extract hierarchical features.
+        Args:
+            x: Input image [B, 3, H, W].
+        Returns:
+            List of feature maps at each stage [B, C_i, H_i, W_i].
+        """
+        features = []
+        for i in range(self.num_stages):
+            x, h, w = self.patch_embeds[i](x)
+            for blk in self.blocks[i]:
+                x = blk(x, h, w)
+            x = self.norms[i](x)
+            x = rearrange(x, "b (h w) c -> b c h w", h=h, w=w)
+            features.append(x)
+        return features
+class MLPDecoder(nn.Module):
+    """MLP-based decoder that fuses multi-scale difference features.
+    Args:
+        embed_dims: Embedding dimensions from each encoder stage.
+        out_channels: Number of output channels (1 for binary change mask).
+    """
+    def __init__(
+        self,
+        embed_dims: List[int] = [64, 128, 320, 512],
+        out_channels: int = 1,
+    ) -> None:
+        super().__init__()
+        unified_dim = embed_dims[0]
+        self.linear_projections = nn.ModuleList([
+            nn.Conv2d(dim, unified_dim, kernel_size=1)
+            for dim in embed_dims
+        ])
+        self.fuse = nn.Sequential(
+            nn.Conv2d(unified_dim * len(embed_dims), unified_dim, kernel_size=1),
+            nn.BatchNorm2d(unified_dim),
+            nn.ReLU(inplace=True),
+        )
+        self.head = nn.Conv2d(unified_dim, out_channels, kernel_size=1)
+    def forward(self, features: List[torch.Tensor], target_size: Tuple[int, int]) -> torch.Tensor:
+        """Forward pass.
+        Args:
+            features: List of difference feature maps.
+            target_size: (H, W) of the desired output.
+        Returns:
+            Logits [B, 1, H, W].
+        """
+        projected = []
+        for feat, proj in zip(features, self.linear_projections):
+            p = proj(feat)
+            p = F.interpolate(p, size=target_size, mode="bilinear", align_corners=False)
+            projected.append(p)
+        fused = self.fuse(torch.cat(projected, dim=1))
+        out = self.head(fused)
+        return out
+class ChangeFormer(nn.Module):
+    """ChangeFormer: Transformer-based Siamese network for change detection.
+    Args:
+        embed_dims: Embedding dims at each hierarchical stage.
+        num_heads: Attention heads at each stage.
+        mlp_ratios: MLP expansion ratios at each stage.
+        depths: Transformer block counts at each stage.
+        pretrained_backbone: Whether to load pretrained MiT weights. Not yet
+            implemented (see TODO in __init__); the flag is currently ignored.
+    """
+    def __init__(
+        self,
+        embed_dims: List[int] = [64, 128, 320, 512],
+        num_heads: List[int] = [1, 2, 5, 8],
+        mlp_ratios: List[int] = [8, 8, 4, 4],
+        depths: List[int] = [2, 2, 2, 2],
+        pretrained_backbone: bool = True,
+    ) -> None:
+        super().__init__()
+        # Shared Siamese encoder
+        self.encoder = MiTEncoder(embed_dims, num_heads, mlp_ratios, depths)
+        # MLP decoder
+        self.decoder = MLPDecoder(embed_dims, out_channels=1)
+        # TODO: Load pretrained MiT-B1 weights if pretrained_backbone is True
+    def forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
+        """Forward pass.
+        Args:
+            x1: Before image [B, 3, 256, 256].
+            x2: After image [B, 3, 256, 256].
+        Returns:
+            Raw logits [B, 1, 256, 256].
+        """
+        # Extract hierarchical features
+        feats_1 = self.encoder(x1)
+        feats_2 = self.encoder(x2)
+        # Compute difference at each scale
+        diff_feats = [torch.abs(f1 - f2) for f1, f2 in zip(feats_1, feats_2)]
+        # Decode to change mask
+        target_size = (x1.shape[2], x1.shape[3])
+        out = self.decoder(diff_feats, target_size)
+        return out
+if __name__ == "__main__":
+    # Quick sanity check
+    model = ChangeFormer(pretrained_backbone=False)
+    x1 = torch.randn(1, 3, 256, 256)
+    x2 = torch.randn(1, 3, 256, 256)
+    out = model(x1, x2)
+    print(f"Input: {x1.shape}, Output: {out.shape}")
+    assert out.shape == (1, 1, 256, 256), f"Unexpected shape: {out.shape}"
+    print(f"Parameters: {sum(p.numel() for p in model.parameters()) / 1e6:.1f}M")
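
The feature-map sizes in `MiTEncoder` follow from the standard convolution output-size formula, floor((n + 2p - k) / s) + 1. The sketch below applies it to the `patch_sizes`, `strides` and `sr_ratios` hard-coded above, assuming a square 256-pixel input. `conv_out` and `mit_stage_shapes` are hypothetical helpers written for this illustration.

```python
from typing import List, Tuple


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def mit_stage_shapes(size: int = 256) -> List[Tuple[int, int, int]]:
    """Return (side, query_tokens, key_value_tokens) for each encoder stage."""
    patch_sizes = [7, 3, 3, 3]
    strides = [4, 2, 2, 2]
    sr_ratios = [8, 4, 2, 1]
    rows = []
    for kernel, stride, sr in zip(patch_sizes, strides, sr_ratios):
        # OverlapPatchEmbed uses padding = patch_size // 2.
        size = conv_out(size, kernel, stride, kernel // 2)
        # Spatial reduction is a conv with kernel = stride = sr_ratio, no padding.
        kv_side = conv_out(size, sr, sr, 0) if sr > 1 else size
        rows.append((size, size * size, kv_side * kv_side))
    return rows


if __name__ == "__main__":
    for side, q_tokens, kv_tokens in mit_stage_shapes(256):
        print(side, q_tokens, kv_tokens)
    # 64 4096 64
    # 32 1024 64
    # 16 256 64
    # 8 64 64
```

For a 256-pixel input the spatial reduction leaves 64 key/value tokens at every stage, so the attention matrix in stage 1 is 4096 x 64 rather than 4096 x 4096.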

models/siamese_cnn.py ADDED

	@@ -0,0 +1,85 @@

+"""Siamese CNN baseline for change detection.
+Architecture: Shared-weight ResNet18 backbone extracts features from both
+images. Feature difference is decoded via transposed convolutions to produce
+a binary change mask.
+"""
+import torch
+import torch.nn as nn
+import torchvision.models as models
+class SiameseCNN(nn.Module):
+    """Siamese CNN with shared ResNet18 encoder and transposed-conv decoder.
+    Args:
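+        backbone: torchvision ResNet name (default: 'resnet18'). The decoder
+            expects 512 encoder channels, so only 'resnet18' and 'resnet34' fit.
+        pretrained: Whether to use ImageNet-pretrained weights.
+    """
+    def __init__(self, backbone: str = "resnet18", pretrained: bool = True) -> None:
+        super().__init__()
+        # Shared encoder. "DEFAULT" selects the pretrained weights that match
+        # `backbone`, rather than always passing the ResNet18 weights enum.
+        resnet = getattr(models, backbone)(
+            weights="DEFAULT" if pretrained else None
+        )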
+        # Remove avgpool and fc — keep feature extraction layers
+        self.encoder = nn.Sequential(
+            resnet.conv1,
+            resnet.bn1,
+            resnet.relu,
+            resnet.maxpool,
+            resnet.layer1,  # 64 channels
+            resnet.layer2,  # 128 channels
+            resnet.layer3,  # 256 channels
+            resnet.layer4,  # 512 channels
+        )
+        # Decoder: upsample difference features back to input resolution
+        self.decoder = nn.Sequential(
+            nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1),
+            nn.BatchNorm2d(256),
+            nn.ReLU(inplace=True),
+            nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),
+            nn.BatchNorm2d(128),
+            nn.ReLU(inplace=True),
+            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
+            nn.BatchNorm2d(64),
+            nn.ReLU(inplace=True),
+            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),
+            nn.BatchNorm2d(32),
+            nn.ReLU(inplace=True),
+            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),
+        )
+    def forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
+        """Forward pass.
+        Args:
+            x1: Before image [B, 3, 256, 256].
+            x2: After image [B, 3, 256, 256].
+        Returns:
+            Raw logits [B, 1, 256, 256].
+        """
+        f1 = self.encoder(x1)
+        f2 = self.encoder(x2)
+        # Feature difference
+        diff = torch.abs(f1 - f2)
+        # Decode to change mask
+        out = self.decoder(diff)
+        return out
+if __name__ == "__main__":
+    # Quick sanity check
+    model = SiameseCNN(pretrained=False)
+    x1 = torch.randn(2, 3, 256, 256)
+    x2 = torch.randn(2, 3, 256, 256)
+    out = model(x1, x2)
+    print(f"Input: {x1.shape}, Output: {out.shape}")
+    assert out.shape == (2, 1, 256, 256), f"Unexpected shape: {out.shape}"
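
Why the five-layer decoder restores the input resolution can be checked with size arithmetic alone. The sketch below assumes the standard torchvision ResNet layout (7x7 stride-2 stem, 3x3 stride-2 max pool, then three stride-2 stages after `layer1`) and the `ConvTranspose2d` output formula (n - 1) * s - 2p + k. The helper names are hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size: int, kernel: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Output length of a transposed convolution (no output_padding, dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel


def encoder_size(size: int) -> int:
    """Spatial side after the ResNet feature layers kept in the encoder."""
    size = conv_out(size, 7, 2, 3)      # conv1
    size = conv_out(size, 3, 2, 1)      # maxpool
    for _ in range(3):                  # layer2..layer4 halve; layer1 keeps size
        size = conv_out(size, 3, 2, 1)
    return size


def decoder_size(size: int, num_layers: int = 5) -> int:
    """Spatial side after the stack of stride-2 transposed convolutions."""
    for _ in range(num_layers):
        size = conv_transpose_out(size)
    return size


if __name__ == "__main__":
    print(encoder_size(256), decoder_size(encoder_size(256)))  # 8 256
    print(encoder_size(250), decoder_size(encoder_size(250)))  # 8 256
```

The second line shows the limitation: a 250-pixel input also comes back as 256, so the logits only line up with the mask when the input side is a multiple of 32.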

models/unet_pp.py ADDED

	@@ -0,0 +1,78 @@

+"""UNet++ (Nested U-Net) for change detection.
+Uses a shared ResNet34 encoder from segmentation-models-pytorch. Features from
+both temporal images are differenced and decoded through nested skip connections.
+Optionally supports deep supervision.
+"""
+import torch
+import torch.nn as nn
+import segmentation_models_pytorch as smp
+class UNetPPChangeDetection(nn.Module):
+    """UNet++ adapted for bitemporal change detection.
+    A shared encoder processes both images. The absolute difference of
+    encoder features is fed into the UNet++ decoder.
+    Args:
+        encoder_name: Encoder backbone (default: 'resnet34').
+        pretrained: Use ImageNet-pretrained encoder weights.
+        deep_supervision: Enable deep supervision outputs. Stored on the module,
+            but forward() does not use it yet.
+    """
+    def __init__(
+        self,
+        encoder_name: str = "resnet34",
+        pretrained: bool = True,
+        deep_supervision: bool = False,
+    ) -> None:
+        super().__init__()
+        self.deep_supervision = deep_supervision
+        # Shared encoder via SMP
+        encoder_weights = "imagenet" if pretrained else None
+        self.base_model = smp.UnetPlusPlus(
+            encoder_name=encoder_name,
+            encoder_weights=encoder_weights,
+            in_channels=3,
+            classes=1,
+        )
+        # We'll use the encoder and decoder separately
+        self.encoder = self.base_model.encoder
+        self.decoder = self.base_model.decoder
+        self.segmentation_head = self.base_model.segmentation_head
+    def forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
+        """Forward pass.
+        Args:
+            x1: Before image [B, 3, 256, 256].
+            x2: After image [B, 3, 256, 256].
+        Returns:
+            Raw logits [B, 1, 256, 256].
+        """
+        # Extract multi-scale features from both images
+        features_1 = self.encoder(x1)
+        features_2 = self.encoder(x2)
+        # Compute absolute difference at each scale
+        diff_features = [torch.abs(f1 - f2) for f1, f2 in zip(features_1, features_2)]
+        # Decode
+        decoder_output = self.decoder(*diff_features)
+        out = self.segmentation_head(decoder_output)
+        return out
+if __name__ == "__main__":
+    # Quick sanity check
+    model = UNetPPChangeDetection(pretrained=False)
+    x1 = torch.randn(2, 3, 256, 256)
+    x2 = torch.randn(2, 3, 256, 256)
+    out = model(x1, x2)
+    print(f"Input: {x1.shape}, Output: {out.shape}")
+    assert out.shape == (2, 1, 256, 256), f"Unexpected shape: {out.shape}"

requirements.txt ADDED

	@@ -0,0 +1,16 @@

+torch==2.1.2
+torchvision==0.16.2
+segmentation-models-pytorch==0.3.3
+timm==0.9.12
+einops==0.7.0
+albumentations==1.3.1
+opencv-python-headless==4.9.0.80
+scikit-learn==1.4.0
+matplotlib==3.8.2
+numpy==1.26.3
+Pillow==10.2.0
+PyYAML==6.0.1
+tqdm==4.66.1
+tensorboard==2.15.1
+gradio==4.14.0
+gdown==5.1.0
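
Because every dependency is pinned with `==`, a preinstalled environment such as Colab may carry different versions. A small stdlib-only check is sketched below. `parse_pins` and `find_mismatches` are hypothetical helpers, and the parser only understands plain `name==version` lines like the ones in this file.

```python
from importlib import metadata
from typing import Dict, Tuple


def parse_pins(text: str) -> Dict[str, str]:
    """Parse 'name==version' lines; skip blanks, comments and other specifiers."""
    pins: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Map package name -> (pinned, installed) wherever the two differ."""
    result: Dict[str, Tuple[str, str]] = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = "not installed"
        if found != wanted:
            result[name] = (wanted, found)
    return result


if __name__ == "__main__":
    sample = "torch==2.1.2\ntorchvision==0.16.2\neinops==0.7.0\n"
    for name, (wanted, found) in find_mismatches(parse_pins(sample)).items():
        print(f"{name}: pinned {wanted}, found {found}")
```

The output depends on the local environment, so none is shown here.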

setup_colab.py ADDED

	@@ -0,0 +1,172 @@

+"""Google Colab setup script.
+Handles Drive mounting, GPU verification, dependency installation,
+and path configuration. Run this at the start of every Colab session.
+Usage (in Colab cell):
+    !python setup_colab.py
+    # Or import and call:
+    from setup_colab import setup
+    setup()
+"""
+import logging
+import os
+import subprocess
+import sys
+from pathlib import Path
+from typing import Dict, Optional
+logger = logging.getLogger(__name__)
+def mount_drive() -> None:
+    """Mount Google Drive at /content/drive.
+    Skips if not running in Colab or already mounted.
+    """
+    if not is_colab():
+        logger.info("Not running in Colab — skipping Drive mount.")
+        return
+    if Path("/content/drive/MyDrive").exists():
+        logger.info("Google Drive already mounted.")
+        return
+    from google.colab import drive
+    drive.mount("/content/drive")
+    logger.info("Google Drive mounted successfully.")
+def is_colab() -> bool:
+    """Check if running inside Google Colab.
+    Returns:
+        True if running in Colab environment.
+    """
+    try:
+        import google.colab  # noqa: F401
+        return True
+    except ImportError:
+        return False
+def check_gpu() -> Optional[str]:
+    """Check GPU availability and print device info.
+    Returns:
+        GPU name string, or None if no GPU available.
+    """
+    import torch
+    if not torch.cuda.is_available():
+        logger.warning("No GPU detected! Training will be very slow.")
+        return None
+    gpu_name = torch.cuda.get_device_name(0)
+    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
+    logger.info("GPU: %s (%.1f GB VRAM)", gpu_name, vram_gb)
+    return gpu_name
+def detect_gpu_type() -> str:
+    """Detect GPU type for batch size selection.
+    Returns:
+        One of 'T4', 'V100', or 'default'.
+    """
+    import torch
+    if not torch.cuda.is_available():
+        return "default"
+    name = torch.cuda.get_device_name(0).upper()
+    if "T4" in name:
+        return "T4"
+    elif "V100" in name:
+        return "V100"
+    return "default"
+def install_requirements() -> None:
+    """Install project dependencies from requirements.txt."""
+    req_path = Path("requirements.txt")
+    if not req_path.exists():
+        logger.warning("requirements.txt not found in %s", Path.cwd())
+        return
+    logger.info("Installing dependencies...")
+    subprocess.check_call([
+        sys.executable, "-m", "pip", "install", "-q", "-r", str(req_path)
+    ])
+    logger.info("Dependencies installed.")
+def create_drive_dirs(drive_root: str = "/content/drive/MyDrive/change-detection") -> Dict[str, Path]:
+    """Create project directories on Google Drive.
+    Args:
+        drive_root: Root directory on Drive for this project.
+    Returns:
+        Dict mapping directory names to their paths.
+    """
+    dirs = {
+        "root": Path(drive_root),
+        "checkpoints": Path(drive_root) / "checkpoints",
+        "logs": Path(drive_root) / "logs",
+        "outputs": Path(drive_root) / "outputs",
+        "data": Path(drive_root) / "processed_data",
+    }
+    for name, path in dirs.items():
+        path.mkdir(parents=True, exist_ok=True)
+        logger.info("  %s: %s", name, path)
+    return dirs
+def setup(
+    drive_root: str = "/content/drive/MyDrive/change-detection",
+    install_deps: bool = True,
+) -> Dict[str, Path]:
+    """Run full Colab setup.
+    Args:
+        drive_root: Root directory on Google Drive.
+        install_deps: Whether to install pip dependencies.
+    Returns:
+        Dict of project directory paths.
+    """
+    logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
+    logger.info("=" * 60)
+    logger.info("Military Base Change Detection — Colab Setup")
+    logger.info("=" * 60)
+    # 1. Mount Drive
+    mount_drive()
+    # 2. Check GPU
+    check_gpu()
+    gpu_type = detect_gpu_type()
+    logger.info("GPU type for batch sizing: %s", gpu_type)
+    # 3. Install dependencies
+    if install_deps:
+        install_requirements()
+    # 4. Create Drive directories
+    logger.info("Creating project directories on Drive...")
+    dirs = create_drive_dirs(drive_root)
+    logger.info("=" * 60)
+    logger.info("Setup complete! Ready to train.")
+    logger.info("=" * 60)
+    return dirs
+if __name__ == "__main__":
+    setup()
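
The GPU classification in `detect_gpu_type` is a substring match on the device name, so it can be separated from `torch` and tested on its own. In the sketch below, `classify_gpu` and `pick_batch_size` are hypothetical helpers, and the batch-size table is made up for illustration. The lookup mirrors the fallback chain in `get_batch_size` in `train.py`.

```python
from typing import Dict


def classify_gpu(device_name: str) -> str:
    """Map a CUDA device name to 'T4', 'V100' or 'default'."""
    name = device_name.upper()
    if "T4" in name:
        return "T4"
    if "V100" in name:
        return "V100"
    return "default"


def pick_batch_size(batch_sizes: Dict[str, int], gpu_type: str, fallback: int = 4) -> int:
    """Use the GPU-specific entry, then 'default', then the hard fallback."""
    return batch_sizes.get(gpu_type, batch_sizes.get("default", fallback))


if __name__ == "__main__":
    # Illustrative values only; the real table lives in configs/config.yaml.
    table = {"T4": 8, "V100": 16, "default": 4}
    for device in ["Tesla T4", "Tesla V100-SXM2-16GB", "NVIDIA A100-SXM4-40GB"]:
        gpu_type = classify_gpu(device)
        print(device, "->", gpu_type, pick_batch_size(table, gpu_type))
    # Tesla T4 -> T4 8
    # Tesla V100-SXM2-16GB -> V100 16
    # NVIDIA A100-SXM4-40GB -> default 4
```

Any GPU other than a T4 or V100 lands in the `default` bucket, so newer cards get the conservative batch size unless the config adds an entry and the classifier is extended.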

train.py ADDED

	@@ -0,0 +1,418 @@

+"""Main training script for change detection models.
+Supports AMP, gradient clipping, early stopping, checkpoint saving to Google
+Drive, and resume from checkpoint after Colab disconnects.
+Usage:
+    python train.py --config configs/config.yaml --model unet_pp
+    python train.py --config configs/config.yaml --model changeformer --resume checkpoints/changeformer_last.pth
+"""
+import argparse
+import logging
+import random
+from pathlib import Path
+from typing import Any, Dict, Tuple
+import numpy as np
+import torch
+import torch.nn as nn
+from torch.cuda.amp import GradScaler, autocast
+from torch.optim import AdamW
+from torch.optim.lr_scheduler import CosineAnnealingLR
+from torch.utils.data import DataLoader
+from torch.utils.tensorboard import SummaryWriter
+from tqdm import tqdm
+import yaml
+from data.dataset import ChangeDetectionDataset
+from models import get_model
+from utils.losses import get_loss
+from utils.metrics import ConfusionMatrix
+from utils.visualization import plot_prediction
+logger = logging.getLogger(__name__)
+def set_seed(seed: int) -> None:
+    """Set random seeds for reproducibility.
+    Args:
+        seed: Random seed value.
+    """
+    random.seed(seed)
+    np.random.seed(seed)
+    torch.manual_seed(seed)
+    torch.cuda.manual_seed_all(seed)
+    torch.backends.cudnn.deterministic = True
+    torch.backends.cudnn.benchmark = False
+def detect_gpu_type() -> str:
+    """Detect the current GPU type for batch size selection.
+    Returns:
+        GPU type string ('T4', 'V100', or 'default').
+    """
+    if not torch.cuda.is_available():
+        return "default"
+    name = torch.cuda.get_device_name(0).upper()
+    if "T4" in name:
+        return "T4"
+    elif "V100" in name:
+        return "V100"
+    return "default"
+def get_batch_size(config: Dict[str, Any], model_name: str) -> int:
+    """Get appropriate batch size based on GPU and model.
+    Args:
+        config: Full config dict.
+        model_name: Model name string.
+    Returns:
+        Batch size integer.
+    """
+    gpu_type = detect_gpu_type()
+    batch_sizes = config.get("batch_sizes", {}).get(model_name, {})
+    return batch_sizes.get(gpu_type, batch_sizes.get("default", 4))
+def get_paths(config: Dict[str, Any]) -> Dict[str, Path]:
+    """Resolve paths based on whether running on Colab or locally.
+    Args:
+        config: Full config dict.
+    Returns:
+        Dict with keys: 'data', 'checkpoints', 'logs', 'outputs'.
+    """
+    if config.get("colab", {}).get("enabled", False):
+        colab = config["colab"]
+        return {
+            "data": Path(colab["data_dir"]),
+            "checkpoints": Path(colab["checkpoint_dir"]),
+            "logs": Path(colab["log_dir"]),
+            "outputs": Path(colab["output_dir"]),
+        }
+    else:
+        paths = config.get("paths", {})
+        return {
+            "data": Path(paths.get("processed_data", "./processed_data")),
+            "checkpoints": Path(paths.get("checkpoint_dir", "./checkpoints")),
+            "logs": Path(paths.get("log_dir", "./logs")),
+            "outputs": Path(paths.get("output_dir", "./outputs")),
+        }
+def build_dataloaders(
+    config: Dict[str, Any],
+    data_dir: Path,
+    batch_size: int,
+) -> Tuple[DataLoader, DataLoader]:
+    """Create train and validation DataLoaders.
+    Args:
+        config: Full config dict.
+        data_dir: Path to processed dataset root.
+        batch_size: Batch size.
+    Returns:
+        Tuple of (train_loader, val_loader).
+    """
+    ds_cfg = config.get("dataset", {})
+    num_workers = ds_cfg.get("num_workers", 4)
+    pin_memory = ds_cfg.get("pin_memory", True)
+    train_ds = ChangeDetectionDataset(data_dir / "train", split="train", config=config)
+    val_ds = ChangeDetectionDataset(data_dir / "val", split="val", config=config)
+    train_loader = DataLoader(
+        train_ds, batch_size=batch_size, shuffle=True,
+        num_workers=num_workers, pin_memory=pin_memory, drop_last=True,
+    )
+    val_loader = DataLoader(
+        val_ds, batch_size=batch_size, shuffle=False,
+        num_workers=num_workers, pin_memory=pin_memory,
+    )
+    return train_loader, val_loader
+def train_one_epoch(
+    model: nn.Module,
+    loader: DataLoader,
+    criterion: nn.Module,
+    optimizer: torch.optim.Optimizer,
+    scaler: GradScaler,
+    device: torch.device,
+    config: Dict[str, Any],
+) -> Tuple[float, Dict[str, float]]:
+    """Run one training epoch.
+    Args:
+        model: The change detection model.
+        loader: Training DataLoader.
+        criterion: Loss function.
+        optimizer: Optimizer.
+        scaler: GradScaler for AMP.
+        device: Target device.
+        config: Full config dict.
+    Returns:
+        Tuple of (average loss, metrics dict).
+    """
+    model.train()
+    running_loss = 0.0
+    cm = ConfusionMatrix()
+    train_cfg = config.get("training", {})
+    accum_steps = train_cfg.get("gradient_accumulation_steps", 1)
+    grad_clip = train_cfg.get("grad_clip_max_norm", 1.0)
+    threshold = config.get("evaluation", {}).get("threshold", 0.5)
+    optimizer.zero_grad()
+    for step, batch in enumerate(tqdm(loader, desc="Train", leave=False)):
+        img_a = batch["A"].to(device)
+        img_b = batch["B"].to(device)
+        mask = batch["mask"].to(device)
+        with autocast():
+            logits = model(img_a, img_b)
+            loss = criterion(logits, mask) / accum_steps
+        scaler.scale(loss).backward()
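+        # Step every accum_steps batches, and on the final batch so that
+        # leftover gradients are applied instead of being zeroed next epoch.
+        if (step + 1) % accum_steps == 0 or (step + 1) == len(loader):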
+            scaler.unscale_(optimizer)
+            nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
+            scaler.step(optimizer)
+            scaler.update()
+            optimizer.zero_grad()
+        running_loss += loss.item() * accum_steps
+        # Metrics
+        with torch.no_grad():
+            preds = (torch.sigmoid(logits) > threshold).float()
+            cm.update(preds, mask)
+    avg_loss = running_loss / len(loader)
+    metrics = cm.compute()
+    return avg_loss, metrics
+@torch.no_grad()
+def validate(
+    model: nn.Module,
+    loader: DataLoader,
+    criterion: nn.Module,
+    device: torch.device,
+    threshold: float = 0.5,
+) -> Tuple[float, Dict[str, float]]:
+    """Run validation.
+    Args:
+        model: The change detection model.
+        loader: Validation DataLoader.
+        criterion: Loss function.
+        device: Target device.
+        threshold: Binarization threshold.
+    Returns:
+        Tuple of (average loss, metrics dict).
+    """
+    model.eval()
+    running_loss = 0.0
+    cm = ConfusionMatrix()
+    for batch in tqdm(loader, desc="Val", leave=False):
+        img_a = batch["A"].to(device)
+        img_b = batch["B"].to(device)
+        mask = batch["mask"].to(device)
+        logits = model(img_a, img_b)
+        loss = criterion(logits, mask)
+        running_loss += loss.item()
+        preds = (torch.sigmoid(logits) > threshold).float()
+        cm.update(preds, mask)
+    avg_loss = running_loss / len(loader)
+    metrics = cm.compute()
+    return avg_loss, metrics
+def save_checkpoint(
+    model: nn.Module,
+    optimizer: torch.optim.Optimizer,
+    scheduler: Any,
+    scaler: GradScaler,
+    epoch: int,
+    best_f1: float,
+    save_path: Path,
+) -> None:
+    """Save a training checkpoint.
+    Args:
+        model: Model to save.
+        optimizer: Optimizer state.
+        scheduler: LR scheduler state.
+        scaler: GradScaler state.
+        epoch: Current epoch number.
+        best_f1: Best validation F1 so far.
+        save_path: Path to save the checkpoint.
+    """
+    save_path.parent.mkdir(parents=True, exist_ok=True)
+    torch.save({
+        "epoch": epoch,
+        "model_state_dict": model.state_dict(),
+        "optimizer_state_dict": optimizer.state_dict(),
+        "scheduler_state_dict": scheduler.state_dict(),
+        "scaler_state_dict": scaler.state_dict(),
+        "best_f1": best_f1,
+    }, save_path)
+    logger.info("Saved checkpoint: %s", save_path)
+def load_checkpoint(
+    path: Path,
+    model: nn.Module,
+    optimizer: torch.optim.Optimizer,
+    scheduler: Any,
+    scaler: GradScaler,
+    device: torch.device,
+) -> Tuple[int, float]:
+    """Load a training checkpoint for resume.
+    Args:
+        path: Path to the checkpoint file.
+        model: Model to load weights into.
+        optimizer: Optimizer to load state into.
+        scheduler: Scheduler to load state into.
+        scaler: GradScaler to load state into.
+        device: Target device.
+    Returns:
+        Tuple of (start_epoch, best_f1).
+    """
+    ckpt = torch.load(path, map_location=device)
+    model.load_state_dict(ckpt["model_state_dict"])
+    optimizer.load_state_dict(ckpt["optimizer_state_dict"])
+    scheduler.load_state_dict(ckpt["scheduler_state_dict"])
+    scaler.load_state_dict(ckpt["scaler_state_dict"])
+    logger.info("Resumed from epoch %d (best F1: %.4f)", ckpt["epoch"], ckpt["best_f1"])
+    return ckpt["epoch"], ckpt["best_f1"]
+def main() -> None:
+    """Main training entry point."""
+    parser = argparse.ArgumentParser(description="Train change detection model")
+    parser.add_argument("--config", type=Path, default=Path("configs/config.yaml"))
+    parser.add_argument("--model", type=str, default=None, help="Override model name from config")
+    parser.add_argument("--resume", type=Path, default=None, help="Path to checkpoint for resume")
+    args = parser.parse_args()
+    logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
+    # Load config
+    with open(args.config, "r") as f:
+        config = yaml.safe_load(f)
+    model_name = args.model or config["model"]["name"]
+    seed = config.get("project", {}).get("seed", 42)
+    set_seed(seed)
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    logger.info("Device: %s", device)
+    # Resolve paths
+    paths = get_paths(config)
+    for p in paths.values():
+        p.mkdir(parents=True, exist_ok=True)
+    # Model
+    model = get_model(model_name, config).to(device)
+    logger.info("Model: %s (%.1fM params)", model_name,
+                sum(p.numel() for p in model.parameters()) / 1e6)
+    # Data
+    batch_size = get_batch_size(config, model_name)
+    train_loader, val_loader = build_dataloaders(config, paths["data"], batch_size)
+    # Loss, optimizer, scheduler
+    criterion = get_loss(config)
+    lr = config.get("learning_rates", {}).get(model_name, config["training"]["learning_rate"])
+    epochs = config.get("epoch_counts", {}).get(model_name, config["training"]["epochs"])
+    optimizer = AdamW(model.parameters(), lr=lr, weight_decay=config["training"]["weight_decay"])
+    scheduler = CosineAnnealingLR(optimizer, T_max=epochs)
+    scaler = GradScaler()
+    # TensorBoard
+    writer = SummaryWriter(log_dir=str(paths["logs"] / model_name))
+    # Resume
+    start_epoch = 0
+    best_f1 = 0.0
+    if args.resume and args.resume.exists():
+        start_epoch, best_f1 = load_checkpoint(
+            args.resume, model, optimizer, scheduler, scaler, device
+        )
+    # Early stopping state
+    es_cfg = config["training"]["early_stopping"]
+    patience = es_cfg.get("patience", 15)
+    patience_counter = 0
+    threshold = config.get("evaluation", {}).get("threshold", 0.5)
+    # Training loop
+    for epoch in range(start_epoch, epochs):
+        logger.info("Epoch %d/%d", epoch + 1, epochs)
+        train_loss, train_metrics = train_one_epoch(
+            model, train_loader, criterion, optimizer, scaler, device, config
+        )
+        val_loss, val_metrics = validate(model, val_loader, criterion, device, threshold)
+        scheduler.step()
+        # Log
+        writer.add_scalar("Loss/train", train_loss, epoch)
+        writer.add_scalar("Loss/val", val_loss, epoch)
+        for k, v in val_metrics.items():
+            writer.add_scalar(f"Val/{k}", v, epoch)
+        logger.info(
+            "  Train Loss: %.4f | Val Loss: %.4f | Val F1: %.4f | Val IoU: %.4f",
+            train_loss, val_loss, val_metrics["f1"], val_metrics["iou"],
+        )
+        # Update best F1 and the patience counter first, so that the "last"
+        # checkpoint also records the current best F1 for a later resume
+        is_best = val_metrics["f1"] > best_f1
+        if is_best:
+            best_f1 = val_metrics["f1"]
+            patience_counter = 0
+        else:
+            patience_counter += 1
+        # Save last checkpoint (always)
+        save_checkpoint(
+            model, optimizer, scheduler, scaler, epoch + 1, best_f1,
+            paths["checkpoints"] / f"{model_name}_last.pth",
+        )
+        # Save best checkpoint
+        if is_best:
+            save_checkpoint(
+                model, optimizer, scheduler, scaler, epoch + 1, best_f1,
+                paths["checkpoints"] / f"{model_name}_best.pth",
+            )
+            logger.info("  New best F1: %.4f", best_f1)
+        # Early stopping
+        if es_cfg.get("enabled", True) and patience_counter >= patience:
+            logger.info("Early stopping triggered at epoch %d", epoch + 1)
+            break
+    writer.close()
+    logger.info("Training complete. Best F1: %.4f", best_f1)
+if __name__ == "__main__":
+    main()
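
Review note on gradient accumulation in `train_one_epoch`: the step schedule is easier to reason about without PyTorch. The sketch below only counts batches; `optimizer_step_batches` and its `flush_last` flag are hypothetical names introduced for illustration and are not part of `train.py`. It shows which batches trigger an optimizer step under a modulo rule of the form `(step + 1) % accum_steps == 0`, and what changes when the final batch also triggers a step.

```python
from typing import List


def optimizer_step_batches(num_batches: int, accum_steps: int, flush_last: bool) -> List[int]:
    """Return the 1-based batch numbers after which the optimizer would step.

    Hypothetical helper for illustration only; it mirrors the modulo test used
    for gradient accumulation and optionally adds a step on the last batch.
    """
    steps = []
    for step in range(num_batches):
        full_group = (step + 1) % accum_steps == 0
        last_batch = (step + 1) == num_batches
        if full_group or (flush_last and last_batch):
            steps.append(step + 1)
    return steps


# 10 batches accumulated in groups of 4: two full groups plus 2 leftover batches.
print(optimizer_step_batches(10, 4, flush_last=False))  # [4, 8]
print(optimizer_step_batches(10, 4, flush_last=True))   # [4, 8, 10]
```

When the number of batches is a multiple of `accum_steps`, both variants give the same schedule. Otherwise the gradients of the trailing batches only reach the weights if the last batch also steps. In that partial group the loss is still divided by the full `accum_steps`, so its update is somewhat smaller than a full group's.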

utils/__init__.py ADDED

File without changes

utils/losses.py ADDED

	@@ -0,0 +1,139 @@

+"""Loss functions for binary change detection.
+Provides BCEDiceLoss (default) and FocalLoss, both operating on raw logits.
+A factory function ``get_loss`` reads the project config and returns the
+selected loss module.
+"""
+from typing import Any, Dict
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+class BCEDiceLoss(nn.Module):
+    """Combined Binary Cross-Entropy and Dice Loss.
+    Both components operate on raw logits — sigmoid is applied internally so
+    the caller should **not** pre-apply it.
+    Args:
+        bce_weight: Scalar weight for the BCE component.
+        dice_weight: Scalar weight for the Dice component.
+        smooth: Smoothing constant for Dice to avoid division by zero.
+    """
+    def __init__(
+        self,
+        bce_weight: float = 0.5,
+        dice_weight: float = 0.5,
+        smooth: float = 1.0,
+    ) -> None:
+        super().__init__()
+        self.bce_weight = bce_weight
+        self.dice_weight = dice_weight
+        self.smooth = smooth
+    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
+        """Compute the combined BCE + Dice loss.
+        Args:
+            logits: Raw model output of shape ``[B, 1, H, W]``.
+            targets: Binary ground-truth masks of shape ``[B, 1, H, W]``
+                with values in {0, 1}.
+        Returns:
+            Scalar loss tensor on the same device as the inputs.
+        """
+        # --- BCE component (numerically stable, operates on logits) ---
+        bce_loss = F.binary_cross_entropy_with_logits(logits, targets)
+        # --- Dice component ---
+        probs = torch.sigmoid(logits)
+        # Flatten spatial dims per sample; reshape (unlike view) also accepts
+        # non-contiguous tensors
+        probs_flat = probs.reshape(probs.size(0), -1)
+        targets_flat = targets.reshape(targets.size(0), -1)
+        intersection = (probs_flat * targets_flat).sum(dim=1)
+        union = probs_flat.sum(dim=1) + targets_flat.sum(dim=1)
+        dice_score = (2.0 * intersection + self.smooth) / (union + self.smooth)
+        dice_loss = 1.0 - dice_score.mean()
+        return self.bce_weight * bce_loss + self.dice_weight * dice_loss
+class FocalLoss(nn.Module):
+    """Focal Loss for addressing class imbalance in change detection.
+    Down-weights well-classified (easy) pixels so the model focuses on hard
+    examples near the decision boundary.  Operates on raw logits.
+    Args:
+        alpha: Balancing factor for the positive class (1 − alpha for negative).
+        gamma: Focusing exponent — higher values down-weight easy examples more.
+    """
+    def __init__(self, alpha: float = 0.25, gamma: float = 2.0) -> None:
+        super().__init__()
+        self.alpha = alpha
+        self.gamma = gamma
+    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
+        """Compute focal loss.
+        Args:
+            logits: Raw model output of shape ``[B, 1, H, W]``.
+            targets: Binary ground-truth masks of shape ``[B, 1, H, W]``
+                with values in {0, 1}.
+        Returns:
+            Scalar loss tensor on the same device as the inputs.
+        """
+        # Per-pixel BCE (unreduced)
+        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
+        probs = torch.sigmoid(logits)
+        # p_t = probability of the true class
+        p_t = probs * targets + (1.0 - probs) * (1.0 - targets)
+        # alpha_t = alpha for positives, (1-alpha) for negatives
+        alpha_t = self.alpha * targets + (1.0 - self.alpha) * (1.0 - targets)
+        focal_weight = alpha_t * (1.0 - p_t) ** self.gamma
+        return (focal_weight * bce).mean()
+def get_loss(config: Dict[str, Any]) -> nn.Module:
+    """Factory function — instantiate a loss module from the project config.
+    Reads ``config["loss"]["name"]`` to select the loss type and extracts
+    the matching sub-key for constructor arguments.
+    Args:
+        config: Full project config dict (as loaded from ``config.yaml``).
+    Returns:
+        An ``nn.Module`` loss function ready for ``loss(logits, targets)``.
+    Raises:
+        ValueError: If the requested loss name is not recognised.
+    """
+    loss_cfg = config.get("loss", {})
+    name = loss_cfg.get("name", "bce_dice")
+    if name == "bce_dice":
+        params = loss_cfg.get("bce_dice", {})
+        return BCEDiceLoss(
+            bce_weight=params.get("bce_weight", 0.5),
+            dice_weight=params.get("dice_weight", 0.5),
+        )
+    elif name == "focal":
+        params = loss_cfg.get("focal", {})
+        return FocalLoss(
+            alpha=params.get("alpha", 0.25),
+            gamma=params.get("gamma", 2.0),
+        )
+    else:
+        raise ValueError(
+            f"Unknown loss '{name}'. Choose from: bce_dice, focal"
+        )
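
Review note on the loss formulas: the arithmetic in `FocalLoss.forward` and the Dice term of `BCEDiceLoss.forward` can be checked by hand on a few pixels. The sketch below re-implements both in plain Python for that purpose only. The function names `focal_term` and `dice_loss` are assumptions made for this example, and per-pixel BCE is written as `-log(p_t)`, which is equivalent for targets in {0, 1}.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def focal_term(logit: float, target: float, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Per-pixel focal loss, following the same steps as FocalLoss.forward."""
    p = sigmoid(logit)
    p_t = p * target + (1.0 - p) * (1.0 - target)              # probability of the true class
    alpha_t = alpha * target + (1.0 - alpha) * (1.0 - target)  # class balancing factor
    bce = -math.log(p_t)                                       # BCE for a binary target
    return alpha_t * (1.0 - p_t) ** gamma * bce


def dice_loss(probs: Sequence[float], targets: Sequence[float], smooth: float = 1.0) -> float:
    """Dice loss for one flattened sample, with the same smoothing term."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    union = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (union + smooth)


# A confidently correct positive pixel contributes far less than a confidently wrong one.
easy = focal_term(logit=3.0, target=1.0)
hard = focal_term(logit=-3.0, target=1.0)
print(f"easy={easy:.6f} hard={hard:.6f}")

# With smooth > 0, an empty prediction on an empty mask is a perfect match.
print(dice_loss([0.0, 0.0], [0.0, 0.0]))  # 0.0
```

The second print shows why the `smooth` constant matters for change detection: many 256×256 patches contain no change at all, and without smoothing their Dice score would be 0/0.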

utils/metrics.py ADDED

	@@ -0,0 +1,226 @@

+"""Evaluation metrics for binary change detection.
+Provides a ``ConfusionMatrix`` accumulator, standalone metric functions, and a
+high-level ``MetricTracker`` that accepts raw logits and handles sigmoid +
+thresholding internally.
+Boolean logic and reductions run on whatever device the tensors live on; only
+the four resulting counts are moved to the CPU, via ``.item()`` inside
+``ConfusionMatrix.update()``.
+"""
+from typing import Dict
+import torch
+# Small constant to prevent division-by-zero in metric formulas.
+_EPS: float = 1e-7
+# ---------------------------------------------------------------------------
+# Low-level confusion-matrix accumulator
+# ---------------------------------------------------------------------------
+class ConfusionMatrix:
+    """Accumulates TP / FP / FN / TN counts across batches.
+    Counts are kept as plain Python ints (each moved off the device with its
+    own ``.item()`` call in ``update``, four per batch) so that accumulated
+    values never overflow a GPU scalar.
+    Example::
+        cm = ConfusionMatrix()
+        for preds, targets in loader:
+            cm.update(preds, targets)
+        metrics = cm.compute()
+    """
+    def __init__(self) -> None:
+        self.reset()
+    def reset(self) -> None:
+        """Reset all counters to zero."""
+        self.tp: int = 0
+        self.fp: int = 0
+        self.fn: int = 0
+        self.tn: int = 0
+    def update(self, preds: torch.Tensor, targets: torch.Tensor) -> None:
+        """Accumulate one batch of binary predictions.
+        All boolean logic runs on whatever device the tensors live on; only
+        the four resulting scalars are moved to CPU via ``.item()``.
+        Args:
+            preds: Binary predictions ``[B, 1, H, W]`` with values in {0, 1}.
+            targets: Ground-truth masks ``[B, 1, H, W]`` with values in {0, 1}.
+        """
+        p = preds.bool().flatten()
+        t = targets.bool().flatten()
+        self.tp += (p & t).sum().item()
+        self.fp += (p & ~t).sum().item()
+        self.fn += (~p & t).sum().item()
+        self.tn += (~p & ~t).sum().item()
+    def compute(self) -> Dict[str, float]:
+        """Derive all metrics from the accumulated counts.
+        Returns:
+            Dict with keys ``'f1'``, ``'iou'``, ``'precision'``, ``'recall'``,
+            ``'oa'`` — each a plain Python float.
+        """
+        precision = self.tp / (self.tp + self.fp + _EPS)
+        recall = self.tp / (self.tp + self.fn + _EPS)
+        f1 = 2.0 * precision * recall / (precision + recall + _EPS)
+        iou = self.tp / (self.tp + self.fp + self.fn + _EPS)
+        oa = (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn + _EPS)
+        return {
+            "f1": f1,
+            "iou": iou,
+            "precision": precision,
+            "recall": recall,
+            "oa": oa,
+        }
+# ---------------------------------------------------------------------------
+# Standalone convenience functions (single-batch, binary inputs)
+# ---------------------------------------------------------------------------
+def _quick_cm(preds: torch.Tensor, targets: torch.Tensor) -> ConfusionMatrix:
+    """Create and populate a ConfusionMatrix from a single batch.
+    Args:
+        preds: Binary predictions ``[B, 1, H, W]``.
+        targets: Ground-truth masks ``[B, 1, H, W]``.
+    Returns:
+        Populated ``ConfusionMatrix`` instance.
+    """
+    cm = ConfusionMatrix()
+    cm.update(preds, targets)
+    return cm
+def compute_f1(preds: torch.Tensor, targets: torch.Tensor) -> float:
+    """Compute F1 score for a single batch.
+    Args:
+        preds: Binary predictions ``[B, 1, H, W]``.
+        targets: Ground-truth masks ``[B, 1, H, W]``.
+    Returns:
+        F1 score as a float in [0, 1].
+    """
+    return _quick_cm(preds, targets).compute()["f1"]
+def compute_iou(preds: torch.Tensor, targets: torch.Tensor) -> float:
+    """Compute IoU (Jaccard index) for a single batch.
+    Args:
+        preds: Binary predictions ``[B, 1, H, W]``.
+        targets: Ground-truth masks ``[B, 1, H, W]``.
+    Returns:
+        IoU score as a float in [0, 1].
+    """
+    return _quick_cm(preds, targets).compute()["iou"]
+def compute_precision(preds: torch.Tensor, targets: torch.Tensor) -> float:
+    """Compute precision for a single batch.
+    Args:
+        preds: Binary predictions ``[B, 1, H, W]``.
+        targets: Ground-truth masks ``[B, 1, H, W]``.
+    Returns:
+        Precision score as a float in [0, 1].
+    """
+    return _quick_cm(preds, targets).compute()["precision"]
+def compute_recall(preds: torch.Tensor, targets: torch.Tensor) -> float:
+    """Compute recall for a single batch.
+    Args:
+        preds: Binary predictions ``[B, 1, H, W]``.
+        targets: Ground-truth masks ``[B, 1, H, W]``.
+    Returns:
+        Recall score as a float in [0, 1].
+    """
+    return _quick_cm(preds, targets).compute()["recall"]
+def compute_oa(preds: torch.Tensor, targets: torch.Tensor) -> float:
+    """Compute overall accuracy for a single batch.
+    Args:
+        preds: Binary predictions ``[B, 1, H, W]``.
+        targets: Ground-truth masks ``[B, 1, H, W]``.
+    Returns:
+        Overall accuracy as a float in [0, 1].
+    """
+    return _quick_cm(preds, targets).compute()["oa"]
+# ---------------------------------------------------------------------------
+# High-level tracker (accepts raw logits)
+# ---------------------------------------------------------------------------
+class MetricTracker:
+    """End-to-end metric tracker for training / validation loops.
+    Wraps a ``ConfusionMatrix`` and transparently applies sigmoid +
+    thresholding to raw model logits before accumulating counts.
+    Args:
+        threshold: Decision threshold applied after sigmoid (default 0.5).
+    Example::
+        tracker = MetricTracker(threshold=0.5)
+        for batch in val_loader:
+            logits = model(batch["A"], batch["B"])
+            tracker.update(logits, batch["mask"])
+        results = tracker.compute()   # {"f1": ..., "iou": ..., ...}
+        tracker.reset()
+    """
+    def __init__(self, threshold: float = 0.5) -> None:
+        self.threshold = threshold
+        self.cm = ConfusionMatrix()
+    def reset(self) -> None:
+        """Reset the internal confusion matrix."""
+        self.cm.reset()
+    @torch.no_grad()
+    def update(self, logits: torch.Tensor, targets: torch.Tensor) -> None:
+        """Apply sigmoid + threshold and accumulate counts.
+        This method is wrapped with ``@torch.no_grad()`` so it can be
+        called safely inside a validation loop without affecting autograd.
+        All operations run on the input tensor's device.
+        Args:
+            logits: Raw model output ``[B, 1, H, W]`` (pre-sigmoid).
+            targets: Binary ground-truth masks ``[B, 1, H, W]`` with
+                values in {0, 1}.
+        """
+        # Strict ">" matches the thresholding used in train.py
+        preds = (torch.sigmoid(logits) > self.threshold).float()
+        self.cm.update(preds, targets)
+    def compute(self) -> Dict[str, float]:
+        """Compute all metrics from accumulated counts.
+        Returns:
+            Dict with keys ``'f1'``, ``'iou'``, ``'precision'``, ``'recall'``,
+            ``'oa'``.
+        """
+        return self.cm.compute()
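
Review note on the metric formulas: a worked example with concrete counts makes `ConfusionMatrix.compute()` easy to sanity-check. The sketch below repeats the same formulas in plain Python. The function name `metrics_from_counts` and the counts themselves are made up for illustration, and `EPS` plays the role of `_EPS`.

```python
from typing import Dict

EPS = 1e-7  # same role as _EPS in utils/metrics.py


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Recompute the ConfusionMatrix.compute() formulas from raw counts."""
    precision = tp / (tp + fp + EPS)
    recall = tp / (tp + fn + EPS)
    f1 = 2.0 * precision * recall / (precision + recall + EPS)
    iou = tp / (tp + fp + fn + EPS)
    oa = (tp + tn) / (tp + fp + fn + tn + EPS)
    return {"f1": f1, "iou": iou, "precision": precision, "recall": recall, "oa": oa}


# Made-up counts for 1000 pixels: 100 truly changed, 80 of them detected,
# plus 20 false alarms.
m = metrics_from_counts(tp=80, fp=20, fn=20, tn=880)
for name, value in m.items():
    print(f"{name}: {value:.4f}")

# With no pixels counted at all, every ratio is 0 / EPS = 0.0 rather than an error.
print(metrics_from_counts(0, 0, 0, 0))
```

In this example overall accuracy is about 0.96 while IoU is only about 0.67, because the 880 unchanged pixels dominate `oa`. That gap is a reason to track F1 and IoU on the change class rather than accuracy. The two are tied together by `F1 = 2 * IoU / (1 + IoU)`, up to the small effect of `EPS`.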

utils/visualization.py ADDED

	@@ -0,0 +1,141 @@

+"""Visualization utilities for change detection results.
+Provides functions to plot predictions, overlay change maps, and track
+training metrics over time.
+"""
+from pathlib import Path
+from typing import Dict, List, Optional
+import matplotlib.pyplot as plt
+import numpy as np
+import torch
+def denormalize(
+    img: np.ndarray,
+    mean: tuple = (0.485, 0.456, 0.406),
+    std: tuple = (0.229, 0.224, 0.225),
+) -> np.ndarray:
+    """Reverse ImageNet normalization for display.
+    Args:
+        img: Normalized image array [H, W, 3].
+        mean: Channel means used for normalization.
+        std: Channel stds used for normalization.
+    Returns:
+        Denormalized image clipped to [0, 1].
+    """
+    img = img * np.array(std) + np.array(mean)
+    return np.clip(img, 0, 1)
+def plot_prediction(
+    img_a: torch.Tensor,
+    img_b: torch.Tensor,
+    mask_gt: torch.Tensor,
+    mask_pred: torch.Tensor,
+    save_path: Optional[Path] = None,
+) -> plt.Figure:
+    """Plot a single change detection prediction.
+    Shows: Before | After | Ground Truth | Prediction in a 1x4 grid.
+    Args:
+        img_a: Before image tensor [3, H, W] (normalized).
+        img_b: After image tensor [3, H, W] (normalized).
+        mask_gt: Ground truth mask [1, H, W] (binary).
+        mask_pred: Predicted mask [1, H, W] (binary or probability).
+        save_path: Optional path to save the figure.
+    Returns:
+        Matplotlib figure.
+    """
+    fig, axes = plt.subplots(1, 4, figsize=(16, 4))
+    # Convert tensors to numpy
+    a = denormalize(img_a.permute(1, 2, 0).cpu().numpy())
+    b = denormalize(img_b.permute(1, 2, 0).cpu().numpy())
+    gt = mask_gt.squeeze(0).cpu().numpy()
+    pred = mask_pred.squeeze(0).cpu().numpy()
+    titles = ["Before (A)", "After (B)", "Ground Truth", "Prediction"]
+    images = [a, b, gt, pred]
+    cmaps = [None, None, "gray", "gray"]
+    for ax, img, title, cmap in zip(axes, images, titles, cmaps):
+        ax.imshow(img, cmap=cmap, vmin=0, vmax=1)
+        ax.set_title(title)
+        ax.axis("off")
+    plt.tight_layout()
+    if save_path is not None:
+        fig.savefig(save_path, dpi=150, bbox_inches="tight")
+    return fig
+def overlay_changes(
+    img_b: torch.Tensor,
+    mask_pred: torch.Tensor,
+    alpha: float = 0.4,
+    color: tuple = (1.0, 0.0, 0.0),
+) -> np.ndarray:
+    """Overlay predicted change mask on the after image.
+    Args:
+        img_b: After image tensor [3, H, W] (normalized).
+        mask_pred: Predicted binary mask [1, H, W].
+        alpha: Overlay transparency.
+        color: RGB color for the overlay (default: red).
+    Returns:
+        Overlaid image as numpy array [H, W, 3].
+    """
+    b = denormalize(img_b.permute(1, 2, 0).cpu().numpy())
+    mask = mask_pred.squeeze(0).cpu().numpy()
+    overlay = b.copy()
+    for c in range(3):
+        overlay[:, :, c] = np.where(
+            mask > 0.5,
+            b[:, :, c] * (1 - alpha) + color[c] * alpha,
+            b[:, :, c],
+        )
+    return overlay
+def plot_metrics_history(
+    history: Dict[str, List[float]],
+    save_path: Optional[Path] = None,
+) -> plt.Figure:
+    """Plot training metric curves over epochs.
+    Args:
+        history: Dict mapping metric names to lists of per-epoch values.
+        save_path: Optional path to save the figure.
+    Returns:
+        Matplotlib figure.
+    """
+    n_metrics = len(history)
+    fig, axes = plt.subplots(1, n_metrics, figsize=(5 * n_metrics, 4))
+    if n_metrics == 1:
+        axes = [axes]
+    for ax, (name, values) in zip(axes, history.items()):
+        ax.plot(values, marker="o", markersize=2)
+        ax.set_title(name)
+        ax.set_xlabel("Epoch")
+        ax.set_ylabel(name)
+        ax.grid(True, alpha=0.3)
+    plt.tight_layout()
+    if save_path is not None:
+        fig.savefig(save_path, dpi=150, bbox_inches="tight")
+    return fig
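
Review note on `overlay_changes`: the per-channel `np.where` expression is ordinary alpha blending applied only where the mask exceeds 0.5. The sketch below shows the same arithmetic for a single pixel in plain Python. `blend_pixel` is a hypothetical helper written for this note, not a function in `utils/visualization.py`.

```python
from typing import Tuple

RGB = Tuple[float, float, float]


def blend_pixel(
    rgb: RGB,
    mask_value: float,
    alpha: float = 0.4,
    color: RGB = (1.0, 0.0, 0.0),
) -> RGB:
    """Blend one pixel toward `color` where the mask marks a change."""
    if mask_value > 0.5:
        r, g, b = (c * (1.0 - alpha) + k * alpha for c, k in zip(rgb, color))
        return (r, g, b)
    return rgb  # unchanged pixels pass through untouched


grey = (0.5, 0.5, 0.5)
print(blend_pixel(grey, mask_value=1.0))  # pulled toward red: roughly (0.7, 0.3, 0.3)
print(blend_pixel(grey, mask_value=0.0))  # (0.5, 0.5, 0.5)
```

Because inputs are in [0, 1] and the blend is a convex combination, the result stays in [0, 1], so no extra clipping is needed after `denormalize`.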