Desert Semantic Segmentation using SegFormer (MiT-B2)
A SegFormer transformer model fine-tuned on the Offroad Segmentation Training Dataset for 10-class semantic segmentation of desert terrain, built for UGV (Unmanned Ground Vehicle) autonomous navigation in off-road environments.
Model Architecture
| Component | Detail |
|---|---|
| Framework | HuggingFace Transformers |
| Model | SegFormer |
| Backbone | MiT-B2 (nvidia/mit-b2) |
| Parameters | 27,354,314 (all trainable) |
| Decoder | Lightweight MLP Head |
| Classes | 10 |
| Input Size | 512 × 512 |
| GPU | NVIDIA A100-PCIE-40GB |
Dataset Classes (10 Categories)
| Class ID | Raw Mask Value | Label |
|---|---|---|
| 0 | 100 | Trees |
| 1 | 200 | Lush Bushes |
| 2 | 300 | Dry Grass |
| 3 | 500 | Dry Bushes |
| 4 | 550 | Ground Clutter |
| 5 | 600 | Flowers |
| 6 | 700 | Logs |
| 7 | 800 | Rocks |
| 8 | 7100 | Landscape |
| 9 | 10000 | Sky |
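Because the masks store raw values rather than contiguous class IDs, they need remapping before training or evaluation. The following is a minimal NumPy sketch of that step using the table above. The function name `remap_mask` and the choice of 255 for unmapped pixels are assumptions made for illustration (255 is the default ignore index in the Transformers SegFormer config), not details confirmed by this model card.

```python
import numpy as np

# Raw mask value -> class ID, copied from the table above.
RAW_TO_CLASS = {
    100: 0,    # Trees
    200: 1,    # Lush Bushes
    300: 2,    # Dry Grass
    500: 3,    # Dry Bushes
    550: 4,    # Ground Clutter
    600: 5,    # Flowers
    700: 6,    # Logs
    800: 7,    # Rocks
    7100: 8,   # Landscape
    10000: 9,  # Sky
}
IGNORE_INDEX = 255  # assumption: raw values outside the table are ignored


def remap_mask(raw_mask: np.ndarray) -> np.ndarray:
    """Convert a uint16 raw-value mask to a uint8 class-ID mask."""
    class_mask = np.full(raw_mask.shape, IGNORE_INDEX, dtype=np.uint8)
    for raw_value, class_id in RAW_TO_CLASS.items():
        class_mask[raw_mask == raw_value] = class_id
    return class_mask


# Tiny example: three known raw values and one unknown value (42).
raw = np.array([[100, 550], [10000, 42]], dtype=np.uint16)
print(remap_mask(raw))
```

The unknown value 42 ends up as `IGNORE_INDEX`, so a stray annotation value would be skipped rather than silently counted as a class.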
Dataset Statistics
| Split | Samples | Proportion |
|---|---|---|
| Train | 2,142 | 75% |
| Validation | 286 | 10% |
| Test | 429 | 15% |
| Total | 2,857 | 100% |
- Image resolution: 960 × 540 (RGB)
- Mask format: uint16 with raw class value encoding
- Total annotated instances: 16,951
Augmentation Pipeline
11 augmentations specifically chosen for desert and off-road conditions:
| Augmentation | Purpose |
|---|---|
| Color Jitter | Handles varying sun angles and color temperatures |
| Gamma Change | Simulates over/under-exposed outdoor scenes |
| Gaussian Noise | Robustness to sensor noise in UGV cameras |
| Motion / Gaussian / Median Blur | Motion blur from vehicle movement |
| Random Shadows | Shadows from rocks, vegetation, terrain |
| Random Fog | Dust storms and atmospheric haze |
| Brightness/Contrast | Atmospheric and lighting variations |
| Texture Mixup | Prevents overfitting to specific terrain patterns |
| Horizontal Flip | Improves directional generalization |
| Shift / Scale / Rotate | Spatial robustness |
| Coarse Dropout | Simulates sensor occlusion |
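To make the table concrete, here is a rough NumPy sketch of three of the listed augmentations: gamma change, Gaussian noise, and horizontal flip. It is not the pipeline used to train this model. The function names, parameter ranges, and the per-transform probability `p` are assumptions chosen for illustration. The point it shows is that photometric transforms touch only the image, while geometric transforms must be applied to the image and mask together.

```python
import numpy as np


def random_gamma(image, rng, low=0.7, high=1.5):
    """Gamma change: brighten or darken via a power curve (range is assumed)."""
    gamma = rng.uniform(low, high)
    scaled = image.astype(np.float32) / 255.0
    return (np.power(scaled, gamma) * 255.0).round().astype(np.uint8)


def gaussian_noise(image, rng, sigma=10.0):
    """Additive Gaussian noise, clipped back to the valid 8-bit range."""
    noisy = image.astype(np.float32) + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).round().astype(np.uint8)


def horizontal_flip(image, mask):
    """Flip image and mask together so labels stay aligned with pixels."""
    return image[:, ::-1].copy(), mask[:, ::-1].copy()


def augment(image, mask, rng, p=0.5):
    """Apply each transform with probability p; only the flip touches the mask."""
    if rng.random() < p:
        image = random_gamma(image, rng)
    if rng.random() < p:
        image = gaussian_noise(image, rng)
    if rng.random() < p:
        image, mask = horizontal_flip(image, mask)
    return image, mask


# Synthetic 960 x 540 RGB image and class-ID mask standing in for real data.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(540, 960, 3), dtype=np.uint8)
mask = rng.integers(0, 10, size=(540, 960), dtype=np.uint8)
aug_image, aug_mask = augment(image, mask, rng)
print(aug_image.shape, aug_mask.shape)
```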
Training Configuration
| Parameter | Value |
|---|---|
| Epochs | 50 |
| Batch Size | 8 |
| Learning Rate | 6e-5 |
| Optimizer | AdamW |
| Warmup Steps | 500 |
| Weight Decay | 0.01 |
| FP16 | Enabled |
| Best Model Metric | mean_iou |
| Eval Strategy | Per epoch |
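The numbers above imply a step budget that can be worked out directly. The sketch below does that arithmetic and models a learning-rate schedule with linear warmup followed by linear decay. The schedule shape is an assumption (it matches the default of the Transformers `Trainer`, but this card does not state which scheduler was used), as are the single-device, no-gradient-accumulation setup and the `learning_rate` helper itself.

```python
import math

TRAIN_SAMPLES = 2142  # from the dataset statistics table
BATCH_SIZE = 8
EPOCHS = 50
BASE_LR = 6e-5
WARMUP_STEPS = 500

# Assumes one device, no gradient accumulation, and a kept final partial batch.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def learning_rate(step: int) -> float:
    """Linear warmup to BASE_LR, then linear decay to zero (assumed schedule)."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = total_steps - step
    return BASE_LR * max(0.0, remaining / (total_steps - WARMUP_STEPS))


print("steps per epoch:", steps_per_epoch)
print("total steps:", total_steps)
for step in (0, 250, 500, total_steps // 2, total_steps):
    print(step, f"{learning_rate(step):.2e}")
```

Under these assumptions the 500 warmup steps cover a little under two epochs, so the peak learning rate of 6e-5 is reached early in the 50-epoch run.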
Evaluation Results
Evaluated on the validation split (286 images) using COCO-style mean IoU.
| Metric | Value |
|---|---|
| Mean IoU | 0.6529 |
| Mean Accuracy | 0.7592 |
Per-Class IoU
| Class | IoU |
|---|---|
| Trees | 0.8517 |
| Lush Bushes | 0.6990 |
| Dry Grass | 0.7007 |
| Dry Bushes | 0.4873 |
| Ground Clutter | 0.3647 |
| Flowers | 0.7246 |
| Logs | 0.5591 |
| Rocks | 0.4544 |
| Landscape | 0.7014 |
| Sky | 0.9860 |
Best class: Sky (0.9860), which covers large uniform regions
Hardest class: Ground Clutter (0.3647), which consists of small, heterogeneous objects
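For readers who want to reproduce these metrics, the sketch below computes per-class IoU, mean IoU, and mean per-class accuracy from a confusion matrix in NumPy. It is an illustration of the standard definitions, not the evaluation script used for this model, and the helper names are assumptions. The last lines check that the reported Mean IoU equals the average of the ten per-class values in the table.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes):
    """Count pixels by (true class, predicted class), skipping invalid labels."""
    valid = (target >= 0) & (target < num_classes)
    index = target[valid].astype(np.int64) * num_classes + pred[valid]
    counts = np.bincount(index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def iou_metrics(conf):
    """Per-class IoU, mean IoU, and mean per-class accuracy."""
    tp = np.diag(conf).astype(np.float64)
    true_pixels = conf.sum(axis=1)                 # pixels labeled as each class
    union = true_pixels + conf.sum(axis=0) - tp    # labeled + predicted - overlap
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / union
        acc = tp / true_pixels
    # nanmean skips classes that never appear in labels or predictions.
    return iou, np.nanmean(iou), np.nanmean(acc)


# Toy 2 x 3 example with three classes.
target = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
iou, mean_iou, mean_acc = iou_metrics(confusion_matrix(pred, target, 3))
print(iou, mean_iou, mean_acc)

# Consistency check against the per-class table above.
reported = [0.8517, 0.6990, 0.7007, 0.4873, 0.3647,
            0.7246, 0.5591, 0.4544, 0.7014, 0.9860]
print(round(sum(reported) / len(reported), 4))
```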
Inference
```python
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import torch
import torch.nn.functional as F

# Load model
processor = SegformerImageProcessor.from_pretrained("PUSHPENDAR/desert-segformer")
model = SegformerForSemanticSegmentation.from_pretrained("PUSHPENDAR/desert-segformer")
model.eval()

# Load image
image = Image.open("desert_scene.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Predict
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits  # (1, num_classes, H/4, W/4)

# Upsample to original size
upsampled = F.interpolate(
    logits,
    size=(image.height, image.width),
    mode="bilinear",
    align_corners=False,
)
pred_mask = upsampled.argmax(dim=1)[0].numpy()  # (H, W)
print("Predicted class map shape:", pred_mask.shape)
```
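The predicted class map holds integer class IDs, so a common next step is to turn it into something readable. The sketch below summarizes per-class pixel coverage and builds a color image from the mask. It runs on a synthetic stand-in for `pred_mask` so it works without the model. The palette colors and the helper names `class_coverage` and `colorize` are hypothetical and do not come from this repository.

```python
import numpy as np

# Class ID -> label, copied from the dataset classes table.
ID2LABEL = {
    0: "Trees", 1: "Lush Bushes", 2: "Dry Grass", 3: "Dry Bushes",
    4: "Ground Clutter", 5: "Flowers", 6: "Logs", 7: "Rocks",
    8: "Landscape", 9: "Sky",
}

# Hypothetical palette: one arbitrary RGB color per class.
PALETTE = np.array([
    [34, 139, 34], [50, 205, 50], [189, 183, 107], [139, 115, 85],
    [128, 128, 128], [255, 105, 180], [101, 67, 33], [112, 128, 144],
    [210, 180, 140], [135, 206, 235],
], dtype=np.uint8)


def class_coverage(pred_mask):
    """Fraction of pixels assigned to each class, keyed by label."""
    counts = np.bincount(pred_mask.ravel(), minlength=len(ID2LABEL))
    return {ID2LABEL[i]: counts[i] / pred_mask.size for i in ID2LABEL}


def colorize(pred_mask):
    """Look up each pixel's class ID in the palette, giving an (H, W, 3) image."""
    return PALETTE[pred_mask]


# Synthetic stand-in for the pred_mask produced by the inference code.
pred_mask = np.full((540, 960), 8, dtype=np.int64)  # Landscape everywhere
pred_mask[:200] = 9                                 # Sky in the top rows
pred_mask[200:300] = 0                              # a band of Trees

for label, fraction in class_coverage(pred_mask).items():
    if fraction > 0:
        print(f"{label}: {fraction:.1%}")
print(colorize(pred_mask).shape)
```

The color image can be saved with `PIL.Image.fromarray(colorize(pred_mask))` for a quick visual check.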
Repository Files
| File / Folder | Description |
|---|---|
| `pytorch_model.bin` | Fine-tuned SegFormer weights |
| `config.json` | Model configuration |
| `preprocessor_config.json` | Image processor settings |
| `outputs/validation_metrics.json` | Saved evaluation metrics |
| `outputs/training_curves.png` | Loss and mIoU training curves |
| `outputs/test_predictions/` | Per-image prediction masks |
Run Locally
```bash
git clone https://huggingface.co/PUSHPENDAR/desert-segformer
cd desert-segformer
pip install transformers torch pillow
python app.py
```
Citation
If you use this model or dataset, please cite:
```bibtex
@misc{desert-segformer-2025,
  title     = {Desert Semantic Segmentation with SegFormer (MiT-B2)},
  author    = {Pushpendar Choudhary},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/PUSHPENDAR/desert-segformer}
}
```
License
Apache 2.0. See LICENSE for details.