LFM2.5-VL-450M Deforestation Classifier

A fine-tuned variant of LiquidAI/LFM2.5-VL-450M that classifies a 1 km × 1 km Sentinel-2 false-color tile into one of three deforestation classes:

  • STANDING_FOREST — currently forested (≥ 50 % canopy in 2000, no recent loss)
  • RECENTLY_CLEARED — was forested, cleared in the last ~5 years
  • LONG_TERM_NON_FOREST — has not been forest in recent decades (urban, pasture, agriculture, water, bare rock)

Trained on Brazilian Amazon and Cerrado deforestation imagery; evaluated on a held-out Cambodia test split to demonstrate cross-continental generalization. Ships as a 218 MB Q4_K_M GGUF for direct deployment via llama.cpp on commodity edge hardware (validated on an AMD Ryzen 5 5600U mini-PC with Vega 7 iGPU).

Headline result

On a held-out Cambodia test split (996 subtiles, never seen during training), the deployed Q4_K_M GGUF on a $300 mini-PC iGPU beats Claude Sonnet 4.6 (API) by +15.6 pp accuracy and +11.9 pp macro F1, at 6× lower latency:

| Eval | Accuracy | Macro F1 | RC F1 | RC Recall | Latency / sample | Hardware |
| --- | --- | --- | --- | --- | --- | --- |
| LFM2.5-VL-450M zero-shot | 64.4 % | 26.1 % | 0.000 | 0.000 | 0.47 s | Vega 7 iGPU |
| Claude Sonnet 4.6 (API) | 71.5 % | 54.9 % | 0.065 | 0.583 | 1.81 s | API |
| This model (Q4_K_M GGUF) | 87.0 % | 66.8 % | 0.207 | 0.833 | 0.30 s | Vega 7 iGPU |

Macro F1 is the right headline metric for this imbalanced three-class task (65 / 34 / 1 % in the test split). Accuracy alone is misleading because the dominant class, LONG_TERM_NON_FOREST, makes up about 65 % of test tiles, so a model that always predicts it already scores about 65 % accuracy. RC Recall is the deployment-relevant rare-class metric: the model catches 10 of 12 active-clearing test tiles, versus 7 of 12 for Claude. False positives are cheap (a human-review queue item); false negatives are missed deforestation.
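To make the accuracy-versus-macro-F1 point concrete, the sketch below scores a hypothetical classifier that always predicts the majority class, using the Cambodia test-split supports listed in this card. It is an illustration only, not part of the evaluation code, and the function name is made up for the example.

```python
def majority_baseline(support):
    """Accuracy and macro F1 of a classifier that always predicts the largest class."""
    total = sum(support.values())
    majority = max(support, key=support.get)
    accuracy = support[majority] / total

    f1_scores = []
    for label, n in support.items():
        if label == majority:
            precision = n / total   # every tile is predicted as this class
            recall = 1.0            # so all of its true tiles are caught
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)   # never predicted, so F1 is 0
    return accuracy, sum(f1_scores) / len(f1_scores)


# Cambodia test-split supports (LTNF / RC / SF) as reported in this card.
test_support = {
    "LONG_TERM_NON_FOREST": 644,
    "RECENTLY_CLEARED": 12,
    "STANDING_FOREST": 340,
}
accuracy, macro_f1 = majority_baseline(test_support)
print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f}")
```

Such a baseline lands near 65 % accuracy but only about 26 % macro F1, close to the zero-shot row in the table above. That is why the headline comparison leans on macro F1.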

Task

Single-snapshot three-class classification from Sentinel-2 imagery. Input is a 100 × 100 px false-color composite of a 1 km × 1 km region:

  • R channel: NIR (band 8)
  • G channel: Red (band 4)
  • B channel: Green (band 3)

This NIR-leading false-color emphasizes vegetation (NDVI signal lives in NIR vs Red) better than natural-color RGB.
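As a rough illustration of the channel mapping, the sketch below builds a false-color tile from three band arrays and computes NDVI with the standard formula (NIR - Red) / (NIR + Red). The percentile stretch, the function names, and the synthetic input bands are all assumptions made for the example. The card does not say how reflectance was scaled to 8-bit pixels.

```python
import numpy as np


def stretch_to_uint8(band, low_pct=2.0, high_pct=98.0):
    """Percentile-stretch one band to 0..255 (the percentiles are an assumption)."""
    lo, hi = np.percentile(band, [low_pct, high_pct])
    scaled = (band.astype(np.float64) - lo) / max(hi - lo, 1e-9)
    return (np.clip(scaled, 0.0, 1.0) * 255).astype(np.uint8)


def false_color_composite(nir, red, green):
    """Stack NIR -> R, Red -> G, Green -> B into an (H, W, 3) uint8 image."""
    return np.stack([stretch_to_uint8(b) for b in (nir, red, green)], axis=-1)


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), guarded against division by zero."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / np.maximum(nir + red, 1e-9)


# Synthetic 100 x 100 reflectance bands standing in for Sentinel-2 B8, B4, B3.
rng = np.random.default_rng(0)
nir = rng.uniform(0.2, 0.5, size=(100, 100))
red = rng.uniform(0.02, 0.1, size=(100, 100))
green = rng.uniform(0.03, 0.12, size=(100, 100))

tile = false_color_composite(nir, red, green)
print(tile.shape, tile.dtype, float(ndvi(nir, red).mean()))
```

Dense vegetation reflects strongly in NIR and weakly in Red, so it shows up as a bright red channel in the composite and as a high NDVI value.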

Output is JSON-schema-bound:

{"class_label": "STANDING_FOREST" | "RECENTLY_CLEARED" | "LONG_TERM_NON_FOREST"}

Both llama.cpp's response_format=json_schema and HuggingFace constrained decoding can enforce this directly. Empirically, after fine-tuning the model emits clean JSON without constrained decoding (0 parse failures across 996 test generations).
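Even when the model emits clean JSON, downstream code may want to validate each completion before trusting it. The helper below is a minimal sketch of such a check against the schema above. The function name and error messages are illustrative and not part of this repository.

```python
import json

VALID_LABELS = {"STANDING_FOREST", "RECENTLY_CLEARED", "LONG_TERM_NON_FOREST"}


def parse_class_label(completion):
    """Return the class label from a model completion, or raise ValueError."""
    try:
        payload = json.loads(completion.strip())
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {completion!r}") from exc
    # The schema allows exactly one field and no extras.
    if not isinstance(payload, dict) or set(payload) != {"class_label"}:
        raise ValueError(f"expected exactly one key 'class_label': {payload!r}")
    label = payload["class_label"]
    if not isinstance(label, str) or label not in VALID_LABELS:
        raise ValueError(f"unknown class label: {label!r}")
    return label


parsed = parse_class_label('{"class_label": "RECENTLY_CLEARED"}')
print(parsed)
```

Raising on anything outside the three labels keeps a malformed generation from being silently counted as a prediction.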

Dataset

5,729 subtiles labeled from Hansen Global Forest Change v1.12 treecover2000 + lossyear layers, fetched via the SimSat hackathon API.

| Split | Subtiles | Region | Recent-window |
| --- | --- | --- | --- |
| train | 3,973 | Brazilian Amazon (Rondônia + Mato Grosso) + Brazilian Cerrado | 5 yr |
| val | 760 | Bolivian Lowlands | 3 yr |
| test | 996 | Cambodia | 3 yr |

Class distribution

| Split | LTNF | RC | SF |
| --- | --- | --- | --- |
| train | 1,842 (46 %) | 485 (12 %) | 1,646 (41 %) |
| val | 416 (55 %) | 77 (10 %) | 267 (35 %) |
| test | 644 (65 %) | 12 (1.2 %) | 340 (34 %) |

The training split uses a 5-year recent-loss window (lossyear ∈ {2020, ..., 2024}) to densify the rare RC class; val and test use a stricter 3-year window so that test results stay aligned with the production-relevant "very recent" definition. The wider training window compensates for the small number of RC tiles that Hansen-derived labels yield.
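The two windows can be written down explicitly. The sketch below assumes a 2024 capture year (as the Limitations section states) and the Hansen convention that a lossyear pixel value of 0 means no loss while a value k means loss in year 2000 + k. Reading the 3-year window as the last three calendar years is an assumption, and the function names are hypothetical.

```python
def hansen_code_to_year(lossyear_code):
    """Map a Hansen lossyear pixel value to a calendar year (0 means no loss)."""
    return None if lossyear_code == 0 else 2000 + lossyear_code


def recent_loss_years(capture_year, window_years):
    """Calendar years counted as 'recent' for a given capture year and window."""
    return set(range(capture_year - window_years + 1, capture_year + 1))


def is_recent_loss(lossyear_code, capture_year, window_years):
    """True when the pixel's loss year falls inside the recent window."""
    year = hansen_code_to_year(lossyear_code)
    return year is not None and year in recent_loss_years(capture_year, window_years)


train_window = recent_loss_years(2024, 5)  # 2020 through 2024, as stated above
eval_window = recent_loss_years(2024, 3)   # assumed to be 2022 through 2024
print(sorted(train_window), sorted(eval_window))

# A pixel cleared in 2021 counts as recent for training but not for val/test.
print(is_recent_loss(21, 2024, 5), is_recent_loss(21, 2024, 3))
```

Under this reading, every tile that counts as recently cleared for evaluation would also count as such under the training definition, but not the other way around.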

Geographic split

The train/val/test split is made at the region level: no parent location appears in more than one split. Splitting by region prevents the model from scoring well through a coordinate-memorization shortcut, which is what makes the cross-continental test meaningful. The model, trained on the Brazilian Amazon and Cerrado biomes, generalizes to deforestation in Cambodian dry-deciduous forest that it never saw during training.
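A region-level split with a leakage check might look like the following sketch. The region-to-split mapping follows the dataset table; the record fields (id, parent_id, region) and the function name are invented for the example and may not match the actual dataset tooling.

```python
REGION_TO_SPLIT = {
    "Brazilian Amazon": "train",
    "Brazilian Cerrado": "train",
    "Bolivian Lowlands": "val",
    "Cambodia": "test",
}


def split_by_region(subtiles):
    """Group subtiles into splits by region and verify no parent location leaks."""
    splits = {"train": [], "val": [], "test": []}
    parent_to_split = {}
    for tile in subtiles:
        split = REGION_TO_SPLIT[tile["region"]]
        # Remember the first split seen for each parent and reject any conflict.
        previous = parent_to_split.setdefault(tile["parent_id"], split)
        if previous != split:
            raise ValueError(
                f"parent {tile['parent_id']} appears in {previous} and {split}"
            )
        splits[split].append(tile)
    return splits


# Hypothetical records; the field names are invented for this sketch.
subtiles = [
    {"id": "a-0", "parent_id": "a", "region": "Brazilian Amazon"},
    {"id": "a-1", "parent_id": "a", "region": "Brazilian Amazon"},
    {"id": "b-0", "parent_id": "b", "region": "Bolivian Lowlands"},
    {"id": "c-0", "parent_id": "c", "region": "Cambodia"},
]
splits = split_by_region(subtiles)
print({name: len(tiles) for name, tiles in splits.items()})
```

Because the split is decided by region rather than by random sampling, subtiles cut from the same parent location always travel together.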

Training

Standard supervised fine-tuning with completion-only loss (cross-entropy applied only to the assistant's JSON output, ignoring system and user tokens). 5 epochs, batch size 16 × gradient accumulation 2 = 32 effective, AdamW with learning rate 2e-5, cosine schedule with 5 % warmup, bf16 mixed precision. Trained on a single RTX 6000 Ada in 8.6 minutes of total wall-clock time.
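The schedule arithmetic implied by these hyperparameters can be sketched as follows. The step counts assume the last partial batch of each epoch is kept, and the cosine decay is assumed to go all the way to zero. Neither detail is stated in this card, so treat the numbers as approximate.

```python
import math


def lr_at_step(step, total_steps, peak_lr=2e-5, warmup_frac=0.05):
    """Linear warmup to peak_lr, then cosine decay to zero (the floor is an assumption)."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


train_examples = 3973        # train subtiles from the dataset table
effective_batch = 16 * 2     # per-step batch times gradient accumulation
# Assumes the final partial batch of each epoch still counts as a step.
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * 5  # 5 epochs

print(steps_per_epoch, total_steps)
print(lr_at_step(0, total_steps), lr_at_step(total_steps // 2, total_steps))
```

Under these assumptions the run is about 625 optimizer steps with roughly 31 warmup steps, which is consistent with a wall-clock time measured in minutes.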

Per-class inverse-frequency loss weighting:

RECENTLY_CLEARED:     2.28 × (rare class)
STANDING_FOREST:      0.43 ×
LONG_TERM_NON_FOREST: 0.29 ×
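One plausible way to combine completion-only masking with these class weights is sketched below in plain Python. The card does not say whether the weights scale each sequence's loss or individual tokens, so the per-sequence scaling here is an assumption, and the function name is hypothetical. The value -100 is the label PyTorch's cross-entropy ignores by default.

```python
import math

IGNORE_INDEX = -100  # label value PyTorch's cross-entropy skips by default

CLASS_WEIGHTS = {
    "RECENTLY_CLEARED": 2.28,
    "STANDING_FOREST": 0.43,
    "LONG_TERM_NON_FOREST": 0.29,
}


def completion_only_loss(token_log_probs, labels, class_label):
    """Mean negative log-likelihood over assistant tokens, scaled by the class weight.

    token_log_probs[i] is the log-probability the model gave to the target token
    at position i; labels[i] is IGNORE_INDEX for system and user positions.
    """
    kept = [-lp for lp, lab in zip(token_log_probs, labels) if lab != IGNORE_INDEX]
    if not kept:
        return 0.0
    return CLASS_WEIGHTS[class_label] * sum(kept) / len(kept)


# Toy sequence: 3 prompt positions (masked) followed by 2 completion tokens.
log_probs = [math.log(0.1), math.log(0.2), math.log(0.3), math.log(0.5), math.log(0.25)]
labels = [IGNORE_INDEX, IGNORE_INDEX, IGNORE_INDEX, 17, 42]
loss = completion_only_loss(log_probs, labels, "RECENTLY_CLEARED")
print(loss)
```

Only the two completion positions contribute to the loss, and a RECENTLY_CLEARED example counts about eight times as much as a LONG_TERM_NON_FOREST one (2.28 versus 0.29).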

Trained without coordinates (include_coords=False). An ablation arm (v1_with_coords from an earlier run on the same dataset) showed lat/lon in the user message either had no effect or slightly hurt performance, mirroring Claude Sonnet 4.6's behavior on the same test set. The image alone carries the discrimination signal.

Files in this repository

  • model.safetensors — full HF-format weights (856 MB, bf16). Use with transformers.AutoModelForImageTextToText.
  • config.json, generation_config.json — model and generation config.
  • chat_template.jinja, processor_config.json, tokenizer.json, tokenizer_config.json — preprocessor (inherited unchanged from the base model).
  • trainer_state.json — per-epoch eval curve (val_loss, val_macro_f1, val_accuracy across 5 epochs).
  • training_args.bin — pickled TrainingArguments for exact reproducibility.
  • gguf/v2_no_coords-Q4_K_M.gguf — deployment artifact, 218.7 MB, Q4_K_M-quantized text LM.
  • gguf/mmproj-v2_no_coords-F16.gguf — vision projector, 180.4 MB, f16 (un-quantized for visual fidelity).

Total deployable footprint: 399 MB.

Usage

System prompt (use this exactly — the model was trained against this string)

You are an expert satellite imaging analyst.

You receive a Sentinel-2 false-color image (NIR → R channel, Red → G, Green → B) of a 1 km × 1 km region of the Earth's surface, plus the geographic coordinates of the region's center.

Classify the region as exactly one of:

- STANDING_FOREST       — currently forested (≥ 50% canopy in 2000, no recent loss)
- RECENTLY_CLEARED      — was forested, cleared in the last ~5 years
- LONG_TERM_NON_FOREST  — has not been forest in recent decades (urban, pasture, agriculture, water, bare rock)

Output strict JSON only: {"class_label": "..."}.
No code fences, no commentary, no extra fields.

User-message content

The user message is multimodal: one image (the 1 km × 1 km false-color tile) plus a short text payload. The deployed model was trained with no coordinates in the text — pass an empty JSON object:

{}

(An ablation arm with {"lat": ..., "lon": ...} showed coords either had no effect or slightly hurt performance, mirroring frontier-model behavior. The image carries the discrimination signal.)

llama.cpp (recommended for edge deployment)

llama-server \
  --model gguf/v2_no_coords-Q4_K_M.gguf \
  --mmproj gguf/mmproj-v2_no_coords-F16.gguf \
  --port 8001 \
  --jinja \
  -ngl 99

Then POST the system prompt and the false-color tile to the OpenAI-compatible endpoint, requesting JSON-schema enforcement:

import base64, json, requests

SYSTEM_PROMPT = """You are an expert satellite imaging analyst.

You receive a Sentinel-2 false-color image (NIR → R channel, Red → G, Green → B) of a 1 km × 1 km region of the Earth's surface, plus the geographic coordinates of the region's center.

Classify the region as exactly one of:

- STANDING_FOREST       — currently forested (≥ 50% canopy in 2000, no recent loss)
- RECENTLY_CLEARED      — was forested, cleared in the last ~5 years
- LONG_TERM_NON_FOREST  — has not been forest in recent decades (urban, pasture, agriculture, water, bare rock)

Output strict JSON only: {"class_label": "..."}.
No code fences, no commentary, no extra fields."""

img_b64 = base64.b64encode(open("tile.png", "rb").read()).decode()
body = {
    "model": "v2_no_coords-Q4_K_M",
    "messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{img_b64}"}},
            {"type": "text", "text": "{}"},
        ]},
    ],
    "response_format": {
        "type": "json_schema",
        "schema": {
            "type": "object",
            "required": ["class_label"],
            "properties": {
                "class_label": {
                    "type": "string",
                    "enum": ["STANDING_FOREST", "RECENTLY_CLEARED", "LONG_TERM_NON_FOREST"],
                }
            },
            "additionalProperties": False,
        },
    },
    "temperature": 0.1,
}
r = requests.post("http://localhost:8001/v1/chat/completions", json=body, timeout=60)
r.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
result = json.loads(r.json()["choices"][0]["message"]["content"])
print(result["class_label"])

HuggingFace transformers (for further fine-tuning or HF-native inference)

import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

SYSTEM_PROMPT = """You are an expert satellite imaging analyst.

You receive a Sentinel-2 false-color image (NIR → R channel, Red → G, Green → B) of a 1 km × 1 km region of the Earth's surface, plus the geographic coordinates of the region's center.

Classify the region as exactly one of:

- STANDING_FOREST       — currently forested (≥ 50% canopy in 2000, no recent loss)
- RECENTLY_CLEARED      — was forested, cleared in the last ~5 years
- LONG_TERM_NON_FOREST  — has not been forest in recent decades (urban, pasture, agriculture, water, bare rock)

Output strict JSON only: {"class_label": "..."}.
No code fences, no commentary, no extra fields."""

model = AutoModelForImageTextToText.from_pretrained(
    "urbanspr1nter/lfm2.5vl-450m-deforestation-classifier",
    dtype="bfloat16",
    device_map="cuda",
)
processor = AutoProcessor.from_pretrained(
    "urbanspr1nter/lfm2.5vl-450m-deforestation-classifier",
)
model.eval()

image = Image.open("tile.png").convert("RGB")
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "{}"},
    ]},
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

with torch.no_grad():
    out_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
completion = processor.tokenizer.decode(
    out_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion)  # e.g. '{"class_label": "RECENTLY_CLEARED"}'

Per-class evaluation

Cambodia held-out test, 996 subtiles. Q4_K_M GGUF on Vega 7 iGPU (the deployed configuration):

| Class | Precision | Recall | F1 | Support |
| --- | --- | --- | --- | --- |
| STANDING_FOREST | 0.865 | 0.885 | 0.875 | 340 |
| RECENTLY_CLEARED | 0.118 | 0.833 | 0.207 | 12 |
| LONG_TERM_NON_FOREST | 0.988 | 0.863 | 0.921 | 644 |
| macro avg | 0.657 | 0.860 | 0.668 | 996 |
| weighted avg | 0.929 | 0.870 | 0.890 | 996 |

The model catches 10 of 12 RC test tiles, at low precision. RC precision (0.118) is far below precision on the two common classes because the model is calibrated to err toward catching active clearing: false positives feed a human-review queue, while false negatives are missed deforestation, the operational worst case. RC also remains the hardest class to score reliably. With a support of only 12 tiles, every RC metric carries substantial sampling noise, which no analysis of this test set can remove.
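To put a rough number on that noise, the sketch below computes a Wilson score interval for the RC recall of 10 out of 12, and the approximate number of tiles flagged as RC implied by the reported precision. The Wilson interval is a standard choice brought in for this example; it is not something the evaluation in this card reports.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95 %)."""
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


rc_true_positives, rc_support = 10, 12
rc_precision = 0.118  # from the per-class table

low, high = wilson_interval(rc_true_positives, rc_support)
# Precision = TP / predicted, so predicted is roughly TP / precision.
predicted_rc = rc_true_positives / rc_precision
print(f"RC recall 95% interval: {low:.2f} to {high:.2f}")
print(f"tiles flagged RC: about {predicted_rc:.0f}")
```

With 12 positives the interval spans roughly 0.55 to 0.95, and the reported precision implies on the order of 85 flagged tiles for 10 true clearings. Both figures show how much the RC numbers could move on a different sample.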

Limitations

  • Recent-loss window definition. In the training labels, "RECENTLY_CLEARED" means cleared in the last ~5 years: Hansen lossyear ∈ {2020, 2021, 2022, 2023, 2024} for capture year 2024. The held-out val/test labels use a 3-year window, which is a strict subset of the training window. A model intended to detect only "this season's clearings" should be trained with a tighter window (and the dataset rebuilt accordingly).
  • Subtile size. The model is trained on 1 km × 1 km tiles. Inference on much larger or much smaller crops should slice into 1 km tiles first; the model has no zoom-invariant training signal.
  • Cloud cover. Training data was filtered to ≤ 10 % cloud cover. Inference on cloudy tiles is undefined behavior — recommend gating at the SimSat metadata level (return unclassifiable when cloud_cover exceeds a threshold) rather than asking the model to classify through cloud.
  • Region transfer. Train regions are mechanized-clearing-dominated (Brazilian ranching, soy expansion). Test on Cambodia validates transfer to selective-logging + Economic Land Concessions. Transfer to very different visual styles (palm oil monoculture, charcoal-driven miombo) wasn't tested and may degrade — we attempted Mozambique (charcoal-driven miombo) during dataset construction and found the visual signal didn't survive our 1 km tile gate.
  • Imbalanced test set. Cambodia test has only 12 RC supports. RC F1 is intrinsically noisy at this support level.
  • License. LFM Open License v1.0 (inherited from base model). Includes a non-commercial threshold; use commercial-scale inference only after reviewing Section 5.
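The subtile-size and cloud-cover points above suggest a small pre-processing layer in front of the model. The sketch below slices a larger scene into 100 × 100 px boxes (1 km at the 100 px per km scale used in this card) and skips classification when cloud cover is too high. Reusing the 10 % training filter as the inference threshold, dropping partial edge tiles, and the function names are all assumptions for illustration.

```python
def tile_boxes(width, height, tile_px=100):
    """Yield (left, upper, right, lower) boxes for full tiles; partial edge tiles are dropped."""
    for upper in range(0, height - tile_px + 1, tile_px):
        for left in range(0, width - tile_px + 1, tile_px):
            yield (left, upper, left + tile_px, upper + tile_px)


def gate_by_cloud(cloud_cover_pct, classify, max_cloud_pct=10.0):
    """Call classify() only when cloud cover is within the threshold."""
    if cloud_cover_pct > max_cloud_pct:
        return "unclassifiable"
    return classify()


# A 250 x 300 px scene yields 2 x 3 full tiles; the 50 px strip on the right is dropped.
boxes = list(tile_boxes(width=250, height=300))
print(len(boxes), boxes[0], boxes[-1])

# The lambda stands in for a real call to the model.
cloudy = gate_by_cloud(35.0, classify=lambda: "STANDING_FOREST")
clear = gate_by_cloud(4.0, classify=lambda: "STANDING_FOREST")
print(cloudy, clear)
```

The box tuples follow the (left, upper, right, lower) order that PIL's Image.crop expects, so each box can be passed to it directly.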

Acknowledgements

License

LFM Open License v1.0 — see LICENSE. This is a derivative work of LiquidAI's LFM2.5-VL-450M and inherits its license terms, including the non-commercial / research-purposes scope and the commercial-use threshold defined in Section 5. Attribution to LiquidAI is preserved.
