---
library_name: transformers
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: image-text-to-text
license: mit
---

# Geo-R1: Unlocking VLM Geospatial Reasoning with Cross-View Reinforcement Learning

This repository contains the Geo-R1 model. Geo-R1 is a reasoning-centric post-training framework that unlocks geospatial reasoning in vision-language models (VLMs), as introduced in the paper:

[**Geo-R1: Unlocking VLM Geospatial Reasoning with Cross-View Reinforcement Learning**](https://huggingface.co/papers/2510.00072)

Geo-R1 trains in two stages. The first, "thinking scaffolding", is supervised fine-tuning on synthetic chain-of-thought exemplars. The second, "elevating", applies GRPO-based reinforcement learning to a weakly supervised cross-view pairing proxy task. Together, these stages enable the model to connect visual cues with geographic priors and to use reasoning for accurate prediction, achieving state-of-the-art performance across various geospatial reasoning benchmarks.
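
To make the GRPO-based stage more concrete, here is a minimal sketch of the group-relative advantage computation that GRPO is built on. It is not the Geo-R1 training code. The binary `pairing_reward` function, the candidate indices, and the use of a population standard deviation are assumptions made for illustration; the paper defines the actual reward and training setup.

```python
from statistics import mean, pstdev


def pairing_reward(predicted_index: int, true_index: int) -> float:
    """Hypothetical binary reward: 1.0 if the sampled answer picks the
    matching cross-view image, 0.0 otherwise (an assumption, not the
    paper's exact reward)."""
    return 1.0 if predicted_index == true_index else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of sampled answers.

    GRPO scores each answer relative to the other answers sampled for the
    same prompt: subtract the group mean and divide by the group standard
    deviation. `eps` avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std, chosen here for simplicity
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers for one prompt; the matching view is candidate 2.
predictions = [2, 0, 2, 1]
rewards = [pairing_reward(p, true_index=2) for p in predictions]
advantages = group_relative_advantages(rewards)
```

Answers that choose the matching view receive a positive advantage and the others a negative one, so the policy update favors reasoning traces that end in a correct pairing without needing a learned value model.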