π Model Description
AdaReasoner-7B is a vision-language model trained with dynamic tool orchestration capabilities for iterative visual reasoning. This model is AdaReasoner-7B-Randomized.
We provide three variants of AdaReasoner-7B, each optimized for different use cases:
| Model | Description | Hugging Face |
|---|---|---|
| AdaReasoner-7B-Randomized | Trained with the adaptive learning method, enabling strong generalization to unseen tools and tasks. Designed for open-ended and evolving tool environments where adaptability is required. | π€ Link |
| AdaReasoner-7B-Non-Randomized | Trained without adaptive learning, providing more stable and reliable performance on known tools and tasks, but limited generalization to unseen tools or task settings. | π€ Link |
| AdaReasoner-VSP-7B | Task-specialized model trained exclusively on the Visual Spatial Planning (VSP) task, achieving strong performance on VSP benchmarks but not intended for cross-task generalization. | π€ Link |
Key Differences:
- Randomized: Trained with adaptive learning method, enabling zero-shot generalization to novel tools and task configurations
- Non-Randomized: Trained without adaptive learning, offering more predictable behavior on familiar tools but lacking generalization
- VSP-7B: Task-specific model fine-tuned exclusively on Visual Spatial Planning (VSP) benchmarks for optimal performance on navigation tasks
π Quick Start
AdaReasoner-7B can be deployed for single-turn inference using standard inference frameworks such as vLLM. However, AdaReasoner is a tool-planning model whose full capabilities require interaction with an external tool environment. To fully evaluate or utilize its tool-planning behavior, we recommend using AdaEval provided in our repository for batch inference and evaluation, or trying the Demo interface for interactive, single-instance GUI-based reasoning.
π― Capabilities
The model supports a diverse set of visual reasoning tasks, covering both structured reasoning and open-ended visual understanding: - Visual Spatial Planning Navigation and verification tasks based on grid-world environments (VSPO and VSP), evaluating fine-grained spatial perception, multi-step path planning, and safety verification under out-of-distribution map configurations. - Compositional Visual Reasoning (Jigsaw) Image reconstruction from shuffled patches (Jigsaw-COCO and BLINK-J), testing localβglobal consistency, partβwhole reasoning, and visual compositional understanding. - GUI Question Answering (GUIQA) Fine-grained reasoning over GUI screenshots, including interactive webpage understanding (GUIChat) and agent-centric UI reasoning from WebMMU (Agentic Action subset), emphasizing element grounding, action planning, and multi-step inference. - General Visual Question Answering (General VQA) Open-ended visual reasoning beyond structured settings, evaluated on V* and HRBench, focusing on fine-grained visual search, attribute recognition, spatial relationship reasoning, and robustness to high-resolution, complex real-world scenes.
π οΈ Tool Integration
For full tool-augmented inference capabilities, please refer to the AdaReasoner repository which includes:
- Tool Server deployment
- AdaEval evaluation framework
- Complete inference pipeline
π Performance
Please refer to our paper for detailed benchmark results across multiple visual reasoning tasks.
π§ Technical Details
- Base Architecture: Qwen 2.5 VL 7B Instruct
- Training Method: Tool Cold Start (SFT) + Tool GRPO (RL) + Adaptive Learning
- Context Length: Support for extended context with multiple tool interactions
- Modalities: Text + Vision
π Citation
If you use this model in your research, please cite:
@article{adareasoner2024,
title={Dynamic Tool Orchestration for Iterative Visual Reasoning},
author={AdaReasoner Team},
journal={arXiv preprint arXiv:XXXX.XXXXX},
year={2024}
}
π License
Apache 2.0
π€ Acknowledgments
This model is part of the AdaReasoner project. For more information, visit our GitHub repository.
π§ Contact
For questions and feedback, please open an issue in our GitHub repository.
- Downloads last month
- 26