Dynamic Tool Orchestration for Iterative Visual Reasoning

📋 Model Description

AdaReasoner-7B is a vision-language model trained with dynamic tool orchestration capabilities for iterative visual reasoning. This model is AdaReasoner-7B-Randomized.

We provide three variants of AdaReasoner-7B, each optimized for different use cases:

Model	Description	Hugging Face
AdaReasoner-7B-Randomized	Trained with the adaptive learning method, enabling strong generalization to unseen tools and tasks. Designed for open-ended and evolving tool environments where adaptability is required.	🤗 Link
AdaReasoner-7B-Non-Randomized	Trained without adaptive learning, providing more stable and reliable performance on known tools and tasks, but limited generalization to unseen tools or task settings.	🤗 Link
AdaReasoner-VSP-7B	Task-specialized model trained exclusively on the Visual Spatial Planning (VSP) task, achieving strong performance on VSP benchmarks but not intended for cross-task generalization.	🤗 Link

Key Differences:

Randomized: Trained with adaptive learning method, enabling zero-shot generalization to novel tools and task configurations
Non-Randomized: Trained without adaptive learning, offering more predictable behavior on familiar tools but lacking generalization
VSP-7B: Task-specific model fine-tuned exclusively on Visual Spatial Planning (VSP) benchmarks for optimal performance on navigation tasks

🚀 Quick Start

AdaReasoner-7B can be deployed for single-turn inference using standard inference frameworks such as vLLM. However, AdaReasoner is a tool-planning model whose full capabilities require interaction with an external tool environment. To fully evaluate or utilize its tool-planning behavior, we recommend using AdaEval provided in our repository for batch inference and evaluation, or trying the Demo interface for interactive, single-instance GUI-based reasoning.

🎯 Capabilities

The model supports a diverse set of visual reasoning tasks, covering both structured reasoning and open-ended visual understanding: - Visual Spatial Planning Navigation and verification tasks based on grid-world environments (VSPO and VSP), evaluating fine-grained spatial perception, multi-step path planning, and safety verification under out-of-distribution map configurations. - Compositional Visual Reasoning (Jigsaw) Image reconstruction from shuffled patches (Jigsaw-COCO and BLINK-J), testing local–global consistency, part–whole reasoning, and visual compositional understanding. - GUI Question Answering (GUIQA) Fine-grained reasoning over GUI screenshots, including interactive webpage understanding (GUIChat) and agent-centric UI reasoning from WebMMU (Agentic Action subset), emphasizing element grounding, action planning, and multi-step inference. - General Visual Question Answering (General VQA) Open-ended visual reasoning beyond structured settings, evaluated on V* and HRBench, focusing on fine-grained visual search, attribute recognition, spatial relationship reasoning, and robustness to high-resolution, complex real-world scenes.

🛠️ Tool Integration

For full tool-augmented inference capabilities, please refer to the AdaReasoner repository which includes:

Tool Server deployment
AdaEval evaluation framework
Complete inference pipeline

📊 Performance

Please refer to our paper for detailed benchmark results across multiple visual reasoning tasks.

🔧 Technical Details

Base Architecture: Qwen 2.5 VL 7B Instruct
Training Method: Tool Cold Start (SFT) + Tool GRPO (RL) + Adaptive Learning
Context Length: Support for extended context with multiple tool interactions
Modalities: Text + Vision

📚 Citation

If you use this model in your research, please cite:

@article{adareasoner2024,
  title={Dynamic Tool Orchestration for Iterative Visual Reasoning},
  author={AdaReasoner Team},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2024}
}

📄 License

Apache 2.0

🤝 Acknowledgments

This model is part of the AdaReasoner project. For more information, visit our GitHub repository.

📧 Contact

For questions and feedback, please open an issue in our GitHub repository.

Downloads last month: 26

Safetensors

Model size

8B params

Tensor type

BF16

Model tree for AdaReasoner/AdaReasoner-7B-Randomized

Base model

Qwen/Qwen2.5-VL-7B-Instruct

Finetuned

(958)

this model

Quantizations

3 models

AdaReasoner
/

AdaReasoner-7B-Randomized