Image-Text-to-Text
Safetensors
English
qwen2_5_vl
agent
conversational
Logo

Dynamic Tool Orchestration for Iterative Visual Reasoning

Paper Docs Data & Model Homepage Demo Video

πŸ“‹ Model Description

AdaReasoner-7B is a vision-language model trained with dynamic tool orchestration capabilities for iterative visual reasoning. This model is AdaReasoner-7B-Randomized.

We provide three variants of AdaReasoner-7B, each optimized for different use cases:

Model Description Hugging Face
AdaReasoner-7B-Randomized Trained with the adaptive learning method, enabling strong generalization to unseen tools and tasks. Designed for open-ended and evolving tool environments where adaptability is required. πŸ€— Link
AdaReasoner-7B-Non-Randomized Trained without adaptive learning, providing more stable and reliable performance on known tools and tasks, but limited generalization to unseen tools or task settings. πŸ€— Link
AdaReasoner-VSP-7B Task-specialized model trained exclusively on the Visual Spatial Planning (VSP) task, achieving strong performance on VSP benchmarks but not intended for cross-task generalization. πŸ€— Link

Key Differences:

  • Randomized: Trained with adaptive learning method, enabling zero-shot generalization to novel tools and task configurations
  • Non-Randomized: Trained without adaptive learning, offering more predictable behavior on familiar tools but lacking generalization
  • VSP-7B: Task-specific model fine-tuned exclusively on Visual Spatial Planning (VSP) benchmarks for optimal performance on navigation tasks

πŸš€ Quick Start

AdaReasoner-7B can be deployed for single-turn inference using standard inference frameworks such as vLLM. However, AdaReasoner is a tool-planning model whose full capabilities require interaction with an external tool environment. To fully evaluate or utilize its tool-planning behavior, we recommend using AdaEval provided in our repository for batch inference and evaluation, or trying the Demo interface for interactive, single-instance GUI-based reasoning.

🎯 Capabilities

The model supports a diverse set of visual reasoning tasks, covering both structured reasoning and open-ended visual understanding: - Visual Spatial Planning Navigation and verification tasks based on grid-world environments (VSPO and VSP), evaluating fine-grained spatial perception, multi-step path planning, and safety verification under out-of-distribution map configurations. - Compositional Visual Reasoning (Jigsaw) Image reconstruction from shuffled patches (Jigsaw-COCO and BLINK-J), testing local–global consistency, part–whole reasoning, and visual compositional understanding. - GUI Question Answering (GUIQA) Fine-grained reasoning over GUI screenshots, including interactive webpage understanding (GUIChat) and agent-centric UI reasoning from WebMMU (Agentic Action subset), emphasizing element grounding, action planning, and multi-step inference. - General Visual Question Answering (General VQA) Open-ended visual reasoning beyond structured settings, evaluated on V* and HRBench, focusing on fine-grained visual search, attribute recognition, spatial relationship reasoning, and robustness to high-resolution, complex real-world scenes.

πŸ› οΈ Tool Integration

For full tool-augmented inference capabilities, please refer to the AdaReasoner repository which includes:

  • Tool Server deployment
  • AdaEval evaluation framework
  • Complete inference pipeline

πŸ“Š Performance

Please refer to our paper for detailed benchmark results across multiple visual reasoning tasks.

πŸ”§ Technical Details

  • Base Architecture: Qwen 2.5 VL 7B Instruct
  • Training Method: Tool Cold Start (SFT) + Tool GRPO (RL) + Adaptive Learning
  • Context Length: Support for extended context with multiple tool interactions
  • Modalities: Text + Vision

πŸ“š Citation

If you use this model in your research, please cite:

@article{adareasoner2024,
  title={Dynamic Tool Orchestration for Iterative Visual Reasoning},
  author={AdaReasoner Team},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2024}
}

πŸ“„ License

Apache 2.0

🀝 Acknowledgments

This model is part of the AdaReasoner project. For more information, visit our GitHub repository.

πŸ“§ Contact

For questions and feedback, please open an issue in our GitHub repository.

Downloads last month
26
Safetensors
Model size
8B params
Tensor type
BF16
Β·
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for AdaReasoner/AdaReasoner-7B-Randomized

Finetuned
(958)
this model
Quantizations
3 models

Datasets used to train AdaReasoner/AdaReasoner-7B-Randomized