Instructions for using Intel/deepmath-v1 with libraries and local apps.
- Libraries
- Transformers
How to use Intel/deepmath-v1 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Intel/deepmath-v1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Intel/deepmath-v1")
model = AutoModelForCausalLM.from_pretrained("Intel/deepmath-v1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Local Apps
- vLLM
How to use Intel/deepmath-v1 with vLLM:
Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Intel/deepmath-v1"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Intel/deepmath-v1",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker

```bash
docker model run hf.co/Intel/deepmath-v1
```
- SGLang
How to use Intel/deepmath-v1 with SGLang:
Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Intel/deepmath-v1" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Intel/deepmath-v1",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Intel/deepmath-v1" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Intel/deepmath-v1",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

- Docker Model Runner
How to use Intel/deepmath-v1 with Docker Model Runner:
```bash
docker model run hf.co/Intel/deepmath-v1
```
| language: | |
| - en | |
| license: apache-2.0 | |
| tags: | |
| - math | |
| - reasoning | |
| - agent | |
| - qwen | |
| - grpo | |
| - reinforcement-learning | |
| base_model: Qwen/Qwen3-4B-Thinking-2507 | |
| datasets: | |
| - nvidia/OpenMathReasoning | |
| metrics: | |
| - accuracy | |
| library_name: transformers | |
| pipeline_tag: text-generation | |
| # DeepMath: A Lightweight Math Reasoning Agent | |
| <img src="https://cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/ndb_WmPavW1MONAjsGpYT.jpeg" style="width:600px" alt="An LLM is using a calculator to answer questions." /> | |
| ## Model Description | |
| **DeepMath** is a 4B-parameter mathematical reasoning model that combines a fine-tuned LLM with a sandboxed Python executor. Built on [Qwen3-4B Thinking](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507) and trained with **GRPO (Group Relative Policy Optimization)**, DeepMath generates concise Python snippets for computational steps instead of verbose text explanations, which reduces arithmetic errors and shortens outputs by up to 66%. | |
| - **Developed by:** Intel AI Labs | |
| - **Model type:** Causal language model with agent capabilities | |
| - **Language:** English | |
| - **Base model:** [Qwen3-4B Thinking](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507) | |
| - **License:** Apache 2.0 | |
| - **Blog:** 🔗 <https://huggingface.co/blog/intel-deepmath> | |
| - **Repository:** 💻 [https://github.com/IntelLabs/DeepMath](https://github.com/IntelLabs/DeepMath) | |
| ## Key Features | |
| ✅ **Code-driven reasoning:** Generates short Python snippets for intermediate computational steps | |
| ✅ **Sandboxed execution:** No file I/O, no network calls, strict timeouts | |
| ✅ **Improved accuracy:** Offloading computation reduces arithmetic errors | |
| ✅ **Reduced verbosity:** Up to 66% shorter outputs compared to baseline | |
| ✅ **Safe and auditable:** Deterministic execution with readable code snippets | |
| ## Model Architecture | |
| DeepMath uses a LoRA adapter fine-tuned on top of Qwen3-4B Thinking with the following components: | |
| - **Agent Interface:** Outputs special tokens for Python code execution during reasoning | |
| - **Executor:** Sandboxed Python environment with allow-listed modules | |
| - **Safety Constraints:** Per-snippet timeouts, no file/network access | |
| - **Training Method:** GRPO with accuracy and code generation rewards | |
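The per-snippet timeout listed above can be illustrated with a minimal sketch. This is not the DeepMath executor: the function name `run_snippet` and the default timeout are hypothetical, and a child process with a time limit does not by itself block file or network access, so it should be read as an illustration of the timeout idea only.

```python
import subprocess
import sys


def run_snippet(code: str, timeout_s: float = 5.0) -> str:
    """Run a Python snippet in a child interpreter; return its output or an error message.

    Hypothetical helper for illustration only. It enforces a wall-clock
    timeout but does NOT restrict file or network access.
    """
    try:
        result = subprocess.run(
            # -I starts Python in isolated mode (ignores environment
            # variables and the user site-packages directory).
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return f"Error: snippet exceeded {timeout_s} s timeout"
    if result.returncode != 0:
        # Keep only the last traceback line so the observation stays short.
        lines = result.stderr.strip().splitlines()
        return "Error: " + (lines[-1] if lines else "unknown failure")
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_snippet("print(sum(range(1, 101)))"))
    print(run_snippet("while True: pass", timeout_s=1.0))
```

Returning errors as short strings, rather than raising, lets the calling loop feed the failure back into the reasoning trace as an observation.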
| <figure> | |
| <img src="https://cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/zOcvJ2DY61QZyozarsKbT.png" style="width:400px" alt="Changes to vLLM client and server in TRL library." /> | |
| <figcaption><p><em>Figure 1: The vLLM client and server were modified to use the DeepMath agent in generating the candidates, while using the vLLM backend.</em></p></figcaption> | |
| </figure> | |
| ## Training Details | |
| ### Training Data | |
| - **Dataset:** [OpenMathReasoning](https://huggingface.co/datasets/nvidia/OpenMathReasoning) (tool-usage subset) | |
| - **Note:** GRPO training only uses problems, not solutions | |
| - **In-context Learning:** 4 solved examples demonstrating agent call syntax and patterns | |
| ### Training Procedure | |
| **GRPO (Group Relative Policy Optimization)** fine-tuning with: | |
| - **Accuracy Reward:** +1 for correct answers | |
| - **Code Generation Reward:** +1 for using code snippets (weighted 10:1 vs. accuracy) | |
| - **Length Constraint:** GRPO completions limited to 5k tokens | |
| - **Temperature Scheduling:** Linear schedule from T=1.2 → T=0.7 during training | |
| - **Infrastructure:** Modified TRL library's vLLM client and server | |
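The linear temperature schedule can be written as a simple interpolation. The sketch below assumes the schedule is indexed by training step and reaches T=0.7 on the final step; the card does not state the unit, and the function name `linear_temperature` is hypothetical.

```python
def linear_temperature(
    step: int,
    total_steps: int,
    t_start: float = 1.2,
    t_end: float = 0.7,
) -> float:
    """Linearly interpolate the sampling temperature from t_start to t_end.

    Assumption: `step` runs from 0 to total_steps - 1.
    """
    if total_steps <= 1:
        return t_end
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return t_start + frac * (t_end - t_start)


if __name__ == "__main__":
    for step in (0, 50, 100):
        print(step, round(linear_temperature(step, total_steps=101), 3))
```

Starting hot and cooling down trades early exploration of diverse candidates for more focused sampling later in training.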
| ### Training Infrastructure | |
| - Base inference engine: [vLLM](https://github.com/vllm-project/vllm) | |
| - Agent framework: Based on [SmolAgents](https://github.com/huggingface/smolagents/) | |
| - Training framework: Modified [TRL](https://github.com/huggingface/trl) GRPO trainer | |
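For readers new to GRPO, the "group relative" part can be sketched as follows: several candidates are sampled for the same problem, and each candidate's reward is normalized against the group's mean and standard deviation. This is a generic illustration, not the modified TRL trainer; the function name, the use of the population standard deviation, and the `eps` constant are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize per-candidate rewards within one group (one problem).

    Candidates scoring above the group mean get positive advantages and
    those below get negative ones. `eps` avoids division by zero when
    every candidate receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four candidates for one problem: two correct (reward 1), two incorrect (reward 0).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

When all candidates in a group receive the same reward, every advantage is zero and that group contributes no learning signal.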
| ## Performance | |
| ### Benchmark Results | |
| We evaluated DeepMath on four mathematical reasoning datasets, reporting **majority@16** accuracy (the most frequent answer across 16 sampled generations per problem) and mean output length: | |
| <img src="https://cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/mBuINzNvjDKdZEuIqzJeO.png" style="width:800px" alt="Main results table showing performance across MATH500, AIME, HMMT, and HLE datasets."/> | |
| **Key Findings:** | |
| - **Accuracy:** Improved performance on challenging datasets (AIME, HMMT, HLE) | |
| - **Efficiency:** Up to **66% reduction** in output length | |
| - **Robustness:** Consistent improvements when combining agent + GRPO training | |
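The majority@16 metric used in these results can be sketched as a vote over sampled answers. The helper names below are hypothetical, and the sketch assumes answers are already extracted and normalized to comparable strings; how the evaluation code normalizes answers or breaks ties is not specified here.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def majority_at_k(samples_per_problem: list[list[str]], references: list[str]) -> float:
    """Fraction of problems whose majority answer matches the reference."""
    correct = sum(
        majority_vote(samples) == ref
        for samples, ref in zip(samples_per_problem, references)
    )
    return correct / len(references)


if __name__ == "__main__":
    # Two problems, four samples each (k=4 here for brevity; the card uses k=16).
    samples = [["5050", "5050", "5000", "5050"], ["7", "8", "8", "9"]]
    references = ["5050", "7"]
    print(majority_at_k(samples, references))
```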
| ### Evaluation Datasets | |
| - **MATH500:** Subset of the MATH dataset | |
| - **AIME:** American Invitational Mathematics Examination problems | |
| - **HMMT:** Harvard-MIT Mathematics Tournament problems | |
| - **HLE:** Humanity's Last Exam problems | |
| <figure> | |
| <img src="https://cdn-uploads.huggingface.co/production/uploads/62d93cd728f9c86a4031562e/a-kn3oHdlxTP_L-63N9LX.png" style="width:700px" alt="Output example showing Python code generation and execution." /> | |
| <figcaption><p><em>Figure 2: Example output where Python code is generated, evaluated, and the result is inserted into the reasoning trace.</em></p></figcaption> | |
| </figure> | |
| ## Usage | |
| ### Installation | |
| ```bash | |
| # Install uv package manager | |
| curl -LsSf https://astral.sh/uv/install.sh | sh | |
| # Clone repository | |
| git clone https://github.com/IntelLabs/DeepMath.git | |
| cd DeepMath | |
| # Install dependencies | |
| uv pip install -r requirements.txt | |
| uv pip install -e . | |
| ``` | |
| ### Basic Inference | |
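| ```python | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| model_name = "Intel/deepmath-v1" | |
| tokenizer = AutoTokenizer.from_pretrained(model_name) | |
| model = AutoModelForCausalLM.from_pretrained(model_name) | |
| # Example problem, formatted with the tokenizer's chat template | |
| messages = [{"role": "user", "content": "What is the sum of the first 100 positive integers?"}] | |
| inputs = tokenizer.apply_chat_template( | |
| messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt" | |
| ).to(model.device) | |
| outputs = model.generate(**inputs, max_new_tokens=3000) | |
| # Decode only the newly generated tokens, not the prompt | |
| print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)) | |
| ``` | |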
| ### Inference with Agent | |
| For full agent capabilities with sandboxed Python execution: | |
| ```bash | |
| python inference.py \ | |
| +model.use_vllm=true \ | |
| +model.math_agent=true \ | |
| +model.examples=deep_math/fewshot.txt \ | |
| model.generation.max_new_tokens=3000 \ | |
| +model.max_agent_output=20000 \ | |
| +model.max_steps=50 \ | |
| model.model_name_or_path=Intel/deepmath-v1 \ | |
| hf_tag=HuggingFaceH4/MATH-500 \ | |
| generated_file=output.jsonl | |
| ``` | |
| See the [repository](https://github.com/IntelLabs/DeepMath) for complete usage examples. | |
| ## Limitations and Biases | |
| ### Limitations | |
| - **Scope:** Optimized for mathematical reasoning tasks; may not generalize to other domains | |
| - **Problem Types:** Evaluated on contest-style math problems; performance on open-ended mathematical creativity or formal proofs is unknown | |
| - **Model Size:** 4B parameters may limit reasoning depth on extremely complex problems | |
| - **Code Execution:** Requires sandboxed environment for full agent capabilities | |
| ### Safety Considerations | |
| ⚠️ **Code Execution Risk:** In agent mode, Python code generated by the model is executed. While DeepMath uses strict sandboxing and resource limits, any deployment should: | |
| - Carefully manage attack surfaces | |
| - Enforce rate limits | |
| - Use proper isolation (containers, VMs) | |
| - Monitor resource usage | |
| - Validate generated code before execution in production | |
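One way to approach the validation step in the list above is a static check of imports before anything runs. The sketch below is an illustration under stated assumptions: the allow-list contents and the function name `disallowed_imports` are hypothetical, not taken from DeepMath, and a check like this does not catch dynamic tricks such as `__import__` or `eval`, so it complements process isolation rather than replacing it.

```python
import ast

# Hypothetical allow-list; the modules DeepMath actually permits are not listed in this card.
ALLOWED_MODULES = frozenset({"math", "fractions", "itertools"})


def disallowed_imports(code: str, allowed: frozenset = ALLOWED_MODULES) -> list[str]:
    """Return the imported module names in `code` that are not on the allow-list.

    Raises SyntaxError if the snippet does not parse.
    """
    bad = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have module=None; treat them as disallowed.
            names = [node.module or ""]
        else:
            continue
        for name in names:
            # Compare on the top-level package, e.g. "urllib" for "urllib.request".
            if name.split(".")[0] not in allowed:
                bad.append(name)
    return bad


if __name__ == "__main__":
    print(disallowed_imports("import math\nprint(math.sqrt(2))"))
    print(disallowed_imports("import os, math"))
```

Rejecting a snippet before execution is cheaper than killing it after a timeout and gives a clear reason to log for auditing.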
| ### Ethical Considerations | |
| - The model is trained on mathematical problem-solving datasets and should not be used for decision-making in critical applications without human oversight | |
| - Generated code should be reviewed before execution in production environments | |
| - The model may reflect biases present in the training data | |
| ## Citation | |
| If you use DeepMath in your research, please cite: | |
| ```bibtex | |
| @software{deepmath2025, | |
| author = {Fleischer, Daniel and Berchansky, Moshe and Wasserblat, Moshe}, | |
| title = {DeepMath: A Lightweight Math Reasoning Agent for LLMs}, | |
| year = {2025}, | |
| publisher = {Intel AI Labs}, | |
| url = {https://github.com/IntelLabs/DeepMath} | |
| } | |
| ``` | |
| ## Model Card Contact | |
| For questions or issues, please open an issue on the [GitHub repository](https://github.com/IntelLabs/DeepMath). | |