Instructions to use ControlLLM/Llama3.1-8B-OpenMath16-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ControlLLM/Llama3.1-8B-OpenMath16-Instruct with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="ControlLLM/Llama3.1-8B-OpenMath16-Instruct")# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("ControlLLM/Llama3.1-8B-OpenMath16-Instruct", dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use ControlLLM/Llama3.1-8B-OpenMath16-Instruct with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "ControlLLM/Llama3.1-8B-OpenMath16-Instruct" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "ControlLLM/Llama3.1-8B-OpenMath16-Instruct", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/ControlLLM/Llama3.1-8B-OpenMath16-Instruct
- SGLang
How to use ControlLLM/Llama3.1-8B-OpenMath16-Instruct with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "ControlLLM/Llama3.1-8B-OpenMath16-Instruct" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "ControlLLM/Llama3.1-8B-OpenMath16-Instruct", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "ControlLLM/Llama3.1-8B-OpenMath16-Instruct" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "ControlLLM/Llama3.1-8B-OpenMath16-Instruct", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use ControlLLM/Llama3.1-8B-OpenMath16-Instruct with Docker Model Runner:
docker model run hf.co/ControlLLM/Llama3.1-8B-OpenMath16-Instruct
Use Docker
docker model run hf.co/ControlLLM/Llama3.1-8B-OpenMath16-InstructControl-LLM-Llama3.1-8B-Math16
This is a fine-tuned model of Llama-3.1-8B-Instruct for mathematical tasks on OpenMath2 dataset, as described in the paper Control LLM: Controlled Evolution for Intelligence Retention in LLM.
Linked Paper
This model is associated with the paper: Control-LLM.
Linked Open Source code - training, eval and benchmark
This model is associated with the github: Control-LLM.
Evaluation Results
Here is an overview of the evaluation results and findings:
Benchmark Result and Catastrophic Forgetting on OpenMath
The following plot illustrates benchmark result and catastrophic forgetting mitigation on the OpenMath2 dataset.
Alignment Comparison
The plot below highlights the alignment comparison of the model trained with Control LLM and Full Parameter Tuning.
Benchmark Results Table
The table below summarizes evaluation results across mathematical tasks and original capabilities.
| Model | MH | M | G8K | M-Avg | ARC | GPQA | MLU | MLUP | O-Avg | Overall |
|---|---|---|---|---|---|---|---|---|---|---|
| Llama3.1-8B-Inst | 23.7 | 50.9 | 85.6 | 52.1 | 83.4 | 29.9 | 72.4 | 46.7 | 60.5 | 56.3 |
| OpenMath2-Llama3 | 38.4 | 64.1 | 90.3 | 64.3 | 45.8 | 1.3 | 4.5 | 19.5 | 12.9 | 38.6 |
| Full Tune | 38.5 | 63.7 | 90.2 | 63.9 | 58.2 | 1.1 | 7.3 | 23.5 | 16.5 | 40.1 |
| Partial Tune | 36.4 | 61.4 | 89.0 | 61.8 | 66.2 | 6.0 | 25.7 | 30.9 | 29.3 | 45.6 |
| Stack Exp. | 35.6 | 61.0 | 90.8 | 61.8 | 69.3 | 18.8 | 61.8 | 43.1 | 53.3 | 57.6 |
| Hybrid Exp. | 34.4 | 61.1 | 90.1 | 61.5 | 81.8 | 25.9 | 67.2 | 43.9 | 57.1 | 59.3 |
| Control LLM* | 38.1 | 62.7 | 90.4 | 63.2 | 79.7 | 25.2 | 68.1 | 43.6 | 57.2 | 60.2 |
Explanation:
- MH: MathHard
- M: Math
- G8K: GSM8K
- M-Avg: Math - Average across MathHard, Math, and GSM8K
- ARC: ARC benchmark
- GPQA: General knowledge QA
- MLU: MMLU (Massive Multitask Language Understanding)
- MLUP: MMLU Pro
- O-Avg: Orginal Capability - Average across ARC, GPQA, MMLU, and MMLUP
- Overall: Combined average across all tasks
- Downloads last month
- 8
Model tree for ControlLLM/Llama3.1-8B-OpenMath16-Instruct
Base model
meta-llama/Llama-3.1-8BDataset used to train ControlLLM/Llama3.1-8B-OpenMath16-Instruct
Paper for ControlLLM/Llama3.1-8B-OpenMath16-Instruct
Evaluation results
- exact_match,none on Math, Math Hard, GSM8Kself-reported0.633
- exact_match,none (gsm8k_0shot_instruct) on Math, Math Hard, GSM8Kself-reported0.905
- exact_match,none (meta_math_0shot_instruct) on Math, Math Hard, GSM8Kself-reported0.628
- exact_match,none (meta_math_hard_0shot_instruct) on Math, Math Hard, GSM8Kself-reported0.381
- exact_match,strict-match on Llama-3.1-8B-Instruct-evals Datasetself-reported0.572
- exact_match,strict-match (meta_arc_0shot_instruct) on Llama-3.1-8B-Instruct-evals Datasetself-reported0.797
- exact_match,strict-match (meta_gpqa_0shot_cot_instruct) on Llama-3.1-8B-Instruct-evals Datasetself-reported0.252
- exact_match,strict-match (meta_mmlu_0shot_instruct) on Llama-3.1-8B-Instruct-evals Datasetself-reported0.684
- exact_match,strict-match (meta_mmlu_pro_5shot_instruct) on Llama-3.1-8B-Instruct-evals Datasetself-reported0.432


Install from pip and serve model
# Install vLLM from pip: pip install vllm# Start the vLLM server: vllm serve "ControlLLM/Llama3.1-8B-OpenMath16-Instruct"# Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "ControlLLM/Llama3.1-8B-OpenMath16-Instruct", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'