Instructions to use ControlLLM/Control-LLM-Llama3.1-8B-OpenCoder8-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use ControlLLM/Control-LLM-Llama3.1-8B-OpenCoder8-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ControlLLM/Control-LLM-Llama3.1-8B-OpenCoder8-Instruct")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("ControlLLM/Control-LLM-Llama3.1-8B-OpenCoder8-Instruct", dtype="auto")

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use ControlLLM/Control-LLM-Llama3.1-8B-OpenCoder8-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ControlLLM/Control-LLM-Llama3.1-8B-OpenCoder8-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ControlLLM/Control-LLM-Llama3.1-8B-OpenCoder8-Instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/ControlLLM/Control-LLM-Llama3.1-8B-OpenCoder8-Instruct

SGLang

How to use ControlLLM/Control-LLM-Llama3.1-8B-OpenCoder8-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ControlLLM/Control-LLM-Llama3.1-8B-OpenCoder8-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ControlLLM/Control-LLM-Llama3.1-8B-OpenCoder8-Instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ControlLLM/Control-LLM-Llama3.1-8B-OpenCoder8-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ControlLLM/Control-LLM-Llama3.1-8B-OpenCoder8-Instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use ControlLLM/Control-LLM-Llama3.1-8B-OpenCoder8-Instruct with Docker Model Runner:
```
docker model run hf.co/ControlLLM/Control-LLM-Llama3.1-8B-OpenCoder8-Instruct
```

Control-LLM-Llama3.1-8B-OpenCoder8-Instruct / README.md

hawei

Add missing metadata (#1)

0be5dd6 verified over 1 year ago

preview code

raw

history blame contribute delete

4.85 kB

	---
	license: llama3.1
	datasets:
	- OpenCoder-LLM/opc-sft-stage1
	- OpenCoder-LLM/opc-sft-stage2
	language:
	- en
	base_model:
	- meta-llama/Llama-3.1-8B-Instruct
	model-index:
	- name: Control-LLM-Llama3.1-8B-OpenCoder8
	results:
	- task:
	type: code-evaluation
	dataset:
	type: mixed
	name: Code Evaluation Dataset
	metrics:
	- name: pass_at_1,n=1 (code_instruct)
	type: pass_at_1
	value: 0.770508826583593
	stderr: 0.013547264970313243
	verified: false
	- name: pass_at_1,n=1 (humaneval_greedy_instruct)
	type: pass_at_1
	value: 0.823170731707317
	stderr: 0.029883277857485988
	verified: false
	- name: pass_at_1,n=1 (humaneval_plus_greedy_instruct)
	type: pass_at_1
	value: 0.7621951219512195
	stderr: 0.033346454086653404
	verified: false
	- name: pass_at_1,n=1 (mbpp_plus_0shot_instruct)
	type: pass_at_1
	value: 0.7751322751322751
	stderr: 0.02150209607822914
	verified: false
	- name: pass_at_1,n=1 (mbpp_sanitized_0shot_instruct)
	type: pass_at_1
	value: 0.7354085603112841
	stderr: 0.027569713464529938
	verified: false
	- task:
	type: original-capability
	dataset:
	type: meta/Llama-3.1-8B-Instruct-evals
	name: Llama-3.1-8B-Instruct-evals Dataset
	dataset_path: "meta-llama/llama-3.1-8_b-instruct-evals"
	dataset_name: "Llama-3.1-8B-Instruct-evals__arc_challenge__details"
	metrics:
	- name: exact_match,strict-match (original_capability_instruct)
	type: exact_match
	value: 0.5599378769819771
	stderr: 0.0028491774433443513
	verified: false
	- name: exact_match,strict-match (meta_arc_0shot_instruct)
	type: exact_match
	value: 0.8094420600858369
	stderr: 0.011511446994122106
	verified: false
	- name: exact_match,strict-match (meta_gpqa_0shot_cot_instruct)
	type: exact_match
	value: 0.32589285714285715
	stderr: 0.02216910313464341
	verified: false
	- name: exact_match,strict-match (meta_mmlu_0shot_instruct)
	type: exact_match
	value: 0.681241988320752
	stderr: 0.003932622311434926
	verified: false
	- name: exact_match,strict-match (meta_mmlu_pro_5shot_instruct)
	type: exact_match
	value: 0.4029255319148936
	stderr: 0.004471732136513382
	verified: false
	pipeline_tag: text-generation
	library_name: transformers
	---

	# Control-LLM-Llama3.1-8B-OpenCoder8
	This is a fine-tuned model of Llama-3.1-8B-Instruct for coding tasks on OpenCoder SFT dataset described in the paper: [Control LLM: Controlled Evolution for Intelligence Retention in LLM](https://huggingface.co/papers/2501.10979).

	Code: https://github.com/linkedin/ControlLLM.

	## Linked Open Source code - training, eval and benchmark
	This model is associated with the github: [Control-LLM](https://github.com/linkedin/ControlLLM).

	## Evaluation Results
	Here is an overview of the evaluation results and findings:

	### Hybrid Expansion on OpenCoder
	The following diagram illustrates how hybrid expansion works.

	![Catastrophic Forgetting](plots/control_llm_structure_analysis.png)

	### Benchmark Results Table
	The table below summarizes evaluation results across coding tasks and original capabilities.

	\| Model \| MB+ \| MS \| HE+ \| HE \| C-Avg \| ARC \| GP \| MLU \| MLUP \| O-Avg \| Overall \|
	\|--------------------\|---------\|---------\|---------\|---------\|-----------\|---------\|---------\|---------\|----------\|-----------\|-------------\|
	\| Llama3.1-8B-Ins \| 70.4 \| 67.7 \| 66.5 \| 70.7 \| 69.1 \| 83.4 \| 29.9 \| 72.4 \| 46.7 \| 60.5 \| 64.8 \|
	\| OpenCoder-8B-Ins \| 81.2 \| 76.3 \| 78.0 \| 82.3 \| 79.5 \| 8.2 \| 25.4 \| 37.4 \| 11.3 \| 24.6 \| 52.1 \|
	\| Full Param Tune \| 75.1 \| 69.6 \| 71.3 \| 76.8 \| 73.3 \| 24.4 \| 21.9 \| 43.0 \| 19.2 \| 31.5 \| 52.4 \|
	\| Partial Param Tune \| 75.7 \| 71.6 \| 74.4 \| 79.3 \| 75.0 \| 70.2 \| 28.1 \| 60.7 \| 32.4 \| 48.3 \| 61.7 \|
	\| Stack Expansion \| 77.2 \| 72.8 \| 73.2 \| 78.7 \| 75.6 \| 80.0 \| 26.3 \| 66.6 \| 38.2 \| 54.2 \| 64.9 \|
	\| ControlLLM-Hybrid \| 77.5 \| 73.5 \| 76.2\| 82.3\| 77.1 \| 80.9 \| 32.6\| 68.1 \| 40.3 \| 56.0 \| 66.6 \|

	---

	### Explanation:
	- MB+: MBPP Plus
	- MS: MBPP Sanitized
	- HE+: HumanEval Plus
	- HE: HumanEval
	- C-Avg: Coding - Size Weighted Average across MB+, MS, HE+, and HE
	- ARC: ARC benchmark
	- GP: GPQA benchmark
	- MLU: MMLU (Massive Multitask Language Understanding)
	- MLUP: MMLU Pro
	- O-Avg: Original Capability - Size Weighted Average across ARC, GPQA, MMLU, and MMLU Pro
	- Overall: Combined average across all tasks