Atem-3B
Ancient logic. Modern intelligence.
The 3B foundation model of the Atem series — direct reasoning at scale.
Overview
Atem-3B is the first release in the 3B branch of the Atem model series — a Stage 1 supervised fine-tune on Qwen2.5-3B-Instruct across approximately 120,000 training examples spanning mathematics, code, reasoning, and general instruction following.
Where the 1.5B Atem line demonstrated that a small model could be meaningfully improved through careful data curation, Atem-3B applies the same methodology at twice the parameter count. The 3B base provides a stronger foundation — particularly for mathematical reasoning and structured generation — while the training corpus prioritises quality and diversity over volume.
Design philosophy: Think tags were stripped from all training data during preprocessing. Atem-3B is a direct-answer model — it does not produce <think> traces. The reasoning capacity of the 3B base is channelled into producing well-structured, considered responses rather than visible chain-of-thought. A CoT variant is planned for Stage 2.
The Atem Series
1.5B Series
| Model | Stage | Capability |
|---|---|---|
| Atem v1 | Stage 1 — SFT | Fast, direct reasoning |
| Atem-Wisdom | Stage 2 — CoT | Explicit thinking traces |
| Atem-Pharaoh (planned) | Stage 3 — DPO/IPO | Preference-aligned reasoning |
3B Series
| Model | Stage | Capability |
|---|---|---|
| Atem-3B | Stage 1 — SFT | Direct reasoning at 3B scale |
| Atem-3B-Wisdom (planned) | Stage 2 — CoT | Explicit thinking traces |
| Atem-3B-Pharaoh (planned) | Stage 3 — DPO/IPO | Preference-aligned reasoning |
Model Details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Training method | LoRA SFT — Stage 1 (think tags stripped) |
| LoRA config | r=32, alpha=64, dropout=0.05 |
| Parameters | ~3.09B |
| Trainable parameters | 59,867,136 (1.90%) |
| Training records | 120,043 (after token length filtering) |
| Epochs | 1 |
| Final val loss | 0.8384 |
| Hardware | NVIDIA A100-SXM4-80GB |
| Max sequence length | 4,096 tokens |
| Precision | bfloat16 |
| License | Apache 2.0 |
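The trainable-parameter figure in the table can be reproduced with a short calculation. The sketch below assumes that LoRA adapters were attached to all seven linear projections in each decoder layer (q, k, v, o, gate, up, down) and that the base model has the Qwen2.5-3B shapes listed in the code. Neither the target modules nor these shapes are stated in this card, so both are assumptions.

```python
# LoRA adds two matrices per adapted linear layer: A (r x in) and B (out x r),
# so each adapted layer contributes r * (in + out) trainable parameters.
R = 32  # LoRA rank from the table above

# Assumed Qwen2.5-3B shapes (not stated in this card).
HIDDEN, KV_DIM, MLP_DIM, NUM_LAYERS = 2048, 256, 11008, 36

# Assumed target modules: (in_features, out_features) per decoder layer.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}


def lora_trainable_params(rank, projections, num_layers):
    """Count LoRA parameters for one adapter per projection per layer."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in projections.values())
    return per_layer * num_layers


trainable = lora_trainable_params(R, PROJECTIONS, NUM_LAYERS)
base = 3.09e9  # approximate base parameter count from the table
share = 100 * trainable / (base + trainable)
print(f"{trainable:,} trainable parameters ({share:.2f}% of base + adapter)")
```

Under these assumptions the count comes out to 59,867,136, which matches the table, and the share rounds to 1.90% when measured against base plus adapter weights.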
Output Format
Atem-3B produces direct, structured responses. Think tags were stripped from all training data during preprocessing — the model was trained exclusively on clean outputs with no chain-of-thought traces.
[Direct response — reasoned, structured, no <think> tags]
This is a deliberate Stage 1 design choice. A chain-of-thought variant exposing explicit reasoning traces is planned as Stage 2.
Training Data
Stage 1 training used approximately 120,000 examples drawn from eleven sources. All reasoning traces (<think>...</think> blocks) were stripped prior to training. Records shorter than 20 characters after stripping were excluded.
| Dataset | Count | Focus |
|---|---|---|
| Modotte/CodeX-2M-Thinking | 40,000 | Code (think tags stripped) |
| Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned | 23,000 | General reasoning (English filtered) |
| open-r1/OpenThoughts-114k-math | 10,000 | Mathematics (correct only) |
| flytech/python-codes-25k | 10,000 | Python code |
| FreedomIntelligence/medical-o1-reasoning-SFT | 10,000 | Medical reasoning |
| tuanha1305/DeepSeek-R1-Distill | 9,000 | Reasoning distillation |
| EphAsad/QWENMillenium-SF | 5,000 | General instruction |
| EphAsad/MistralMillenium-SF | 5,000 | General instruction |
| WithinUsAI/MiniMax_M2.7_Distilled_5k | 5,000 | Mixed reasoning |
| Jackrong/Claude-opus-4.7-TraceInversion-5000x | 4,761 | Inverted reasoning |
| EphAsad/Phi4Millennium-SF | 2,932 | General instruction |
Chinese-language records in the Kimi K2.5 dataset were filtered out using an ASCII character ratio threshold before inclusion. OpenThoughts-114k-math was restricted to examples with correct == True.
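The preprocessing described in this section can be sketched as follows. This is a minimal illustration, not the actual training pipeline: the function names are hypothetical, and the ASCII ratio threshold of 0.9 is an assumed value because the card does not state the one that was used. Only the 20-character minimum comes from the text above.

```python
import re
from typing import Optional

# Matches a complete <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

MIN_CHARS = 20          # minimum length after stripping, from the text above
ASCII_RATIO_MIN = 0.9   # hypothetical threshold; the real value is not given


def strip_think(text: str) -> str:
    """Remove reasoning traces and surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


def ascii_ratio(text: str) -> float:
    """Fraction of characters that are ASCII (0.0 for empty input)."""
    if not text:
        return 0.0
    return sum(ch.isascii() for ch in text) / len(text)


def clean_record(response: str, english_only: bool = False) -> Optional[str]:
    """Return the cleaned response, or None if the record should be dropped."""
    cleaned = strip_think(response)
    if len(cleaned) < MIN_CHARS:
        return None
    if english_only and ascii_ratio(cleaned) < ASCII_RATIO_MIN:
        return None
    return cleaned


sample = "<think>6 * 7 = 42</think>The answer is 42 because 6 * 7 = 42."
print(clean_record(sample))
```

An ASCII ratio is a cheap proxy for language identification. It also drops English records that are heavy in non-ASCII symbols, which is one reason the threshold matters.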
Loss curve:
| Step | Train Loss | Val Loss |
|---|---|---|
| 200 | 0.9236 | 0.9011 |
| 400 | 0.9200 | 0.8796 |
| 600 | 0.8591 | 0.8685 |
| 800 | 0.8837 | 0.8585 |
| 1000 | 0.8455 | 0.8507 |
| 1200 | 0.8359 | 0.8453 |
| 1400 | 0.8240 | 0.8413 |
| 1600 | 0.8626 | 0.8391 |
| 1800 | 0.8940 | 0.8384 |
| 1876 (final) | 0.8702 | 0.8384 |
Validation loss falls at every checkpoint from 0.9011 at step 200 to 0.8384 at step 1800, then holds at 0.8384 through the final step. There is no overfitting signal.
Evaluation
Benchmark Results
Both models were evaluated with lm-evaluation-harness through its Python API under identical conditions. ARC-Challenge and HellaSwag report zero-shot normalised accuracy; GSM8K is 5-shot. Both models were evaluated at 4-bit quantisation on the same A100-SXM4-80GB in torch.float16.
| Task | Base (3B) | Atem-3B | Delta |
|---|---|---|---|
| ARC-Challenge | 48.1% | 48.0% | -0.1% — |
| GSM8K (strict-match) | 2.1% | 37.1% | +35.0% |
| GSM8K (flexible-extract) | 62.4% | 64.7% | +2.3% ✓ |
| HellaSwag | 73.5% | 70.4% | -3.0% ⚠ |
Note on GSM8K: lm_eval's strict-match filter uses a #### number regex that only fires when the model produces that exact token sequence. The base Qwen2.5-3B-Instruct solves problems correctly but formats answers conversationally, yielding 2.1% strict-match against a 62.4% flexible-extract — the latter being the accurate measure of base model mathematical capability. Atem-3B's training on math distillation datasets reinforced structured answer termination, producing 37.1% strict-match. The meaningful comparison is flexible-extract: 62.4% → 64.7% (+2.3%) — a genuine but modest improvement. The strict-match delta is a formatting artefact, not a 35-point gain in mathematical reasoning ability.
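The gap between the two GSM8K metrics is easier to see with a small example. The patterns below are simplified stand-ins for the harness filters, written for illustration only; the actual lm-evaluation-harness regexes and post-processing may differ.

```python
import re
from typing import Optional

# Simplified stand-ins for the two GSM8K answer filters (assumed patterns).
STRICT = re.compile(r"#### (-?[0-9.,]+)")        # requires the "#### <number>" marker
FLEXIBLE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")   # any number anywhere in the text


def strict_match(completion: str) -> Optional[str]:
    """Return the number after '#### ', or None if the marker is absent."""
    match = STRICT.search(completion)
    return match.group(1) if match else None


def flexible_extract(completion: str) -> Optional[str]:
    """Return the last number in the completion, or None if there is none."""
    numbers = FLEXIBLE.findall(completion)
    return numbers[-1] if numbers else None


conversational = "She sells 9 eggs at $2 each, so she makes 18 dollars per day"
structured = conversational + "\n#### 18"

for name, text in [("conversational", conversational), ("structured", structured)]:
    print(name, strict_match(text), flexible_extract(text))
```

Both completions contain the correct answer, but only the structured one is credited under the strict filter. That is why a change in answer formatting can move strict-match by tens of points while flexible-extract barely changes.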
Note on HellaSwag: The -3.0% regression is a common pattern when fine-tuning instruct models on structured reasoning and task-completion data. HellaSwag tests commonsense sentence completion in a multiple-choice format; training on problem-solving corpora shifts the model's distribution away from the casual, predictive register that HellaSwag measures. This is a known trade-off, not an indicator of general capability loss.
Usage
Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "EphAsad/Atem-3B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": "Explain the difference between a process and a thread.",
    }
]

# return_dict=True returns input_ids and attention_mask together
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True)
print(response)
```
Unsloth (faster inference)
```python
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="EphAsad/Atem-3B",
    max_seq_length=4096,
    dtype=torch.bfloat16,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch to inference mode

messages = [
    {
        "role": "user",
        "content": "Write a Python function to find all prime numbers up to n.",
    }
]

# return_dict=True returns input_ids and attention_mask together
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to("cuda")

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[1]
print(tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True))
```
Ollama
```shell
# Recommended: best speed/quality balance
ollama run hf.co/EphAsad/Atem-3B:Q4_K_M

# Higher quality
ollama run hf.co/EphAsad/Atem-3B:Q5_K_M

# Near-lossless
ollama run hf.co/EphAsad/Atem-3B:Q8_0
```
llama.cpp
```shell
llama-server -hf EphAsad/Atem-3B:Q4_K_M
```
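Once llama-server is running, it exposes an OpenAI-compatible HTTP API. The sketch below shows one way to call it from Python using only the standard library. It assumes the server is listening on its default address, http://localhost:8080; the helper names, the `max_tokens` value, and the `model` string are illustrative choices, not requirements of this model.

```python
import json
import urllib.request


def build_chat_request(
    prompt: str,
    base_url: str = "http://localhost:8080/v1",  # assumed default llama-server address
    model: str = "EphAsad/Atem-3B:Q4_K_M",       # illustrative label for the request
) -> urllib.request.Request:
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, **kwargs) -> str:
    """Send the prompt to the server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Requires a running llama-server:
# print(chat("What is the capital of France?"))
```

Separating request construction from sending keeps the payload easy to inspect or log before any network call is made.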
Available Files
| File | Size | Description |
|---|---|---|
| model-00001-of-00002.safetensors + model-00002-of-00002.safetensors | ~6.2 GB | Full bfloat16 weights |
| Atem-3b.Q4_K_M.gguf | ~1.93 GB | 4-bit — recommended |
| Atem-3b.Q5_K_M.gguf | ~2.22 GB | 5-bit |
| Atem-3b.Q8_0.gguf | ~3.29 GB | 8-bit — near-lossless |
System Prompt
Atem-3B's identity is baked into the chat template and activates without an explicit system message. To override manually:
You are Atem, a precise and analytical reasoning assistant. You approach
every problem methodically — identifying core concepts, reasoning step by
step, and arriving at well-supported conclusions. You show your thinking
clearly and are thorough, direct, and intellectually honest.
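A manual override is normally done by placing a system message first in the conversation. The sketch below builds such a message list; `build_messages` is a hypothetical helper, and it assumes the chat template follows the Qwen2.5 convention in which a leading system message replaces the built-in default.

```python
from typing import Dict, List, Optional


def build_messages(
    user_text: str, system_prompt: Optional[str] = None
) -> List[Dict[str, str]]:
    """Return a chat message list, with an optional leading system message."""
    messages: List[Dict[str, str]] = []
    if system_prompt:
        # Assumption: a leading system message overrides the template default.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    return messages


custom_prompt = "You are a concise assistant. Answer in two sentences or fewer."
messages = build_messages("What is a mutex?", system_prompt=custom_prompt)
print(messages)
```

The resulting list can be passed to `tokenizer.apply_chat_template` or sent as the `messages` field of an OpenAI-compatible request.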
Roadmap
| Stage | Status | Description |
|---|---|---|
| Stage 1 — SFT | ✅ Complete | Atem-3B — this model |
| Stage 2 — CoT SFT | 🔄 Planned | Atem-3B-Wisdom — chain-of-thought traces |
| Stage 3 — DPO/IPO | 🔄 Planned | Atem-3B-Pharaoh — preference-aligned reasoning |
Citation
```bibtex
@misc{atem_3b_2026,
  author       = {Asad, Zain},
  title        = {Atem-3B: A 3B Direct-Reasoning Model via Stage 1 SFT},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/EphAsad/Atem-3B}},
}
```
License
Released under the Apache 2.0 License, consistent with the base model (Qwen2.5-3B-Instruct).
Built independently by EphAsad