Instructions to use EphAsad/Atem-3B with libraries, notebooks, and local apps.

Libraries

How to use EphAsad/Atem-3B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="EphAsad/Atem-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EphAsad/Atem-3B")
model = AutoModelForCausalLM.from_pretrained("EphAsad/Atem-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use EphAsad/Atem-3B with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="EphAsad/Atem-3B",
	filename="Atem-3b.Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Local Apps

llama.cpp

How to use EphAsad/Atem-3B with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf EphAsad/Atem-3B:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf EphAsad/Atem-3B:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf EphAsad/Atem-3B:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf EphAsad/Atem-3B:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf EphAsad/Atem-3B:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf EphAsad/Atem-3B:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf EphAsad/Atem-3B:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf EphAsad/Atem-3B:Q4_K_M

Use Docker

docker model run hf.co/EphAsad/Atem-3B:Q4_K_M


vLLM

How to use EphAsad/Atem-3B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "EphAsad/Atem-3B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EphAsad/Atem-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/EphAsad/Atem-3B:Q4_K_M

SGLang

How to use EphAsad/Atem-3B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "EphAsad/Atem-3B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EphAsad/Atem-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "EphAsad/Atem-3B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EphAsad/Atem-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use EphAsad/Atem-3B with Ollama:
```
ollama run hf.co/EphAsad/Atem-3B:Q4_K_M
```

Unsloth Studio

How to use EphAsad/Atem-3B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EphAsad/Atem-3B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EphAsad/Atem-3B to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for EphAsad/Atem-3B to start chatting

Pi

How to use EphAsad/Atem-3B with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf EphAsad/Atem-3B:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "EphAsad/Atem-3B:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use EphAsad/Atem-3B with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf EphAsad/Atem-3B:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default EphAsad/Atem-3B:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use EphAsad/Atem-3B with Docker Model Runner:
```
docker model run hf.co/EphAsad/Atem-3B:Q4_K_M
```

Lemonade

How to use EphAsad/Atem-3B with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull EphAsad/Atem-3B:Q4_K_M

Run and chat with the model

lemonade run user.Atem-3B-Q4_K_M

List all available models

lemonade list

Atem-3B

Ancient logic. Modern intelligence.

The 3B foundation model of the Atem series — direct reasoning at scale.

Overview

Atem-3B is the first release in the 3B branch of the Atem model series: a Stage 1 supervised fine-tune of Qwen2.5-3B-Instruct on approximately 120,000 training examples spanning mathematics, code, reasoning, and general instruction following.

Where the 1.5B Atem line demonstrated that a small model could be meaningfully improved through careful data curation, Atem-3B applies the same methodology at twice the parameter count. The 3B base provides a stronger foundation — particularly for mathematical reasoning and structured generation — while the training corpus prioritises quality and diversity over volume.

Design philosophy: Think tags were stripped from all training data during preprocessing. Atem-3B is a direct-answer model — it does not produce <think> traces. The reasoning capacity of the 3B base is channelled into producing well-structured, considered responses rather than visible chain-of-thought. A CoT variant is planned for Stage 2.

The Atem Series

1.5B Series

Model	Stage	Capability
Atem v1	Stage 1 — SFT	Fast, direct reasoning
Atem-Wisdom	Stage 2 — CoT	Explicit thinking traces
Atem-Pharaoh (planned)	Stage 3 — DPO/IPO	Preference-aligned reasoning

3B Series

Model	Stage	Capability
Atem-3B	Stage 1 — SFT	Direct reasoning at 3B scale
Atem-3B-Wisdom (planned)	Stage 2 — CoT	Explicit thinking traces
Atem-3B-Pharaoh (planned)	Stage 3 — DPO/IPO	Preference-aligned reasoning

Model Details

Property	Value
Base model	Qwen/Qwen2.5-3B-Instruct
Training method	LoRA SFT — Stage 1 (think tags stripped)
LoRA config	r=32, alpha=64, dropout=0.05
Parameters	~3.09B
Trainable parameters	59,867,136 (1.90%)
Training records	120,043 (after token length filtering)
Epochs	1
Final val loss	0.8384
Hardware	NVIDIA A100-SXM4-80GB
Max sequence length	4,096 tokens
Precision	bfloat16
License	Apache 2.0
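
The trainable-parameter figure in the table can be reproduced with a short calculation. This is a sketch under assumptions the card does not state: that LoRA adapters were attached to all seven linear projections in every decoder layer (q, k, v, o, gate, up, down), and that the base has the published Qwen2.5-3B dimensions (hidden size 2048, MLP size 11008, 36 layers, 2 key/value heads of dimension 128). A LoRA adapter on a layer with `in_features` inputs and `out_features` outputs adds `r * (in_features + out_features)` parameters.

```python
# Sketch: count LoRA parameters at rank r=32.
# ASSUMPTIONS (not stated in the card): all seven linear projections are
# adapted, and the dimensions below are those of Qwen2.5-3B.
RANK = 32
HIDDEN = 2048          # hidden size
INTERMEDIATE = 11008   # MLP size
KV_DIM = 2 * 128       # 2 key/value heads x head dimension 128
NUM_LAYERS = 36

# (in_features, out_features) for each adapted projection in one layer
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int, in_features: int, out_features: int) -> int:
    """Parameters in the A (rank x in) and B (out x rank) matrices."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(RANK, i, o) for i, o in PROJECTIONS.values())
total = per_layer * NUM_LAYERS
print(f"per layer: {per_layer:,}  total: {total:,}")
```

Under these assumptions the total comes to 59,867,136, which agrees with the trainable-parameter count reported above.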

Output Format

Atem-3B produces direct, structured responses. Think tags were stripped from all training data during preprocessing — the model was trained exclusively on clean outputs with no chain-of-thought traces.

[Direct response — reasoned, structured, no <think> tags]

This is a deliberate Stage 1 design choice. A chain-of-thought variant exposing explicit reasoning traces is planned as Stage 2.

Training Data

Stage 1 training used approximately 120,000 examples drawn from eleven sources. All reasoning traces (<think>...</think> blocks) were stripped prior to training. Records shorter than 20 characters after stripping were excluded.

Dataset	Count	Focus
Modotte/CodeX-2M-Thinking	40,000	Code (think tags stripped)
Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned	23,000	General reasoning (English filtered)
open-r1/OpenThoughts-114k-math	10,000	Mathematics (correct only)
flytech/python-codes-25k	10,000	Python code
FreedomIntelligence/medical-o1-reasoning-SFT	10,000	Medical reasoning
tuanha1305/DeepSeek-R1-Distill	9,000	Reasoning distillation
EphAsad/QWENMillenium-SF	5,000	General instruction
EphAsad/MistralMillenium-SF	5,000	General instruction
WithinUsAI/MiniMax_M2.7_Distilled_5k	5,000	Mixed reasoning
Jackrong/Claude-opus-4.7-TraceInversion-5000x	4,761	Inverted reasoning
EphAsad/Phi4Millennium-SF	2,932	General instruction

Chinese-language records from Kimi K2.5 were filtered using an ASCII character ratio threshold before inclusion. OpenThoughts-114k-math was filtered to correct == True examples only.
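
The preprocessing rules described in this section can be sketched as follows. This is an illustration, not the author's pipeline: the function names are hypothetical, and the ASCII ratio threshold of 0.9 is a placeholder because the card does not state the value used. The 20-character minimum is taken from the text above.

```python
import re

# Matches a <think>...</think> block, including newlines inside it
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

MIN_CHARS = 20               # minimum length after stripping (from the card)
ASCII_RATIO_THRESHOLD = 0.9  # HYPOTHETICAL value; the card does not state it


def strip_think(text: str) -> str:
    """Remove every reasoning trace and trim surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


def ascii_ratio(text: str) -> float:
    """Fraction of characters that are ASCII; 0.0 for empty text."""
    if not text:
        return 0.0
    return sum(ch.isascii() for ch in text) / len(text)


def keep_record(response: str, english_only: bool = False) -> bool:
    """Apply the length filter and, optionally, the ASCII ratio filter."""
    cleaned = strip_think(response)
    if len(cleaned) < MIN_CHARS:
        return False
    if english_only and ascii_ratio(cleaned) < ASCII_RATIO_THRESHOLD:
        return False
    return True
```

In this sketch `english_only=True` would be set only for sources that need language filtering, such as the Kimi K2.5 records.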

Loss curve:

Step	Train Loss	Val Loss
200	0.9236	0.9011
400	0.9200	0.8796
600	0.8591	0.8685
800	0.8837	0.8585
1000	0.8455	0.8507
1200	0.8359	0.8453
1400	0.8240	0.8413
1600	0.8626	0.8391
1800	0.8940	0.8384
1876 (final)	0.8702	0.8384

Validation loss decreases at every checkpoint, from 0.9011 at step 200 to 0.8384 at step 1800, then holds flat through the final step. There is no sign of overfitting.
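
The trend in the loss table can be checked mechanically. The short sketch below copies the table values and tests that validation loss never rises between checkpoints; the variable names are illustrative only.

```python
# (step, train_loss, val_loss) copied from the loss table above
LOSS_CURVE = [
    (200, 0.9236, 0.9011),
    (400, 0.9200, 0.8796),
    (600, 0.8591, 0.8685),
    (800, 0.8837, 0.8585),
    (1000, 0.8455, 0.8507),
    (1200, 0.8359, 0.8453),
    (1400, 0.8240, 0.8413),
    (1600, 0.8626, 0.8391),
    (1800, 0.8940, 0.8384),
    (1876, 0.8702, 0.8384),
]

val_losses = [val for _, _, val in LOSS_CURVE]

# True when no checkpoint has a higher validation loss than the one before it
is_non_increasing = all(b <= a for a, b in zip(val_losses, val_losses[1:]))

# Total improvement in validation loss between the first and last checkpoint
total_drop = val_losses[0] - val_losses[-1]

print(f"non-increasing: {is_non_increasing}, total drop: {total_drop:.4f}")
```

A rising validation loss alongside a falling training loss would be the usual overfitting signature; these values do not show that pattern.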

Evaluation

Benchmark Results

Both models were evaluated with lm-evaluation-harness via the Python API under identical conditions: 4-bit quantisation, torch.float16, on the same A100-SXM4-80GB. ARC-Challenge and HellaSwag use zero-shot normalised accuracy; GSM8K uses 5-shot.

Task	Base (3B)	Atem-3B	Delta
ARC-Challenge	48.1%	48.0%	-0.1% —
GSM8K (strict-match)	2.1%	37.1%	+35.0%
GSM8K (flexible-extract)	62.4%	64.7%	+2.3% ✓
HellaSwag	73.5%	70.4%	-3.0% ⚠

Note on GSM8K: lm_eval's strict-match filter uses a #### number regex that only fires when the model produces that exact token sequence. The base Qwen2.5-3B-Instruct solves problems correctly but formats answers conversationally, yielding 2.1% strict-match against a 62.4% flexible-extract — the latter being the accurate measure of base model mathematical capability. Atem-3B's training on math distillation datasets reinforced structured answer termination, producing 37.1% strict-match. The meaningful comparison is flexible-extract: 62.4% → 64.7% (+2.3%) — a genuine but modest improvement. The strict-match delta is a formatting artefact, not a 35-point gain in mathematical reasoning ability.
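
The gap between the two GSM8K filters is easy to reproduce on toy strings. The patterns below are simplified approximations written for this illustration, not the harness's exact regular expressions: the strict filter requires the `#### number` marker, while the flexible filter takes the last number that appears anywhere in the response.

```python
import re

# Simplified stand-ins for the two filters (assumed, not the harness source)
STRICT = re.compile(r"#### (-?[0-9.,]+)")
NUMBER = re.compile(r"-?[0-9][0-9,]*(?:\.[0-9]+)?")


def strict_match(text: str):
    """Return the number after '#### ', or None if the marker is absent."""
    match = STRICT.search(text)
    return match.group(1) if match else None


def flexible_extract(text: str):
    """Return the last number found anywhere in the text, or None."""
    found = NUMBER.findall(text)
    return found[-1] if found else None


# A correct answer phrased conversationally, and the same answer with the marker
conversational = "She has 5 boxes of 12 eggs, so the total is 60 eggs."
structured = "5 * 12 = 60\n#### 60"

for label, text in [("conversational", conversational), ("structured", structured)]:
    print(label, strict_match(text), flexible_extract(text))
```

Both responses contain the right answer, but only the structured one is scored as correct under the strict filter, which is the formatting effect described in the note.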

Note on HellaSwag: The -3.0% regression is a common pattern when fine-tuning instruct models on structured reasoning and task-completion data. HellaSwag tests commonsense sentence completion in a multiple-choice format; training on problem-solving corpora shifts the model's distribution away from the casual, predictive register that HellaSwag measures. This is a known trade-off, not an indicator of general capability loss.

Usage

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "EphAsad/Atem-3B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": "Explain the difference between a process and a thread."
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # return input_ids and attention_mask together
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(
    output[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True
)
print(response)

Unsloth (faster inference)

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="EphAsad/Atem-3B",
    max_seq_length=4096,
    dtype=torch.bfloat16,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [
    {
        "role": "user",
        "content": "Write a Python function to find all prime numbers up to n."
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # return input_ids and attention_mask together
    return_tensors="pt"
).to("cuda")

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Ollama

# Recommended — best speed/quality balance
ollama run hf.co/EphAsad/Atem-3B:Q4_K_M

# Higher quality
ollama run hf.co/EphAsad/Atem-3B:Q5_K_M

# Near-lossless
ollama run hf.co/EphAsad/Atem-3B:Q8_0

llama.cpp

llama-server -hf EphAsad/Atem-3B:Q4_K_M
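
Once `llama-server` is running, it can be called from Python through its OpenAI-compatible endpoint. The sketch below uses only the standard library and assumes the server is listening on its default port 8080 on localhost; the helper names are illustrative, and the `model` value simply echoes the tag passed to `-hf`.

```python
import json
import urllib.request

# Assumes llama-server is running locally on its default port
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt: str, model: str = "EphAsad/Atem-3B:Q4_K_M"):
    """Build a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the prompt to the server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires the server to be running):
# print(chat("What is the capital of France?"))
```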

Available Files

File	Size	Description
`model-00001-of-00002.safetensors` + `model-00002-of-00002.safetensors`	~6.2 GB	Full bfloat16 weights
`Atem-3b.Q4_K_M.gguf`	~1.93 GB	4-bit — recommended
`Atem-3b.Q5_K_M.gguf`	~2.22 GB	5-bit
`Atem-3b.Q8_0.gguf`	~3.29 GB	8-bit — near-lossless
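
As a rough sanity check on the file sizes, each can be converted to an average number of bits per weight. This sketch assumes 1 GB means 10^9 bytes, uses the ~3.09B parameter count from Model Details, and ignores file metadata, so the results are estimates only.

```python
PARAMS = 3.09e9  # approximate parameter count from Model Details

# Approximate sizes in GB, copied from the table above
FILES_GB = {
    "bfloat16 safetensors": 6.2,
    "Q4_K_M": 1.93,
    "Q5_K_M": 2.22,
    "Q8_0": 3.29,
}


def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Average bits stored per parameter, assuming 1 GB = 10^9 bytes."""
    return size_gb * 1e9 * 8 / params


for name, size_gb in FILES_GB.items():
    print(f"{name}: {bits_per_weight(size_gb):.2f} bits/weight")
```

The bfloat16 files come out at roughly 16 bits per weight, as expected. The quantised files land somewhat above their nominal bit width, which is typical for these GGUF formats because they store per-block scales and keep some tensors at higher precision.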

System Prompt

Atem-3B's identity is baked into the chat template and activates without an explicit system message. To override manually:

You are Atem, a precise and analytical reasoning assistant. You approach
every problem methodically — identifying core concepts, reasoning step by
step, and arriving at well-supported conclusions. You show your thinking
clearly and are thorough, direct, and intellectually honest.

Roadmap

Stage	Status	Description
Stage 1 — SFT	✅ Complete	Atem-3B — this model
Stage 2 — CoT SFT	🔄 Planned	Atem-3B-Wisdom — chain-of-thought traces
Stage 3 — DPO/IPO	🔄 Planned	Atem-3B-Pharaoh — preference-aligned reasoning

Citation

@misc{atem_3b_2026,
  author       = {Asad, Zain},
  title        = {Atem-3B: A 3B Direct-Reasoning Model via Stage 1 SFT},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/EphAsad/Atem-3B}},
}