Instructions for using Tralalabs/CHEETAH-350M-Merged-FP16 with libraries and local apps.

Libraries

How to use Tralalabs/CHEETAH-350M-Merged-FP16 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Tralalabs/CHEETAH-350M-Merged-FP16")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so it is visible when run as a script
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Tralalabs/CHEETAH-350M-Merged-FP16")
model = AutoModelForCausalLM.from_pretrained("Tralalabs/CHEETAH-350M-Merged-FP16")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use Tralalabs/CHEETAH-350M-Merged-FP16 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Tralalabs/CHEETAH-350M-Merged-FP16"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Tralalabs/CHEETAH-350M-Merged-FP16",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
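
The same request can be sent from Python using only the standard library. This is a minimal sketch, assuming the server started above is listening on `http://localhost:8000`; the helper names `build_chat_request` and `ask` are hypothetical, not part of vLLM. The response parsing assumes the usual OpenAI-style `choices[0].message.content` layout.

```python
import json
import urllib.request


def build_chat_request(prompt, model="Tralalabs/CHEETAH-350M-Merged-FP16",
                       base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        body = json.load(response)
    # Assumed OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]


request = build_chat_request("What is the capital of France?")
print(request.full_url)
# With the server running: print(ask("What is the capital of France?"))
```

Passing a different `base_url` should let the same helper target any other OpenAI-compatible server.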

Use Docker

docker model run hf.co/Tralalabs/CHEETAH-350M-Merged-FP16

SGLang

How to use Tralalabs/CHEETAH-350M-Merged-FP16 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Tralalabs/CHEETAH-350M-Merged-FP16" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Tralalabs/CHEETAH-350M-Merged-FP16",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Tralalabs/CHEETAH-350M-Merged-FP16" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Tralalabs/CHEETAH-350M-Merged-FP16",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Tralalabs/CHEETAH-350M-Merged-FP16 with Docker Model Runner:
```
docker model run hf.co/Tralalabs/CHEETAH-350M-Merged-FP16
```

CHEETAH-350M-Merged-FP16

CHEETAH-350M-Merged-FP16 is a merged instruction-tuned model based on LiquidAI/LFM2-350M.

A LoRA adapter was fine-tuned on HuggingFaceTB/smol-smoltalk and then merged into the base model, producing a standalone Transformers model that needs no adapter at inference time.
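
To illustrate what merging an adapter means, here is a minimal pure-Python sketch. It assumes the standard LoRA update `W' = W + (alpha / r) * B @ A`; the matrices are toy values and the helper names are hypothetical. It is not the script used to build this release.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return the merged weight W + (alpha / rank) * (B @ A)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy 2x2 layer with a rank-1 adapter (illustrative numbers only).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, -0.5]]        # shape (rank, in_features)
B = [[1.0], [2.0]]       # shape (out_features, rank)

merged = merge_lora(W, A, B, alpha=2, rank=1)
print(merged)  # [[2.0, -1.0], [2.0, -1.0]]
```

After merging, the layer is an ordinary weight matrix, which is why the merged model loads like any other Transformers checkpoint.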

🐆 Fast, small, cheap, and instruction-following focused.

Model Details

| Field | Value |
| --- | --- |
| Model family | CHEETAH |
| Model name | CHEETAH-350M-Merged-FP16 |
| Base model | `LiquidAI/LFM2-350M` |
| Training dataset | `HuggingFaceTB/smol-smoltalk` |
| Fine-tuning type | LoRA SFT |
| Final format | Merged FP16 Transformers model |
| Training platform | Modal |
| GPU | NVIDIA L4 |
| Selected checkpoint | Step 750 |
| License | `lfm1.0` |

Training Summary

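The model was trained as a LoRA adapter. The run was scheduled for 1,000 steps and was stopped at step 750, right after that checkpoint was saved. The last evaluation ran at step 700, so the evaluation metrics below predate the selected checkpoint by 50 steps.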

| Metric | Value |
| --- | --- |
| Selected step | 750 |
| Last evaluated step | 700 |
| Eval loss | 1.3082 |
| Eval perplexity | 3.70 |
| Tokens seen at checkpoint | 9,711,906 |
| Training time | 32.8 minutes |
| Speed near end | ~5,000 tok/s |
| GPU | NVIDIA L4 |
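
The figures in this table can be cross-checked with a few lines of arithmetic. The sketch below assumes perplexity is the exponential of the evaluation loss (the usual definition for a mean token-level cross-entropy) and that the training time covers the whole run up to step 750.

```python
import math

eval_loss = 1.3082
tokens_seen = 9_711_906
training_minutes = 32.8

# Perplexity as exp(mean cross-entropy loss).
perplexity = math.exp(eval_loss)

# Average throughput over the whole run, in tokens per second.
avg_tokens_per_second = tokens_seen / (training_minutes * 60)

print(f"perplexity = {perplexity:.2f}")            # perplexity = 3.70
print(f"avg tok/s  = {avg_tokens_per_second:.0f}")  # avg tok/s  = 4935
```

The run-wide average lands slightly below the logged end-of-run speed, as would be expected if the elapsed time also includes evaluation and startup.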

Final Training Log

[2026-05-30 19:03:52] step=700/1000 loss=17.2614 lr=4.36e-05 tokens_seen=9,070,297 tok/s=5116.6 elapsed_min=30.6
[2026-05-30 19:03:53] eval_loss=1.3082 eval_ppl=3.70
[2026-05-30 19:04:18] step=710/1000 loss=16.9876 lr=4.10e-05 tokens_seen=9,195,110 tok/s=4728.6 elapsed_min=31.1
[2026-05-30 19:04:44] step=720/1000 loss=16.4713 lr=3.84e-05 tokens_seen=9,324,489 tok/s=5017.9 elapsed_min=31.5
[2026-05-30 19:05:10] step=730/1000 loss=16.7246 lr=3.59e-05 tokens_seen=9,457,294 tok/s=5178.9 elapsed_min=31.9
[2026-05-30 19:05:36] step=740/1000 loss=16.3293 lr=3.34e-05 tokens_seen=9,580,098 tok/s=4697.8 elapsed_min=32.4
[2026-05-30 19:06:02] step=750/1000 loss=16.4356 lr=3.10e-05 tokens_seen=9,711,906 tok/s=5018.9 elapsed_min=32.8
[2026-05-30 19:06:06] Saved checkpoint: /outputs/CHEETAH-350M-LoRA-L4/checkpoints/step-750
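
For plotting or comparing runs, the step lines above can be parsed with a regular expression. This is a minimal sketch that assumes every step line follows exactly the layout shown in this log; `parse_step_line` is a hypothetical helper name.

```python
import re

# Matches lines such as:
# [2026-05-30 19:06:02] step=750/1000 loss=16.4356 lr=3.10e-05 tokens_seen=9,711,906 tok/s=5018.9 elapsed_min=32.8
LOG_PATTERN = re.compile(
    r"\[(?P<time>[^\]]+)\] step=(?P<step>\d+)/(?P<total>\d+) "
    r"loss=(?P<loss>[\d.]+) lr=(?P<lr>[\d.eE+-]+) "
    r"tokens_seen=(?P<tokens>[\d,]+) tok/s=(?P<tok_s>[\d.]+) "
    r"elapsed_min=(?P<elapsed>[\d.]+)"
)


def parse_step_line(line):
    """Return a dict of typed fields, or None for non-step lines."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {
        "time": match["time"],
        "step": int(match["step"]),
        "total_steps": int(match["total"]),
        "loss": float(match["loss"]),
        "lr": float(match["lr"]),
        "tokens_seen": int(match["tokens"].replace(",", "")),
        "tok_per_s": float(match["tok_s"]),
        "elapsed_min": float(match["elapsed"]),
    }


sample = ("[2026-05-30 19:06:02] step=750/1000 loss=16.4356 lr=3.10e-05 "
          "tokens_seen=9,711,906 tok/s=5018.9 elapsed_min=32.8")
record = parse_step_line(sample)
print(record["step"], record["tokens_seen"], record["loss"])
```

Evaluation and checkpoint lines do not match the pattern, so the helper returns `None` for them.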

Note: the training loss shown in the log (roughly 16 to 17) was affected by how the loss was logged under gradient accumulation, so it is not directly comparable to the evaluation loss. Use evaluation loss and perplexity to judge the selected checkpoint.
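
One plausible reading of this note is that the logged value is a sum over the 16 gradient-accumulation micro-batches rather than their mean. The source does not confirm this, so the sketch below is only an illustration under that assumption; `per_micro_batch_loss` is a hypothetical helper.

```python
# Gradient accumulation value taken from the Training Configuration table.
GRAD_ACCUM_STEPS = 16


def per_micro_batch_loss(logged_loss, accum_steps=GRAD_ACCUM_STEPS):
    """Assumption: the logged loss is a sum over accum_steps micro-batches."""
    return logged_loss / accum_steps


# Logged losses at steps 700 and 750, copied from the log above.
for step, logged in [(700, 17.2614), (750, 16.4356)]:
    print(step, round(per_micro_batch_loss(logged), 4))
```

Under that assumption the rescaled values come out at about 1.08 and 1.03, the same order of magnitude as the evaluation loss of 1.3082. The evaluation metrics remain the ones to rely on.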

Intended Use

This model is intended for:

- Lightweight instruction following
- Small assistant experiments
- Fast local or cloud inference
- Educational fine-tuning experiments
- CHEETAH model family development

Not Intended For

This model is not intended for:

- High-stakes medical, legal, or financial advice
- Safety-critical automation
- Private-data processing without review
- Production deployment without evaluation

Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Tralalabs/CHEETAH-350M-Merged-FP16"

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {
        "role": "system",
        "content": "You are CHEETAH, a fast, clear, helpful assistant.",
    },
    {
        "role": "user",
        "content": "Explain why cheetahs are fast in 3 short bullets.",
    },
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=160,
        do_sample=True,
        temperature=0.35,
        top_p=0.9,
        repetition_penalty=1.05,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))

Training Data

The model was fine-tuned on:

HuggingFaceTB/smol-smoltalk

This dataset is a subset of SmolTalk adapted for models smaller than 1B parameters.

Training Configuration

| Setting | Value |
| --- | --- |
| Base model | `LiquidAI/LFM2-350M` |
| Dataset | `HuggingFaceTB/smol-smoltalk` |
| Rows | 16,000 |
| Max sequence length | 2048 |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Learning rate | 2e-4 |
| Gradient accumulation | 16 |
| Selected checkpoint | Step 750 |
| Final tokens seen | 9,711,906 |
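
A few derived quantities follow from these settings. The sketch below assumes the standard LoRA scaling factor `alpha / rank` and that each logged step is one optimizer step made of 16 micro-batches; the `LoraRun` class is a hypothetical container, not part of the training code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraRun:
    """Values copied from the Training Configuration table."""
    rank: int = 16
    alpha: int = 32
    grad_accum: int = 16
    tokens_seen: int = 9_711_906
    steps: int = 750

    @property
    def scaling(self):
        # Standard LoRA scaling applied to the low-rank update.
        return self.alpha / self.rank

    @property
    def tokens_per_step(self):
        # Average tokens consumed per optimizer step.
        return self.tokens_seen / self.steps

    @property
    def tokens_per_micro_batch(self):
        # Assumes each step accumulates grad_accum micro-batches.
        return self.tokens_per_step / self.grad_accum


run = LoraRun()
print(run.scaling)                        # 2.0
print(round(run.tokens_per_step))         # 12949
print(round(run.tokens_per_micro_batch))  # 809
```

An average of roughly 800 tokens per micro-batch is well under the 2048-token maximum sequence length, which would fit short conversations with a small micro-batch size, though the card does not state the batch size.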

Limitations

CHEETAH-350M-Merged-FP16 is a small 350M-class model. It may:

- Hallucinate facts
- Struggle with long reasoning chains
- Give weak answers on niche knowledge
- Misread complex instructions
- Need careful prompting for best results

For factual or current information, verify outputs with trusted sources.

License

This model is released under lfm1.0, matching the license of the base model LiquidAI/LFM2-350M.

The training dataset HuggingFaceTB/smol-smoltalk is licensed under Apache-2.0.

Citation

Base model:

LiquidAI/LFM2-350M

Dataset:

HuggingFaceTB/smol-smoltalk

Model Family

This model belongs to the CHEETAH family:

CHEETAH-[SIZE]-LoRA
CHEETAH-[SIZE]-Merged

This release:

CHEETAH-350M-Merged-FP16

Example Output

Prompt:

system
You are CHEETAH, a fast, clear, helpful assistant.

user
Explain why cheetahs are fast in 3 short bullets.

Model output:

assistant
1. Cheetahs have a unique body structure that allows them to run at incredible speeds.
2. Their long legs and lightweight build enable them to accelerate quickly.
3. Cheetahs have a specialized tail that acts as a counterbalance during high-speed runs.