Instructions to use zenlm/zen-scribe with libraries, inference providers, notebooks, and local apps.

Libraries

How to use zenlm/zen-scribe with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="zenlm/zen-scribe")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("zenlm/zen-scribe")
model = AutoModelForMultimodalLM.from_pretrained("zenlm/zen-scribe", device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use zenlm/zen-scribe with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "zenlm/zen-scribe"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zenlm/zen-scribe",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
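
The same request can be sent from Python. The following is a minimal sketch using only the standard library. It assumes the vLLM server started above is listening on localhost:8000 and returns an OpenAI-style response; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, model="zenlm/zen-scribe",
                       base_url="http://localhost:8000"):
    """Build (without sending) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Assumes a server is running and answers in the OpenAI chat format.
    """
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` sends the same payload as the curl command. Passing `base_url="http://localhost:30000"` should target an SGLang server started with the port shown in the SGLang section.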

Use Docker

docker model run hf.co/zenlm/zen-scribe

SGLang

How to use zenlm/zen-scribe with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "zenlm/zen-scribe" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zenlm/zen-scribe",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "zenlm/zen-scribe" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zenlm/zen-scribe",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use zenlm/zen-scribe with Docker Model Runner:
```
docker model run hf.co/zenlm/zen-scribe
```

Zen Scribe 4B

Professional content writing model fine-tuned for long-form generation, structured documents, and editorial quality output.

Zen Scribe is a 4B parameter language model from Zen LM optimized for writing tasks: blog posts, technical documentation, reports, creative writing, and structured content pipelines. It produces coherent, well-structured prose across extended contexts with consistent voice and style.

Model Specs

Property	Value
Parameters	4B
Architecture	Transformer (decoder-only)
Context Window	32,768 tokens
Output Format	Text
License	Apache 2.0
HuggingFace	zenlm/zen-scribe
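
For a decoder-only model, the context window covers the prompt and the generated tokens together, so it helps to check the budget before generating. A minimal sketch of that arithmetic follows; the helper name `max_prompt_tokens` is hypothetical, and the 32,768 figure is taken from the table above.

```python
CONTEXT_WINDOW = 32_768  # tokens, from the Model Specs table


def max_prompt_tokens(max_new_tokens, context_window=CONTEXT_WINDOW):
    """Return how many prompt tokens fit alongside the requested output."""
    if not 0 < max_new_tokens < context_window:
        raise ValueError("max_new_tokens must be between 1 and context_window - 1")
    return context_window - max_new_tokens


print(max_prompt_tokens(512))  # 32256
```

Comparing this value with the length of the tokenized prompt before calling `generate` is one way to catch prompts that would not leave room for the requested output.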

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "zenlm/zen-scribe",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("zenlm/zen-scribe")

prompt = """Write a technical blog post introduction about vector databases:

"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,  # sampling must be enabled for temperature and top_p to take effect
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Use Cases

Technical documentation: API references, guides, READMEs
Blog and editorial: Long-form articles, opinion pieces, explainers
Business writing: Reports, proposals, executive summaries
Creative writing: Fiction, screenplays, narrative content
Structured output: Templated content, form letters, product descriptions

Content Pipeline Integration

Zen Scribe integrates with Hanzo Flow for automated content pipelines:

# Content pipeline: Brief → Draft → Edit → Publish
import hanzo

client = hanzo.Client()

draft = client.completions.create(
    model="zen-scribe",
    messages=[
        {"role": "system", "content": "You are a technical writer. Write clearly and concisely."},
        {"role": "user", "content": "Write a 500-word introduction to Kubernetes networking."}
    ],
    max_tokens=600,
)
print(draft.choices[0].message.content)
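
The Brief → Draft → Edit stages named in the comment above can be sketched as plain functions around any text-generation callable. This is an illustrative outline, not Hanzo Flow's API: `run_pipeline`, the editor prompt, and the `generate` parameter are all assumptions, and the publish step is left to the caller.

```python
def run_pipeline(brief, generate):
    """Run Brief -> Draft -> Edit and return every intermediate result.

    `generate` is any callable that takes a list of chat messages and
    returns a string, e.g. a wrapper around an API client or a local model.
    """
    draft = generate([
        {"role": "system", "content": "You are a technical writer. Write clearly and concisely."},
        {"role": "user", "content": brief},
    ])
    edited = generate([
        {"role": "system", "content": "You are an editor. Tighten the prose without changing its meaning."},
        {"role": "user", "content": draft},
    ])
    return {"brief": brief, "draft": draft, "edited": edited}


# Stand-in generator so the sketch runs without a model or network access.
def echo_generate(messages):
    role_hint = messages[0]["content"].split(".")[0]
    return f"[{role_hint}] {messages[-1]['content']}"


result = run_pipeline("Write a 500-word introduction to Kubernetes networking.", echo_generate)
print(result["edited"])
```

Keeping the generator as a parameter means the same stages can be exercised with a stub, as here, and later pointed at a hosted endpoint or a locally loaded model.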

MLX (Apple Silicon)

pip install mlx-lm
mlx_lm.generate --model zenlm/zen-scribe --prompt "Write an introduction to:" --max-tokens 500

llama.cpp (CPU/GGUF)

llama-cli -m zen-scribe-q4_k_m.gguf -p "Write a technical blog post about:" -n 512

Deployment

Zen Scribe is available via the Hanzo API:

curl https://api.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen-scribe",
    "messages": [{"role": "user", "content": "Write a product announcement for a new AI API:"}],
    "max_tokens": 400
  }'

License

Apache 2.0. See LICENSE for details.

Citation

@misc{zenlm2025zenscribe,
    title={Zen Scribe: Professional Content Writing Model},
    author={Hanzo AI and Zoo Labs Foundation},
    year={2025},
    publisher={HuggingFace},
    howpublished={\url{https://huggingface.co/zenlm/zen-scribe}}
}

Part of the Zen model family by Hanzo AI (Techstars '17) and Zoo Labs Foundation.

Abliteration

Zen Scribe 4B weights are derived from an abliterated base model. Abliteration removes refusal behavior by identifying and nullifying the "refusal direction" in the model's residual stream, as described in Maxime Labonne's methodology.

Method: Directional ablation on the residual stream across all layers
Source: Zen MoDE (Mixture of Distilled Experts) architecture base
Effect: Removes refusal behaviors while preserving all other capabilities
Identity layer: Zen identity added via system prompt; full LoRA fine-tuning planned
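
In its general form, directional ablation subtracts from each hidden-state vector its projection onto a unit "refusal direction". The sketch below shows that projection on plain Python lists. It is a toy illustration of the general technique, not the procedure used to produce these weights, and the function name and example vectors are made up.

```python
import math


def ablate_direction(hidden, direction):
    """Remove the component of `hidden` that lies along `direction`.

    Computes h - (h . r_hat) * r_hat, where r_hat is `direction` normalized.
    """
    norm = math.sqrt(sum(x * x for x in direction))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    unit = [x / norm for x in direction]
    projection = sum(h * u for h, u in zip(hidden, unit))
    return [h - projection * u for h, u in zip(hidden, unit)]


hidden = [3.0, 4.0, 1.0]    # toy residual-stream activation
refusal = [0.0, 2.0, 0.0]   # toy "refusal direction"
print(ablate_direction(hidden, refusal))  # [3.0, 0.0, 1.0]
```

After ablation the vector has no component left along the chosen direction, while the components orthogonal to it are unchanged.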

Abliteration is a feature, not a limitation. It enables unrestricted research, security testing, and applications where safety guardrails are managed at the application layer rather than baked into model weights.


Safetensors: model size 2B params, tensor type BF16
