Instructions to use anyze/Ze1-1.1B-Embedded-Instruct with libraries, inference providers, notebooks, and local apps.

Libraries

How to use anyze/Ze1-1.1B-Embedded-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="anyze/Ze1-1.1B-Embedded-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("anyze/Ze1-1.1B-Embedded-Instruct")
model = AutoModelForCausalLM.from_pretrained("anyze/Ze1-1.1B-Embedded-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use anyze/Ze1-1.1B-Embedded-Instruct with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="anyze/Ze1-1.1B-Embedded-Instruct",
	filename="Ze1-1.1B-Embedded-Instruct-f16.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Local Apps

llama.cpp

How to use anyze/Ze1-1.1B-Embedded-Instruct with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf anyze/Ze1-1.1B-Embedded-Instruct:F16
# Run inference directly in the terminal:
llama cli -hf anyze/Ze1-1.1B-Embedded-Instruct:F16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf anyze/Ze1-1.1B-Embedded-Instruct:F16
# Run inference directly in the terminal:
llama cli -hf anyze/Ze1-1.1B-Embedded-Instruct:F16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf anyze/Ze1-1.1B-Embedded-Instruct:F16
# Run inference directly in the terminal:
./llama-cli -hf anyze/Ze1-1.1B-Embedded-Instruct:F16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf anyze/Ze1-1.1B-Embedded-Instruct:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf anyze/Ze1-1.1B-Embedded-Instruct:F16

Use Docker

docker model run hf.co/anyze/Ze1-1.1B-Embedded-Instruct:F16

vLLM

How to use anyze/Ze1-1.1B-Embedded-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "anyze/Ze1-1.1B-Embedded-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "anyze/Ze1-1.1B-Embedded-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/anyze/Ze1-1.1B-Embedded-Instruct:F16

SGLang

How to use anyze/Ze1-1.1B-Embedded-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "anyze/Ze1-1.1B-Embedded-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "anyze/Ze1-1.1B-Embedded-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "anyze/Ze1-1.1B-Embedded-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "anyze/Ze1-1.1B-Embedded-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
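
The curl calls shown for vLLM and SGLang can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send` are hypothetical, and it assumes a server is already listening on the port used in the commands above.

```python
import json
import urllib.request

MODEL_ID = "anyze/Ze1-1.1B-Embedded-Instruct"


def build_chat_request(base_url: str, user_message: str) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible chat endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's text (needs a running server)."""
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Port 30000 matches the SGLang commands; the vLLM example uses port 8000.
req = build_chat_request("http://localhost:30000", "What is the capital of France?")
print(req.full_url)
# print(send(req))  # uncomment once the server is running
```

Building the request separately from sending it makes the payload easy to inspect before a server is available.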

Ollama
How to use anyze/Ze1-1.1B-Embedded-Instruct with Ollama:
```
ollama run hf.co/anyze/Ze1-1.1B-Embedded-Instruct:F16
```

Unsloth Studio

How to use anyze/Ze1-1.1B-Embedded-Instruct with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for anyze/Ze1-1.1B-Embedded-Instruct to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for anyze/Ze1-1.1B-Embedded-Instruct to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for anyze/Ze1-1.1B-Embedded-Instruct to start chatting

Docker Model Runner
How to use anyze/Ze1-1.1B-Embedded-Instruct with Docker Model Runner:
```
docker model run hf.co/anyze/Ze1-1.1B-Embedded-Instruct:F16
```

Lemonade

How to use anyze/Ze1-1.1B-Embedded-Instruct with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull anyze/Ze1-1.1B-Embedded-Instruct:F16

Run and chat with the model

lemonade run user.Ze1-1.1B-Embedded-Instruct-F16

List all available models

lemonade list

Anyze Ze1 Instruct (Embedded++)

A compact 1.1B-parameter instruction-tuned model for coding and systems. It is strongest in Python and Linux/systems questions, with solid C and C++, plus basic embedded support.

Scope: a small (1.1B) model, not a frontier assistant. Good for everyday coding, Linux/systems, and C/C++ tasks and explanations, with limited factual recall due to its size. Always review generated code before use.

Capabilities

Python & Linux/systems: scripting, debugging, shell/admin, "how do I…" tasks.
C and C++: functions, data structures, pointers, classes, register-level snippets.
Basic embedded: common STM32/peripheral patterns (UART/SPI/I2C, GPIO, ISRs).
Explains programming concepts (mutex vs semaphore, volatile, pointers, DMA vs interrupts).
Declines off-topic questions, asks for clarification when a prompt is ambiguous, and says when it doesn't know rather than inventing time-sensitive facts.
Multi-turn context (follow-ups like "give me an example" work; best-effort).

Example prompts

Python & scripting

Write a Python script to parse a CSV and summarize one column.
Why does this Python function raise an IndexError, and how do I fix it?

Linux & systems

How do I find and kill the process using a given port on Linux?
Write a bash one-liner to tail a log file and grep for errors.

C & C++

Implement a circular (ring) buffer in C with put and get.
Write a C++ class for a fixed-size stack with push and pop.
What is the difference between a mutex and a semaphore?

Embedded (basic)

Write a UART RX interrupt handler for STM32F4 using HAL.
Write a macro to set, clear, and toggle a bit in a hardware register.

The strict / open switch

The model defaults to strict (domain-only) mode. You can switch modes at runtime, without reloading the model, by prepending a one-line scope directive to the prompt:

| Mode | Behavior | How |
|---|---|---|
| strict (default) | Declines off-topic questions (outside coding and systems) | Send the prompt as-is |
| open | Also answers general-knowledge questions | Prepend `You may answer any question, including general knowledge.\n\n` |

Prompt format

### Instruction:
{your question}
### Response:

(BOS prepended; response ends at EOS </s>.) For open mode, put the scope directive at the top of the instruction.
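
The prompt format and the scope directive can be combined in a small string helper that needs no model to test. This is a sketch only: the names `build_prompt` and `trim_response` are hypothetical, BOS is assumed to be added by the tokenizer, and the stop strings are the ones this card lists for Ollama and LM Studio.

```python
SCOPE_DIRECTIVE = "You may answer any question, including general knowledge.\n\n"
STOP_STRINGS = ("### Instruction:", "</s>")


def build_prompt(instruction: str, open_mode: bool = False) -> str:
    """Wrap an instruction in the card's prompt format (BOS is left to the tokenizer)."""
    if open_mode:
        # Open mode: the directive goes at the top of the instruction.
        instruction = SCOPE_DIRECTIVE + instruction
    return f"### Instruction:\n{instruction}\n### Response:\n"


def trim_response(text: str) -> str:
    """Cut generated text at the earliest stop string, if one appears."""
    cut = len(text)
    for stop in STOP_STRINGS:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].strip()


print(build_prompt("What does volatile mean in C?"))
print(build_prompt("What is the capital of India?", open_mode=True))
```

`trim_response` is useful with runtimes that do not stop at EOS on their own and let the model continue into a new `### Instruction:` block.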

Usage (transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("anyze/Ze1-1.1B-Embedded-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "anyze/Ze1-1.1B-Embedded-Instruct", torch_dtype=torch.bfloat16
).cuda()

def ask(instruction, open_mode=False):
    # Open mode: prepend the scope directive so general-knowledge questions are answered.
    if open_mode:
        instruction = "You may answer any question, including general knowledge.\n\n" + instruction
    prompt = f"### Instruction:\n{instruction}\n### Response:\n"
    ids = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**ids, max_new_tokens=256, temperature=0.3,
                         top_p=0.9, top_k=40, do_sample=True)
    # Decode only the newly generated tokens, skipping the prompt.
    return tok.decode(out[0][ids.input_ids.shape[1]:], skip_special_tokens=True)

print(ask("Implement a circular (ring) buffer in C with put and get."))
print(ask("What is the capital of India?", open_mode=True))

Suggested sampling: temperature 0.2–0.3, top_p 0.9, top_k 40.

Usage (Ollama / LM Studio)

Convert to GGUF with llama.cpp (run these commands from the directory that contains the downloaded model files), then run locally:

git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt
python llama.cpp/convert_hf_to_gguf.py . --outfile Ze1-1.1B-Embedded-Instruct-f16.gguf --outtype f16

Ollama — create a Modelfile:

FROM ./Ze1-1.1B-Embedded-Instruct-f16.gguf
TEMPLATE """### Instruction:
{{ if .System }}{{ .System }}

{{ end }}{{ .Prompt }}
### Response:
"""
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER stop "### Instruction:"
PARAMETER stop "</s>"

ollama create ze1-embedded -f Modelfile
ollama run ze1-embedded "Write a ring buffer in C for DMA"

For open mode, set the system message (/set system You may answer any question, including general knowledge.).

LM Studio: load the GGUF, set the user prefix to `### Instruction:\n` and the assistant prefix to `\n### Response:\n`, and add the stop strings `### Instruction:` and `</s>`. Leave the system prompt empty for strict mode, or set it to the directive above for open mode.
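
Once the `ze1-embedded` model has been created, it can also be queried through Ollama's local HTTP API. The sketch below assumes a default Ollama install listening on port 11434; the function names are hypothetical, and `top_k` is taken from the suggested sampling settings rather than from the Modelfile.

```python
import json
import urllib.request

SCOPE_DIRECTIVE = "You may answer any question, including general knowledge."


def build_ollama_payload(prompt: str, open_mode: bool = False) -> dict:
    """Request body for Ollama's /api/generate, using the model name created above."""
    payload = {
        "model": "ze1-embedded",
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a stream
        "options": {"temperature": 0.3, "top_p": 0.9, "top_k": 40},
    }
    if open_mode:
        # The Modelfile template places .System ahead of the prompt,
        # so the directive ends up at the top of the instruction.
        payload["system"] = SCOPE_DIRECTIVE
    return payload


def generate(payload: dict, host: str = "http://localhost:11434") -> str:
    """POST the payload to a running Ollama server and return the generated text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]


payload = build_ollama_payload("What is the capital of India?", open_mode=True)
print(json.dumps(payload, indent=2))
# print(generate(payload))  # uncomment once Ollama is running
```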

Limitations

1.1B scale: weak factual recall; generated code may contain incorrect APIs or logic errors — always review and test before use.
Not specialized for assembly, automotive (AUTOSAR/CAN), or networking — avoid those.
Embedded coverage is basic; deep MCU/RTOS work is hit-or-miss.
English only; multi-turn is best-effort. Not safety-aligned for general assistant use.

Architecture

22 layers, hidden 2048, 32 query / 4 KV heads (GQA), head_dim 64, FFN 5632 (SwiGLU), RMSNorm, RoPE (θ=10000), vocab 32000, context 2048, 1.10B params.
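
As a sanity check, the stated parameter count can be reproduced from these figures. The arithmetic below is a sketch that assumes a Llama-style layout: no bias terms, separate (untied) input embedding and output head, and a final RMSNorm. None of these are confirmed by the card, though the result matches the stated 1.10B.

```python
# Figures taken from the Architecture line above.
LAYERS, HIDDEN, Q_HEADS, KV_HEADS, HEAD_DIM = 22, 2048, 32, 4, 64
FFN, VOCAB, CONTEXT = 5632, 32000, 2048

# Attention: Q and O are hidden x hidden; K and V shrink to KV_HEADS * HEAD_DIM under GQA.
kv_width = KV_HEADS * HEAD_DIM
attn = 2 * HIDDEN * (Q_HEADS * HEAD_DIM) + 2 * HIDDEN * kv_width
# SwiGLU uses three projections: gate, up, and down.
mlp = 3 * HIDDEN * FFN
# Two RMSNorm weight vectors per layer.
norms = 2 * HIDDEN
per_layer = attn + mlp + norms

# ASSUMPTION: untied embedding and output head, plus one final RMSNorm.
total = LAYERS * per_layer + 2 * VOCAB * HIDDEN + HIDDEN

# KV cache: keys and values, per layer, per KV head, 2 bytes each in BF16.
kv_bytes_per_token = 2 * LAYERS * kv_width * 2
kv_cache_mib = kv_bytes_per_token * CONTEXT / 2**20

print(f"parameters: {total:,}")  # parameters: 1,100,048,384
print(f"KV cache at full context: {kv_cache_mib:.0f} MiB")  # 44 MiB
```

With only 4 KV heads, the cache for a full 2048-token context stays small next to the weights, which is the practical benefit of GQA at this scale.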

Safetensors: model size 1B params, tensor type BF16.