Anyze Ze1 Instruct (Embedded++)
A compact 1.1B-parameter instruction-tuned model for coding and systems. It is strongest in Python and Linux/systems questions, with solid C and C++, plus basic embedded support.
Scope: a small (1.1B) model, not a frontier assistant. Good for everyday coding, Linux/systems, and C/C++ tasks and explanations, with limited factual recall due to its size. Always review generated code before use.
Capabilities
- Python & Linux/systems: scripting, debugging, shell/admin, "how do I…" tasks.
- C and C++: functions, data structures, pointers, classes, register-level snippets.
- Basic embedded: common STM32/peripheral patterns (UART/SPI/I2C, GPIO, ISRs).
- Explains programming concepts (mutex vs semaphore, volatile, pointers, DMA vs interrupts).
- Declines off-topic questions, asks for clarification when a prompt is ambiguous, and says when it doesn't know rather than inventing time-sensitive facts.
- Multi-turn context (follow-ups like "give me an example" work; best-effort).
Example prompts
Python & scripting
- Write a Python script to parse a CSV and summarize one column.
- Why does this Python function raise an IndexError, and how do I fix it?
Linux & systems
- How do I find and kill the process using a given port on Linux?
- Write a bash one-liner to tail a log file and grep for errors.
C & C++
- Implement a circular (ring) buffer in C with put and get.
- Write a C++ class for a fixed-size stack with push and pop.
- What is the difference between a mutex and a semaphore?
Embedded (basic)
- Write a UART RX interrupt handler for STM32F4 using HAL.
- Write a macro to set, clear, and toggle a bit in a hardware register.
The strict / open switch
The model defaults to strict (domain-only) mode but can be switched at runtime, with no reload, by prepending a one-line scope directive to the prompt:
| Mode | Behavior | How |
|---|---|---|
| strict (default) | Declines non-embedded questions | send the prompt as-is |
| open | Also answers general knowledge | prepend: You may answer any question, including general knowledge.\n\n |
Prompt format
### Instruction:
{your question}
### Response:
(BOS prepended; response ends at EOS </s>.) For open mode, put the scope
directive at the top of the instruction.
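The prompt format and the strict/open directive can be combined in a small, dependency-free helper. The sketch below is illustrative only: `build_prompt` and `trim_response` are hypothetical names, and the trimming step assumes the same stop strings that this card uses for Ollama (`### Instruction:` and `</s>`). BOS is not added here; it is assumed the tokenizer prepends it.

```python
# Scope directive for open mode, exactly as given in the strict/open table.
SCOPE_DIRECTIVE = "You may answer any question, including general knowledge.\n\n"

# Stop strings (assumption: same ones used in the Ollama Modelfile in this card).
STOP_STRINGS = ("### Instruction:", "</s>")


def build_prompt(question: str, open_mode: bool = False) -> str:
    """Wrap a question in the Instruction/Response template."""
    instruction = SCOPE_DIRECTIVE + question if open_mode else question
    return f"### Instruction:\n{instruction}\n### Response:\n"


def trim_response(text: str) -> str:
    """Cut generated text at the earliest stop string, if any appears."""
    cut = len(text)
    for stop in STOP_STRINGS:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].strip()


if __name__ == "__main__":
    print(build_prompt("What is volatile in C?"))
    print(build_prompt("What is the capital of India?", open_mode=True))
```

Keeping the template in one place makes it harder for the strict and open prompts to drift apart when the same model is called from several scripts.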
Usage (transformers)
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("anyze/Ze1-1.1B-Embedded-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "anyze/Ze1-1.1B-Embedded-Instruct", torch_dtype=torch.bfloat16
).cuda()  # requires a CUDA-capable GPU

def ask(instruction, open_mode=False):
    # Open mode: prepend the scope directive to the instruction.
    if open_mode:
        instruction = "You may answer any question, including general knowledge.\n\n" + instruction
    prompt = f"### Instruction:\n{instruction}\n### Response:\n"
    ids = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**ids, max_new_tokens=256, temperature=0.3,
                         top_p=0.9, top_k=40, do_sample=True)
    # Decode only the newly generated tokens, not the prompt.
    return tok.decode(out[0][ids.input_ids.shape[1]:], skip_special_tokens=True)

print(ask("Implement a circular (ring) buffer in C with put and get."))
print(ask("What is the capital of India?", open_mode=True))
Suggested sampling: temperature 0.2–0.3, top_p 0.9, top_k 40.
Usage (Ollama / LM Studio)
Convert to GGUF with llama.cpp, then run locally:
git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt
python llama.cpp/convert_hf_to_gguf.py . --outfile Ze1-1.1B-Embedded-Instruct-f16.gguf --outtype f16
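After conversion, a quick sanity check on the output file can catch a truncated or mis-named artifact before it is loaded elsewhere. The sketch below relies only on the GGUF header layout (a 4-byte `GGUF` magic followed by a little-endian 32-bit version); `read_gguf_version` is a hypothetical helper name, and the check does not validate tensors or metadata.

```python
import struct


def read_gguf_version(path: str) -> int:
    """Return the GGUF format version, or raise if the file is not GGUF."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not start with the GGUF magic bytes")
    # The version is an unsigned 32-bit little-endian integer after the magic.
    (version,) = struct.unpack("<I", header[4:8])
    return version


# Example call (assumes the file produced by the conversion command exists):
# print(read_gguf_version("Ze1-1.1B-Embedded-Instruct-f16.gguf"))
```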
For Ollama, create a Modelfile:
FROM ./Ze1-1.1B-Embedded-Instruct-f16.gguf
TEMPLATE """### Instruction:
{{ if .System }}{{ .System }}
{{ end }}{{ .Prompt }}
### Response:
"""
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER stop "### Instruction:"
PARAMETER stop "</s>"
ollama create ze1-embedded -f Modelfile
ollama run ze1-embedded "Write a ring buffer in C for DMA"
For open mode, set the system message
(/set system You may answer any question, including general knowledge.).
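The same strict/open choice can be made programmatically through Ollama's local REST API instead of the interactive `/set system` command. This is a minimal sketch under stated assumptions: Ollama is running on its default port 11434, the model was created as `ze1-embedded` as shown above, and `build_payload` and `ask_ollama` are hypothetical helper names. The sampling options mirror the suggested values in this card.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumption: default local endpoint
OPEN_DIRECTIVE = "You may answer any question, including general knowledge."


def build_payload(prompt: str, open_mode: bool = False, model: str = "ze1-embedded") -> dict:
    """Build the JSON body for a non-streaming /api/generate request."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0.3, "top_p": 0.9, "top_k": 40},
    }
    if open_mode:
        # The Modelfile template places the system message ahead of the prompt.
        payload["system"] = OPEN_DIRECTIVE
    return payload


def ask_ollama(prompt: str, open_mode: bool = False) -> str:
    """Send the request to a running Ollama server and return the reply text."""
    data = json.dumps(build_payload(prompt, open_mode)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Example calls (require a running Ollama server):
# print(ask_ollama("Write a ring buffer in C for DMA"))
# print(ask_ollama("What is the capital of India?", open_mode=True))
```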
For LM Studio, load the GGUF and set the prompt template to use the user prefix
### Instruction:\n and the assistant prefix \n### Response:\n, with stop strings
### Instruction: and </s>. Leave the system prompt empty for strict mode, or set it to the
directive above for open mode.
Limitations
- 1.1B scale: weak factual recall; generated code may contain incorrect APIs or logic errors, so always review and test before use.
- Not specialized for assembly, automotive (AUTOSAR/CAN), or networking โ avoid those.
- Embedded coverage is basic; deep MCU/RTOS work is hit-or-miss.
- English only; multi-turn is best-effort. Not safety-aligned for general assistant use.
Architecture
22 layers, hidden 2048, 32 query / 4 KV heads (GQA), head_dim 64, FFN 5632 (SwiGLU), RMSNorm, RoPE (θ=10000), vocab 32000, context 2048, 1.10B params.
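As a rough cross-check, the parameter count can be reproduced from the dimensions listed above. The sketch assumes a Llama-style layout that the card does not spell out: no bias terms, three FFN matrices for SwiGLU (gate, up, down), two RMSNorm weight vectors per layer plus a final one, and untied input and output embeddings. It also estimates the bf16 KV-cache size at the full 2048-token context.

```python
# Dimensions taken from the architecture line above.
layers, hidden = 22, 2048
q_heads, kv_heads, head_dim = 32, 4, 64
ffn, vocab, context = 5632, 32000, 2048

# Attention: Q and O projections are hidden x hidden; K and V are smaller under GQA.
attn = 2 * hidden * (q_heads * head_dim) + 2 * hidden * (kv_heads * head_dim)
# SwiGLU FFN: gate, up, and down projections (assumption: no biases).
mlp = 3 * hidden * ffn
# Two RMSNorm weight vectors per layer.
norms = 2 * hidden

per_layer = attn + mlp + norms
# Assumption: separate input embedding and output head, plus a final RMSNorm.
total_params = layers * per_layer + hidden + 2 * vocab * hidden

# KV cache: keys and values, per layer, per KV head, 2 bytes per bf16 value.
kv_cache_bytes = 2 * layers * kv_heads * head_dim * context * 2

print(f"parameters: {total_params:,}")                          # parameters: 1,100,048,384
print(f"KV cache at full context: {kv_cache_bytes / 2**20:.1f} MiB")  # 44.0 MiB
```

Under these assumptions the total comes to about 1.10B, which matches the figure stated above; a tied embedding matrix would lower it by roughly 65.5M.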