# TinyClaude-1B
A lightweight, locally-runnable language model based on TinyLlama 1.1B, enhanced with a sophisticated system prompt inspired by Claude's behavioral guidelines.
## Overview
TinyClaude-1B brings thoughtful AI assistant behavior to edge devices and resource-constrained environments. Built on the efficient TinyLlama architecture, this model incorporates carefully crafted system instructions emphasizing helpfulness, safety, and nuanced conversation.
## Quick Start

```bash
# Pull the model
ollama pull thatdamai/tinyclaude-1b

# Run interactively
ollama run thatdamai/tinyclaude-1b
```
## Features
- Compact Size: ~638MB download, runs on minimal hardware
- Privacy-First: Fully local inference, no API calls required
- Balanced Responses: System prompt encourages helpful, safe, and thoughtful outputs
- Low Resource Requirements: Runs on CPUs and entry-level GPUs
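Because inference is fully local, integration is just HTTP against the Ollama daemon on its default port. The sketch below builds the request body for Ollama's `/api/chat` endpoint; the `build_chat_payload` helper is a hypothetical convenience (not part of Ollama) kept pure so the request shape can be inspected without a running server:

```python
import json

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default chat endpoint

def build_chat_payload(history, user_message, model="thatdamai/tinyclaude-1b"):
    """Append the new user turn and build the /api/chat request body."""
    messages = history + [{"role": "user", "content": user_message}]
    return {"model": model, "messages": messages, "stream": False}

payload = build_chat_payload([], "Hello!")
print(json.dumps(payload))

# To actually send it (requires `requests` and a running Ollama server):
# import requests
# reply = requests.post(OLLAMA_CHAT_URL, json=payload).json()["message"]["content"]
```

Keeping the full `messages` history in the payload is what gives the model multi-turn context; the `/api/generate` examples later in this README are single-shot by comparison.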
## Hardware Requirements
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 4GB | 8GB |
| VRAM | 2GB | 4GB |
| Storage | 1GB | 2GB |
## Usage Examples

### Basic Chat

```bash
ollama run thatdamai/tinyclaude-1b
```
### API Integration

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "thatdamai/tinyclaude-1b",
  "prompt": "Explain quantum computing simply.",
  "stream": false
}'
```
### Python

```python
import requests

response = requests.post('http://localhost:11434/api/generate', json={
    'model': 'thatdamai/tinyclaude-1b',
    'prompt': 'What is machine learning?',
    'stream': False
})
print(response.json()['response'])
```
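For incremental output, set `'stream': True`; Ollama then returns newline-delimited JSON chunks, each carrying a `response` fragment and a final chunk with `done: true`. A sketch of assembling those chunks; the `join_stream` helper is illustrative and is kept pure so it also works on captured output:

```python
import json

def join_stream(lines):
    """Concatenate the 'response' fragments from Ollama's NDJSON stream."""
    parts = []
    for line in lines:
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# With a live server (requires `requests` and Ollama running):
# import requests
# r = requests.post('http://localhost:11434/api/generate', json={
#     'model': 'thatdamai/tinyclaude-1b', 'prompt': 'Hi', 'stream': True}, stream=True)
# print(join_stream(r.iter_lines()))

# Offline demonstration on captured chunks:
sample = ['{"response": "Hel", "done": false}', '{"response": "lo", "done": true}']
print(join_stream(sample))  # Hello
```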
### With Open WebUI / LibreChat

Select `thatdamai/tinyclaude-1b` from the model dropdown after pulling.
### Hugging Face Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Define the TinyClaude system prompt
system_prompt = """You are a helpful, harmless, and honest AI assistant..."""

# Format with the chat template
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Explain quantum computing simply."}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a response
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
### Hugging Face with llama-cpp-python

```python
from llama_cpp import Llama

# Download the GGUF from the Hugging Face Hub
llm = Llama.from_pretrained(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    filename="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
    n_ctx=2048,
    n_gpu_layers=-1  # offload all layers to the GPU
)

system_prompt = """You are a helpful, harmless, and honest AI assistant..."""

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What is machine learning?"}
    ],
    temperature=0.7,
    max_tokens=512
)
print(output['choices'][0]['message']['content'])
```
### Hugging Face CLI

```bash
# Install huggingface_hub
pip install huggingface_hub

# Download model files
huggingface-cli download TinyLlama/TinyLlama-1.1B-Chat-v1.0 --local-dir ./tinyllama

# Download the GGUF quantized version
huggingface-cli download TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf --local-dir ./tinyllama-gguf
```
### Text Generation Inference (TGI)

```bash
# Run with Docker
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  --max-input-length 1024 \
  --max-total-tokens 2048

# Query the endpoint
curl http://localhost:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "<|system|>\nYou are a helpful assistant.</s>\n<|user|>\nHello!</s>\n<|assistant|>\n", "parameters": {"max_new_tokens": 256}}'
```
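The raw `inputs` string in the `curl` command above follows TinyLlama-Chat's Zephyr-style template. A small helper that reproduces that exact string, useful when a frontend does not apply the chat template for you (the function name is illustrative):

```python
def format_tinyllama_prompt(system, user):
    """Build the Zephyr-style prompt string TinyLlama-1.1B-Chat expects."""
    return (f"<|system|>\n{system}</s>\n"
            f"<|user|>\n{user}</s>\n"
            f"<|assistant|>\n")

prompt = format_tinyllama_prompt("You are a helpful assistant.", "Hello!")
print(repr(prompt))
```

With Transformers, `tokenizer.apply_chat_template(...)` produces this format automatically; hand-rolling it is only needed for raw-prompt endpoints like TGI's `/generate`.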
## Model Details
| Property | Value |
|---|---|
| Base Model | TinyLlama 1.1B |
| Parameters | 1.1 Billion |
| Context Window | 2048 tokens |
| License | Apache 2.0 |
| Quantization | Q4_0 (default) |
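As a rough consistency check (not an official size breakdown), the ~638 MB download lines up with Q4_0 quantization: Q4_0 stores each block of 32 weights as 32 four-bit values plus one fp16 scale, i.e. 18 bytes per 32 weights, or 4.5 bits per weight on average:

```python
# Back-of-the-envelope estimate of a Q4_0-quantized 1.1B-parameter model.
params = 1.1e9
bits_per_weight = 4.5  # Q4_0: (32*4 + 16) bits per 32-weight block
size_mb = params * bits_per_weight / 8 / 1e6
print(f"{size_mb:.0f} MB")  # ≈ 619 MB; the rest of the ~638 MB download is
# metadata and tensors kept at higher precision
```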
## Use Cases
TinyClaude-1B is well-suited for:
- Quick local prototyping and testing
- Educational environments
- IoT and edge deployments
- Offline assistant applications
- Low-latency response requirements
- Development and CI/CD pipelines
## Limitations
As a 1.1B parameter model, TinyClaude-1B has inherent limitations:
- Complex reasoning tasks may produce inconsistent results
- Limited knowledge compared to larger models
- May not fully adhere to all system prompt guidelines
- Context window constrains long-form conversations
- Not suitable for production applications requiring high accuracy
For demanding tasks, consider larger models like Llama 3.1 8B, Mistral 7B, or Qwen 14B.
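The 2048-token window means long conversations must be truncated client-side before each request. A minimal sketch of one common strategy, dropping the oldest turns while preserving the system prompt; the helper name and the four-characters-per-token heuristic are illustrative, not part of the model:

```python
def trim_history(messages, max_tokens=2048, chars_per_token=4):
    """Drop oldest non-system turns until the estimated token count fits.

    Uses a crude chars/4 heuristic; for exact counts, use the model's
    real tokenizer."""
    def est(msgs):
        return sum(len(m["content"]) // chars_per_token + 1 for m in msgs)
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    while turns and est(system + turns) > max_tokens:
        turns.pop(0)  # evict the oldest turn first
    return system + turns

history = [{"role": "system", "content": "Be concise."},
           {"role": "user", "content": "x" * 9000},  # an old, oversized turn
           {"role": "user", "content": "latest question"}]
trimmed = trim_history(history, max_tokens=100)
print([m["content"][:10] for m in trimmed])
```

This keeps the system prompt and the most recent turns, which is usually the right trade-off for a small context window; more elaborate schemes summarize evicted turns instead of discarding them.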
## Building From Source

Create your own variant:

```bash
# Create a Modelfile
cat << 'EOF' > Modelfile
FROM tinyllama
SYSTEM """
Your custom system prompt here.
"""
PARAMETER temperature 0.7
PARAMETER num_ctx 2048
EOF

# Build the model
ollama create my-tinyclaude -f Modelfile

# Test it
ollama run my-tinyclaude
```
## Hugging Face Integration

### Uploading to Hugging Face Hub

```bash
# Install required tools
pip install huggingface_hub

# Log in to Hugging Face
huggingface-cli login

# Create a new model repository
huggingface-cli repo create tinyclaude-1b --type model

# Upload model files
huggingface-cli upload thatdamai/tinyclaude-1b ./model-files --repo-type model
```
### Converting Ollama to GGUF for Hugging Face

```bash
# Find your Ollama model location
ollama show thatdamai/tinyclaude-1b --modelfile

# Models are stored in ~/.ollama/models or /usr/share/ollama/.ollama/models
# Copy the GGUF blob and upload it to HF
# Alternative: use Ollama's model export (if available)
cp /usr/share/ollama/.ollama/models/blobs/<sha256-hash> ./tinyclaude.gguf
```
### Creating a Hugging Face Model Card

Create a `README.md` in your HF repo with YAML frontmatter:

```yaml
---
license: apache-2.0
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
tags:
  - tinyllama
  - gguf
  - ollama
  - assistant
  - conversational
model_type: llama
pipeline_tag: text-generation
inference: false
---
```
### Downloading from Hugging Face to Ollama

```bash
# Method 1: Create a Modelfile pointing to the HF GGUF
cat << 'EOF' > Modelfile
FROM hf.co/thatdamai/tinyclaude-1b-gguf
EOF
ollama create tinyclaude-local -f Modelfile

# Method 2: Download the GGUF first, then import it
huggingface-cli download thatdamai/tinyclaude-1b-gguf tinyclaude-1b.Q4_K_M.gguf --local-dir ./
cat << EOF > Modelfile
FROM ./tinyclaude-1b.Q4_K_M.gguf
EOF
ollama create tinyclaude-local -f Modelfile
```
## Contributing
Suggestions and improvements are welcome. Feel free to:
- Open issues for bugs or feature requests
- Submit pull requests with improvements
- Share your custom Modelfile variants
## Acknowledgments
- TinyLlama - Base model architecture
- Ollama - Local model serving platform
- Anthropic - Inspiration for behavioral guidelines
## License
This model inherits the Apache 2.0 license from TinyLlama. The system prompt and configuration are provided as-is for educational and personal use.
**Author:** thatdamai
**Model:** thatdamai/tinyclaude-1b
**Platform:** Ollama