
TinyClaude-1B

A lightweight, locally-runnable language model based on TinyLlama 1.1B, enhanced with a sophisticated system prompt inspired by Claude's behavioral guidelines.

Overview

TinyClaude-1B brings thoughtful AI assistant behavior to edge devices and resource-constrained environments. Built on the efficient TinyLlama architecture, this model incorporates carefully crafted system instructions emphasizing helpfulness, safety, and nuanced conversation.

Quick Start

# Pull the model
ollama pull thatdamai/tinyclaude-1b

# Run interactively
ollama run thatdamai/tinyclaude-1b

Features

  • Compact Size: ~638MB download, runs on minimal hardware
  • Privacy-First: Fully local inference, no API calls required
  • Balanced Responses: System prompt encourages helpful, safe, and thoughtful outputs
  • Low Resource Requirements: Runs on CPUs and entry-level GPUs

Hardware Requirements

Component   Minimum   Recommended
---------   -------   -----------
RAM         4GB       8GB
VRAM        2GB       4GB
Storage     1GB       2GB
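
The ~638MB figure from the Features list can be sanity-checked with a back-of-envelope calculation. This sketch assumes (not stated in this card) that GGUF's Q4_0 format packs each block of 32 weights into 18 bytes (a 2-byte fp16 scale plus 16 bytes of 4-bit values):

```python
# Rough on-disk size estimate for a Q4_0-quantized model.
# Assumption: Q4_0 stores 32 weights in 18 bytes (fp16 scale + 4-bit values).
def q4_0_size_mb(n_params: float) -> float:
    bytes_per_weight = 18 / 32  # ~0.5625 bytes per weight
    return n_params * bytes_per_weight / 1e6

print(round(q4_0_size_mb(1.1e9)))  # ~619 MB
```

That lands in the same ballpark as the ~638MB download; the remainder is accounted for by non-quantized tensors and file metadata.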

Usage Examples

Basic Chat

ollama run thatdamai/tinyclaude-1b

API Integration

curl http://localhost:11434/api/generate -d '{
  "model": "thatdamai/tinyclaude-1b",
  "prompt": "Explain quantum computing simply.",
  "stream": false
}'

Python

import requests

response = requests.post('http://localhost:11434/api/generate', json={
    'model': 'thatdamai/tinyclaude-1b',
    'prompt': 'What is machine learning?',
    'stream': False
})

print(response.json()['response'])
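
With `'stream': True`, Ollama instead returns newline-delimited JSON chunks, each carrying a `response` fragment and a `done` flag. A minimal sketch of accumulating such a stream, demonstrated here on canned chunks rather than a live server:

```python
import json

def collect_stream(lines):
    """Join the 'response' fragments from Ollama's newline-delimited
    JSON stream, stopping at the chunk marked done=true."""
    parts = []
    for line in lines:
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Against a real server you would iterate over
# requests.post(..., json={..., 'stream': True}, stream=True).iter_lines();
# canned chunks stand in here to show the shape of the stream.
sample = [
    '{"response": "Machine learning ", "done": false}',
    '{"response": "finds patterns in data.", "done": true}',
]
print(collect_stream(sample))
```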

With Open WebUI / LibreChat

Simply select thatdamai/tinyclaude-1b from the model dropdown after pulling.

Hugging Face Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Define the TinyClaude system prompt
system_prompt = """You are a helpful, harmless, and honest AI assistant..."""

# Format with chat template
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Explain quantum computing simply."}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate response
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Hugging Face with llama-cpp-python

from llama_cpp import Llama

# Download GGUF from Hugging Face Hub
llm = Llama.from_pretrained(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    filename="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
    n_ctx=2048,
    n_gpu_layers=-1  # Use all GPU layers
)

system_prompt = """You are a helpful, harmless, and honest AI assistant..."""

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What is machine learning?"}
    ],
    temperature=0.7,
    max_tokens=512
)

print(output['choices'][0]['message']['content'])

Hugging Face CLI

# Install huggingface_hub
pip install huggingface_hub

# Download model files
huggingface-cli download TinyLlama/TinyLlama-1.1B-Chat-v1.0 --local-dir ./tinyllama

# Download GGUF quantized version
huggingface-cli download TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf --local-dir ./tinyllama-gguf

Text Generation Inference (TGI)

# Run with Docker
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  --max-input-length 1024 \
  --max-total-tokens 2048

# Query the endpoint
curl http://localhost:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "<|system|>\nYou are a helpful assistant.</s>\n<|user|>\nHello!</s>\n<|assistant|>\n", "parameters": {"max_new_tokens": 256}}'
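
The raw prompt string in the curl command above follows TinyLlama-Chat's Zephyr-style template. A small hypothetical helper (mirroring what `tokenizer.apply_chat_template` produces for this model) makes the format explicit:

```python
def tinyllama_prompt(messages, add_generation_prompt=True):
    """Render messages in TinyLlama-Chat's Zephyr-style template:
    <|role|>\\ncontent</s>\\n per turn, then an open assistant turn."""
    out = "".join(f"<|{m['role']}|>\n{m['content']}</s>\n" for m in messages)
    if add_generation_prompt:
        out += "<|assistant|>\n"
    return out

prompt = tinyllama_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)  # same string as the TGI "inputs" field above
```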

Model Details

Property         Value
--------         -----
Base Model       TinyLlama 1.1B
Parameters       1.1 billion
Context Window   2048 tokens
License          Apache 2.0
Quantization     Q4_0 (default)

Use Cases

TinyClaude-1B is well-suited for:

  • Quick local prototyping and testing
  • Educational environments
  • IoT and edge deployments
  • Offline assistant applications
  • Low-latency response requirements
  • Development and CI/CD pipelines

Limitations

As a 1.1B parameter model, TinyClaude-1B has inherent limitations:

  • Complex reasoning tasks may produce inconsistent results
  • Limited knowledge compared to larger models
  • May not fully adhere to all system prompt guidelines
  • Context window constrains long-form conversations
  • Not suitable for production applications requiring high accuracy

For demanding tasks, consider larger models like Llama 3.1 8B, Mistral 7B, or Qwen 14B.

Building From Source

Create your own variant:

# Create a Modelfile
cat << 'EOF' > Modelfile
FROM tinyllama

SYSTEM """
Your custom system prompt here.
"""

PARAMETER temperature 0.7
PARAMETER num_ctx 2048
EOF

# Build the model
ollama create my-tinyclaude -f Modelfile

# Test it
ollama run my-tinyclaude
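
If you are generating many variants (e.g. to A/B-test system prompts), the Modelfile above can be scripted. This is a hypothetical helper, not part of any Ollama API; it just emits the same text the heredoc produces:

```python
def make_modelfile(base="tinyllama",
                   system_prompt="Your custom system prompt here.",
                   temperature=0.7, num_ctx=2048):
    """Emit a Modelfile string ready for `ollama create -f Modelfile`."""
    return (
        f"FROM {base}\n\n"
        f'SYSTEM """\n{system_prompt}\n"""\n\n'
        f"PARAMETER temperature {temperature}\n"
        f"PARAMETER num_ctx {num_ctx}\n"
    )

with open("Modelfile", "w") as f:
    f.write(make_modelfile(system_prompt="You are a concise coding assistant."))
```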

Hugging Face Integration

Uploading to Hugging Face Hub

# Install required tools
pip install huggingface_hub

# Login to Hugging Face
huggingface-cli login

# Create a new model repository
huggingface-cli repo create tinyclaude-1b --type model

# Upload model files
huggingface-cli upload thatdamai/tinyclaude-1b ./model-files --repo-type model

Converting Ollama to GGUF for Hugging Face

# Find your Ollama model location
ollama show thatdamai/tinyclaude-1b --modelfile

# Models are stored in ~/.ollama/models or /usr/share/ollama/.ollama/models
# Copy the blob files and upload to HF

# The blob named after the model layer's digest is the GGUF file itself;
# replace <sha256-hash> with that digest
cp /usr/share/ollama/.ollama/models/blobs/<sha256-hash> ./tinyclaude.gguf
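
Finding the right blob by hand is tedious. A sketch of automating the lookup, under the assumption (worth checking against your Ollama version) that the manifest under `~/.ollama/models/manifests/` is OCI-style JSON whose GGUF weights layer has a `...image.model` media type, and that blob files are named after the digest with `:` replaced by `-`:

```python
import json

def gguf_blob_name(manifest: dict) -> str:
    """Return the blob filename of the GGUF weights layer in an
    Ollama model manifest (assumed OCI-style layout)."""
    for layer in manifest["layers"]:
        if layer["mediaType"].endswith("image.model"):
            return layer["digest"].replace(":", "-")
    raise ValueError("no model layer found in manifest")

# Canned manifest standing in for a real file read via json.load(open(path))
sample_manifest = {
    "layers": [
        {"mediaType": "application/vnd.ollama.image.model",
         "digest": "sha256:abc123"},
        {"mediaType": "application/vnd.ollama.image.system",
         "digest": "sha256:def456"},
    ]
}
print(gguf_blob_name(sample_manifest))  # sha256-abc123
```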

Creating a Hugging Face Model Card

Create a README.md in your HF repo with YAML frontmatter:

---
license: apache-2.0
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
tags:
  - tinyllama
  - gguf
  - ollama
  - assistant
  - conversational
model_type: llama
pipeline_tag: text-generation
inference: false
---

Downloading from Hugging Face to Ollama

# Method 1: Create Modelfile pointing to HF GGUF
cat << 'EOF' > Modelfile
FROM hf.co/thatdamai/tinyclaude-1b-gguf
EOF

ollama create tinyclaude-local -f Modelfile

# Method 2: Download GGUF first, then import
huggingface-cli download thatdamai/tinyclaude-1b-gguf tinyclaude-1b.Q4_K_M.gguf --local-dir ./

cat << EOF > Modelfile
FROM ./tinyclaude-1b.Q4_K_M.gguf
EOF

ollama create tinyclaude-local -f Modelfile

Contributing

Suggestions and improvements are welcome. Feel free to:

  • Open issues for bugs or feature requests
  • Submit pull requests with improvements
  • Share your custom Modelfile variants

Acknowledgments

  • TinyLlama - Base model architecture
  • Ollama - Local model serving platform
  • Anthropic - Inspiration for behavioral guidelines

License

This model inherits the Apache 2.0 license from TinyLlama. The system prompt and configuration are provided as-is for educational and personal use.


Author: thatdamai
Model: thatdamai/tinyclaude-1b
Platform: Ollama
