Instructions to use acon96/Home-Llama-3.2-3B with libraries and local apps.

Libraries

How to use acon96/Home-Llama-3.2-3B with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="acon96/Home-Llama-3.2-3B",
	filename="Home-Llama-3.2-3B.f16.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

llama.cpp

How to use acon96/Home-Llama-3.2-3B with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf acon96/Home-Llama-3.2-3B:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf acon96/Home-Llama-3.2-3B:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf acon96/Home-Llama-3.2-3B:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf acon96/Home-Llama-3.2-3B:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf acon96/Home-Llama-3.2-3B:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf acon96/Home-Llama-3.2-3B:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf acon96/Home-Llama-3.2-3B:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf acon96/Home-Llama-3.2-3B:Q4_K_M

Use Docker

docker model run hf.co/acon96/Home-Llama-3.2-3B:Q4_K_M

vLLM

How to use acon96/Home-Llama-3.2-3B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "acon96/Home-Llama-3.2-3B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "acon96/Home-Llama-3.2-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
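
The same OpenAI-compatible request can be sent from Python. The sketch below is illustrative only: `build_chat_request` and `send` are hypothetical helper names, and the default base URL and model name are copied from the curl example above. It also passes a system prompt, since this model expects the Home Assistant state there.

```python
import json
import urllib.request


def build_chat_request(system_prompt, user_message,
                       base_url="http://localhost:8000/v1",
                       model="acon96/Home-Llama-3.2-3B"):
    """Build a POST request for the /chat/completions endpoint (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the assistant's text (requires a running server)."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

For a llama.cpp server started as shown earlier, the base URL would presumably be `http://localhost:8080/v1` instead, matching the port used in the Pi and Hermes configurations.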

Ollama
How to use acon96/Home-Llama-3.2-3B with Ollama:
```
ollama run hf.co/acon96/Home-Llama-3.2-3B:Q4_K_M
```

Unsloth Studio

How to use acon96/Home-Llama-3.2-3B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for acon96/Home-Llama-3.2-3B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for acon96/Home-Llama-3.2-3B to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for acon96/Home-Llama-3.2-3B to start chatting

Pi

How to use acon96/Home-Llama-3.2-3B with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf acon96/Home-Llama-3.2-3B:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "acon96/Home-Llama-3.2-3B:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use acon96/Home-Llama-3.2-3B with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf acon96/Home-Llama-3.2-3B:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default acon96/Home-Llama-3.2-3B:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use acon96/Home-Llama-3.2-3B with Docker Model Runner:
```
docker model run hf.co/acon96/Home-Llama-3.2-3B:Q4_K_M
```

Lemonade

How to use acon96/Home-Llama-3.2-3B with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull acon96/Home-Llama-3.2-3B:Q4_K_M

Run and chat with the model

lemonade run user.Home-Llama-3.2-3B-Q4_K_M

List all available models

lemonade list

Home Llama 3.2 3B

The "Home Llama 3.2" model is a fine-tune of the Llama 3.2 3B model from Meta. It can control devices in the user's house and answer basic questions. The model is explicitly trained to support English, German, Spanish, and French; the base model additionally supports Italian, Portuguese, Hindi, and Thai. The fine-tuning dataset is a custom curated dataset designed to teach the model function calling.

The model is quantized using llama.cpp so that it can run in the very low-resource environments common to Home Assistant installations, such as Raspberry Pis.

The model can be used as an "instruct" type model using the Llama 3 prompt format. The system prompt provides information about the state of the Home Assistant installation, including the available devices and callable services.
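
For runtimes that take a raw prompt string rather than a list of chat messages, the Llama 3 format can be assembled by hand. This is a minimal sketch using the special tokens from Meta's published Llama 3 chat format; `format_llama3_prompt` is a hypothetical helper, and chat endpoints normally apply the model's own chat template for you, so this is only needed for plain completion calls.

```python
def format_llama3_prompt(system_prompt, user_message):
    """Wrap a system and user message in Llama 3 header tokens (hypothetical helper)."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        # Leave the assistant header open so the model generates the reply.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
```

Some runtimes add `<|begin_of_text|>` automatically; if yours does, drop it from the string to avoid a duplicated token.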

Example "system" prompt:

You are 'Al', a helpful AI Assistant that controls the devices in a house. Complete the following task as instructed with the information provided only.
Services: light.turn_off(), light.turn_on(brightness,rgb_color), fan.turn_on(), fan.turn_off()
Devices:
light.office 'Office Light' = on;80%
fan.office 'Office fan' = off
light.kitchen 'Kitchen Light' = on;80%;red
light.bedroom 'Bedroom Light' = off
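
A prompt in this layout can be generated from a list of services and device states. The sketch below only mirrors the example above; `build_system_prompt` and its tuple-based input are assumptions for illustration, not the format used by any particular Home Assistant integration.

```python
def build_system_prompt(services, devices, persona="Al"):
    """Render services and (entity_id, friendly_name, state) tuples as a system prompt."""
    lines = [
        f"You are '{persona}', a helpful AI Assistant that controls the devices "
        "in a house. Complete the following task as instructed with the "
        "information provided only.",
        "Services: " + ", ".join(services),
        "Devices:",
    ]
    for entity_id, name, state in devices:
        # One device per line: entity id, quoted friendly name, then its state.
        lines.append(f"{entity_id} '{name}' = {state}")
    return "\n".join(lines)


prompt = build_system_prompt(
    services=["light.turn_off()", "light.turn_on(brightness,rgb_color)",
              "fan.turn_on()", "fan.turn_off()"],
    devices=[
        ("light.office", "Office Light", "on;80%"),
        ("fan.office", "Office fan", "off"),
    ],
)
```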

Output from the model will consist of a response that should be relayed back to the user, along with an optional code block that will invoke different Home Assistant "services". The output format from the model for function calling is as follows:

turning on the kitchen lights for you now
```homeassistant
{ "service": "light.turn_on", "target_device": "light.kitchen" }
```
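
On the receiving side, the reply text and the service calls have to be separated before anything is sent to Home Assistant. The following is a rough sketch under two assumptions: the code block is always tagged `homeassistant`, and each non-empty line inside it is one JSON object. `parse_response` is a hypothetical helper, not part of any library.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this example nests cleanly in a fenced block
BLOCK_RE = re.compile(FENCE + r"homeassistant\s*\n(.*?)" + FENCE, re.DOTALL)


def parse_response(text):
    """Split model output into the spoken reply and a list of service-call dicts."""
    calls = []
    for block in BLOCK_RE.findall(text):
        for line in block.splitlines():
            line = line.strip()
            if line:
                calls.append(json.loads(line))
    # Whatever remains outside the code blocks is the reply for the user.
    reply = BLOCK_RE.sub("", text).strip()
    return reply, calls


sample = (
    "turning on the kitchen lights for you now\n"
    + FENCE + "homeassistant\n"
    + '{ "service": "light.turn_on", "target_device": "light.kitchen" }\n'
    + FENCE
)
reply, calls = parse_response(sample)
```

A real integration would also need to handle malformed JSON and validate the service and device names against what the system prompt advertised before executing anything.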

The model is also capable of basic instruct and QA tasks because of the instruction fine-tuning in the base model. For example, the model is able to perform basic logic tasks such as the following:

user if mary is 7 years old, and I am 3 years older than her. how old am I?
assistant If Mary is 7 years old, then you are 10 years old (7+3=10).

Datasets

Synthetic Dataset for SFT - https://huggingface.co/datasets/acon96/Home-Assistant-Requests

License

This model is a fine-tune of the Llama 3.2 model series, which is licensed under the LLAMA 3.2 COMMUNITY LICENSE AGREEMENT.

Model size: 3B params (Safetensors)

Tensor type: BF16

Base model: meta-llama/Llama-3.2-3B-Instruct
