Use from the llama-cpp-python library
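# pip install llama-cpp-python huggingface-hub
# (huggingface-hub is needed by Llama.from_pretrained to fetch the file)

from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub (cached locally) and load it
llm = Llama.from_pretrained(
	repo_id="dispatchAI/Llama-3.2-1B-Instruct-Q4-mobile",
	filename="model.gguf",
)

# Run one chat turn; the result is an OpenAI-style chat completion dict
response = llm.create_chat_completion(
	messages=[
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
print(response["choices"][0]["message"]["content"])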

Llama 3.2 1B Instruct - Q4 Mobile (GGUF)

Meta's Llama 3.2 1B Instruct, quantized by Dispatch AI to 4-bit GGUF (Q4_K_M) for mobile deployment.

| Property | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-1B-Instruct |
| Parameters | 1.23 billion |
| Quantization | Q4_K_M (4-bit k-quant, medium variant) |
| Size | ~767 MB |
| Format | GGUF (llama.cpp) |
| License | Llama 3.2 Community License |
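As a sanity check on the table, the file size and parameter count together imply how many bits each weight occupies on disk. A minimal sketch, assuming 1 MB = 10^6 bytes and ignoring GGUF metadata and tokenizer overhead (both assumptions, not figures from this card):

```python
# Effective bits per weight implied by the table above.
# Assumptions: 1 MB = 1e6 bytes; GGUF header/tokenizer overhead is ignored.
PARAMS = 1.23e9      # parameter count from the table
Q4_SIZE_MB = 767     # approximate Q4_K_M file size from the table


def bits_per_weight(size_mb: float, params: float) -> float:
    """Average storage cost per parameter, in bits."""
    return size_mb * 1e6 * 8 / params


def fp16_size_gb(params: float) -> float:
    """Size of the same weights stored as FP16 (2 bytes each), in GB."""
    return params * 2 / 1e9


if __name__ == "__main__":
    print(f"Q4_K_M: {bits_per_weight(Q4_SIZE_MB, PARAMS):.2f} bits/weight")
    print(f"FP16:   {fp16_size_gb(PARAMS):.2f} GB")
```

This works out to roughly 5 bits per weight rather than a flat 4, which is expected: k-quants store per-block scales, and the "M" mix keeps some tensors at higher precision. The FP16 estimate of about 2.46 GB is consistent with the ~2.5 GB figure in the comparison below.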

Why This Model?

Optimized for deployment on Android phones (Snapdragon 865 or newer), laptops, IoT devices, and any other hardware with at least 4 GB of RAM. No GPU is required.

Performance on Samsung S20 FE (Snapdragon 865)

| Metric | This version (Q4_K_M) | Original FP16 |
|---|---|---|
| Size | 767 MB | ~2.5 GB |
| Speed (CPU) | ~28 tok/s | ~8 tok/s |
| Memory | ~1.2 GB | ~3.8 GB |
| Quality | ~95% of original | 100% (baseline) |
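The ratios behind these measurements are easy to derive. A small sketch using only the approximate table values; treating "~2.5 GB" as 2,500 MB is an assumption, and real throughput varies with prompt length, thread count, and thermal throttling:

```python
# Ratios and latency implied by the approximate S20 FE figures above.
Q4 = {"size_mb": 767, "tok_s": 28, "mem_gb": 1.2}
FP16 = {"size_mb": 2500, "tok_s": 8, "mem_gb": 3.8}  # ~2.5 GB taken as 2500 MB


def ratio(a: float, b: float) -> float:
    """Return a / b, e.g. how many times larger or faster a is than b."""
    return a / b


def generation_seconds(n_tokens: int, tok_per_s: float) -> float:
    """Time to decode n_tokens at a steady rate, ignoring prompt processing."""
    return n_tokens / tok_per_s


if __name__ == "__main__":
    print(f"size reduction:   {ratio(FP16['size_mb'], Q4['size_mb']):.1f}x")
    print(f"memory reduction: {ratio(FP16['mem_gb'], Q4['mem_gb']):.1f}x")
    print(f"decode speedup:   {ratio(Q4['tok_s'], FP16['tok_s']):.1f}x")
    # 256 tokens matches -n 256 in the Quick Start command
    for name, cfg in (("Q4_K_M", Q4), ("FP16", FP16)):
        print(f"{name}: {generation_seconds(256, cfg['tok_s']):.1f} s for 256 tokens")
```

With these inputs the quantized build is roughly 3.3x smaller, uses about 3.2x less memory, and decodes about 3.5x faster, so a 256-token reply takes about 9 s instead of 32 s.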

Use Cases

  • Chatbots & conversational AI on mobile devices
  • Instruction following in resource-constrained environments
  • Content summarization, text classification, RAG pipelines
  • Educational apps, tutoring systems

Quick Start

# Build llama.cpp with native CPU optimizations
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_NATIVE=ON
cmake --build build --config Release

# Download this model
huggingface-cli download dispatchAI/Llama-3.2-1B-Instruct-Q4-mobile ggml-model-Q4_K_M.gguf --local-dir ./models

# Generate up to 256 tokens using 4 CPU threads
./build/bin/llama-cli -m ./models/ggml-model-Q4_K_M.gguf -p "Hello" -n 256 -t 4

Hardware Requirements

| Requirement | Minimum | Recommended |
|---|---|---|
| RAM | 4 GB | 6 GB+ |
| Storage | 1 GB free | 2 GB+ free |
| CPU | 4-core ARM64 / x86_64 | 8-core (Snapdragon 865+) |
| GPU | Not required | Any (faster inference) |

Limitations

  • ~5% quality degradation vs FP16 on complex reasoning tasks
  • Not suitable for high-precision numerical computation
  • Context window matches the base model (128K tokens), but long contexts need far more memory than the weights alone
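The 128K window is a maximum, not a requirement: every cached token costs memory. A minimal sketch of the KV-cache arithmetic, assuming the base model's published configuration (16 layers, 8 key/value heads, head dimension 64) and a 16-bit cache; none of these figures appear on this card, so treat them as assumptions:

```python
# Rough KV-cache size for a given context length.
# Assumptions (from the base model's config, not this card): 16 layers,
# 8 key/value heads (grouped-query attention), head dim 64, 2-byte (f16) cache.
N_LAYERS = 16
N_KV_HEADS = 8
HEAD_DIM = 64
BYTES_PER_VALUE = 2


def kv_cache_bytes(n_ctx: int) -> int:
    """Bytes needed to cache keys and values for n_ctx tokens."""
    # Factor of 2 covers both the key and the value tensors per layer
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * n_ctx


if __name__ == "__main__":
    for n_ctx in (2048, 8192, 32768, 131072):
        mib = kv_cache_bytes(n_ctx) / 2**20
        print(f"n_ctx={n_ctx:>6}: {mib:,.0f} MiB")
```

Under these assumptions the cache costs 32 KiB per token, so the full 128K window needs about 4 GiB on top of the ~767 MB of weights. Devices near the 4 GB minimum should use a much smaller context, set with llama.cpp's `-c` option or the `n_ctx` argument in llama-cpp-python.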

About Dispatch AI

Re-engineering LLMs for mobile and edge deployment. On Hugging Face: 40+ models, 13K+ downloads.
