Instructions to use SandLogicTechnologies/Gemma-3-270m-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use SandLogicTechnologies/Gemma-3-270m-GGUF with llama-cpp-python:
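# !pip install llama-cpp-python
from llama_cpp import Llama

# Download the Q4_K_M GGUF file from the Hub and load it
llm = Llama.from_pretrained(
    repo_id="SandLogicTechnologies/Gemma-3-270m-GGUF",
    filename="gemma-3-270m_Q4_k_m.gguf",
)

# Send a chat-style request and print the model's reply
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response["choices"][0]["message"]["content"])
- Notebooks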
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use SandLogicTechnologies/Gemma-3-270m-GGUF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf SandLogicTechnologies/Gemma-3-270m-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf SandLogicTechnologies/Gemma-3-270m-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf SandLogicTechnologies/Gemma-3-270m-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf SandLogicTechnologies/Gemma-3-270m-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf SandLogicTechnologies/Gemma-3-270m-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf SandLogicTechnologies/Gemma-3-270m-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf SandLogicTechnologies/Gemma-3-270m-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf SandLogicTechnologies/Gemma-3-270m-GGUF:Q4_K_M
Use Docker
docker model run hf.co/SandLogicTechnologies/Gemma-3-270m-GGUF:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use SandLogicTechnologies/Gemma-3-270m-GGUF with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "SandLogicTechnologies/Gemma-3-270m-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "SandLogicTechnologies/Gemma-3-270m-GGUF",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/SandLogicTechnologies/Gemma-3-270m-GGUF:Q4_K_M
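The curl call above can also be made from Python using only the standard library. This is a minimal sketch, assuming a server is already running at vLLM's default address `http://localhost:8000` (llama-server listens on port 8080 by default, so adjust `base_url` for it). `build_chat_request`, `first_reply`, and `ask` are hypothetical helper names, not part of any library. `first_reply` assumes the OpenAI-style `choices[0]["message"]["content"]` layout, which is also the shape of the dictionary returned by llama-cpp-python's `create_chat_completion`.

```python
import json
import urllib.request

# vLLM's default address, as in the curl example; llama-server defaults to port 8080.
DEFAULT_BASE_URL = "http://localhost:8000"
MODEL_ID = "SandLogicTechnologies/Gemma-3-270m-GGUF"


def build_chat_request(prompt, model=MODEL_ID, base_url=DEFAULT_BASE_URL):
    """Build (but do not send) a POST to an OpenAI-compatible chat endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_reply(response):
    """Return the assistant text from an OpenAI-style chat completion dict."""
    return response["choices"][0]["message"]["content"]


def ask(prompt, **kwargs):
    """Send the request and return the reply text (requires a running server)."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as resp:
        return first_reply(json.load(resp))
```

With the server running, `print(ask("What is the capital of France?"))` prints the model's answer. Splitting request construction from sending keeps the payload easy to inspect, and the same `first_reply` helper can be applied to the `response` object from the llama-cpp-python example.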
- Ollama
How to use SandLogicTechnologies/Gemma-3-270m-GGUF with Ollama:
ollama run hf.co/SandLogicTechnologies/Gemma-3-270m-GGUF:Q4_K_M
- Unsloth Studio
How to use SandLogicTechnologies/Gemma-3-270m-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for SandLogicTechnologies/Gemma-3-270m-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for SandLogicTechnologies/Gemma-3-270m-GGUF to start chatting
Using Hugging Face Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for SandLogicTechnologies/Gemma-3-270m-GGUF to start chatting
- Docker Model Runner
How to use SandLogicTechnologies/Gemma-3-270m-GGUF with Docker Model Runner:
docker model run hf.co/SandLogicTechnologies/Gemma-3-270m-GGUF:Q4_K_M
- Lemonade
How to use SandLogicTechnologies/Gemma-3-270m-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull SandLogicTechnologies/Gemma-3-270m-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.Gemma-3-270m-GGUF-Q4_K_M
List all available models
lemonade list
Quantized Google Gemma-3-270M Model
This repository provides GGUF quantizations of Gemma-3-270M, one of Google's lightweight and efficient open models from the Gemma family. With only 270 million parameters, it is designed for fast inference, prototyping, and edge deployment, while retaining solid general-purpose reasoning, text understanding, and instruction-following abilities for its size.
Model Overview
- Original Model: Gemma-3-270M
- Architecture: Decoder-only Transformer
- Base Model: Gemma-3 series
- Modalities: Text only
- Developer: Google DeepMind
- License: Gemma Terms of Use
- Language: English
Quantization Details
Q4_K_M Version
- ~54% size reduction
- Lower memory footprint (~241 MB file)
- Slight quality degradation on complex reasoning tasks
Q5_K_M Version
- ~52% size reduction
- Higher fidelity (~247 MB file)
- Better quality retention; recommended when output quality is the priority
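The reduction percentages above depend on which baseline file they are measured against, and the card does not state it. As a rough cross-check, here is a minimal sketch that assumes a 16-bit (BF16/F16) baseline of 270M parameters × 2 bytes ≈ 540 MB. Real GGUF files also carry metadata and the tokenizer, and K-quant mixes such as Q4_K_M keep some tensors at higher precision, so this is only an estimate. `estimate_size_mb` and `percent_reduction` are illustrative helpers.

```python
def estimate_size_mb(n_params, bits_per_weight):
    """Rough on-disk size in MB (10**6 bytes), ignoring metadata and tokenizer."""
    return n_params * bits_per_weight / 8 / 1e6


def percent_reduction(original_mb, quantized_mb):
    """Size reduction of a quantized file relative to the original, in percent."""
    return (1 - quantized_mb / original_mb) * 100


# Assumed baseline: 270M parameters stored as 16-bit weights.
baseline_mb = estimate_size_mb(270e6, 16)  # 540.0

for name, size_mb in [("Q4_K_M", 241), ("Q5_K_M", 247)]:
    print(f"{name}: {percent_reduction(baseline_mb, size_mb):.1f}% smaller than baseline")
```

Against this assumed baseline, both reductions come out a little higher than the figures quoted above, so treat either set of numbers as approximate.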
Key Features
- Extremely lightweight (270M parameters) – easy to run on CPU, edge devices, or low-resource GPUs
- General-purpose instruction following
- Useful for text completion, reasoning, and lightweight assistant prototypes
- Designed to be fast and memory efficient while maintaining quality
- Good foundation for research, experimentation, and fine-tuning
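The chat interfaces shown earlier (for example `create_chat_completion` and the `/v1/chat/completions` endpoint) normally apply the chat template stored in the GGUF for you. If you instead send raw text to a plain completion interface, instruction-style prompts are usually written in Gemma's turn format. This is a minimal sketch that assumes the standard Gemma `<start_of_turn>`/`<end_of_turn>` markers apply to this model. Gemma calls the assistant role `model`. The beginning-of-sequence token is omitted here because the tokenizer typically adds it. `format_gemma_prompt` is a hypothetical helper. Also note that the listed base model, google/gemma-3-270m, is the pretrained checkpoint, so plain continuation prompts may sometimes work just as well.

```python
def format_gemma_prompt(messages):
    """Render chat messages as a Gemma-style turn-formatted prompt string.

    Each message is a dict with "role" ("user" or "assistant") and "content".
    The result ends with an open model turn so generation continues as the reply.
    """
    parts = []
    for message in messages:
        role = message["role"]
        if role == "assistant":
            role = "model"  # Gemma's name for the assistant turn
        if role not in ("user", "model"):
            raise ValueError(f"unsupported role: {message['role']!r}")
        parts.append(f"<start_of_turn>{role}\n{message['content']}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")  # leave the model turn open
    return "".join(parts)


prompt = format_gemma_prompt([{"role": "user", "content": "What is LLM quantization?"}])
print(prompt)
```

Unsupported roles such as `system` raise an error rather than being silently dropped. Instructions that would normally go in a system prompt can be placed at the start of the first user message instead.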
llama.cpp (text-only)
./llama-cli -hf SandLogicTechnologies/Gemma-3-270m-GGUF:Q4_K_M -p "What is LLM quantization?"
Usage
This model is intended for developers and researchers who need a lightweight LLM for prototyping and small-scale applications.
Acknowledgments
These quantized models are based on the original work by the Google development team.
Special thanks to:
The Google team for developing and releasing the gemma-3-270m model.
Georgi Gerganov and the entire llama.cpp open-source community for enabling efficient model quantization and inference via the GGUF format.
Contact
For any inquiries or support, please contact us at support@sandlogic.com or visit our Website.
Base model: google/gemma-3-270m