Instructions for using michelinolinolino/gemma4-4b-sci with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- llama-cpp-python
How to use michelinolinolino/gemma4-4b-sci with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="michelinolinolino/gemma4-4b-sci",
    filename="model.gguf",
)
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
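llama-cpp-python can also stream tokens as they are produced. A minimal sketch, reusing the llm object created above; stream=True makes create_chat_completion yield OpenAI-style chunks:

# Stream the reply token by token instead of waiting for the full response
for chunk in llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True,
):
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)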
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use michelinolinolino/gemma4-4b-sci with llama.cpp:
Install with Homebrew
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf michelinolinolino/gemma4-4b-sci

# Run inference directly in the terminal:
llama-cli -hf michelinolinolino/gemma4-4b-sci
Install from WinGet (Windows)
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf michelinolinolino/gemma4-4b-sci

# Run inference directly in the terminal:
llama-cli -hf michelinolinolino/gemma4-4b-sci
Use pre-built binary
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf michelinolinolino/gemma4-4b-sci

# Run inference directly in the terminal:
./llama-cli -hf michelinolinolino/gemma4-4b-sci
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf michelinolinolino/gemma4-4b-sci

# Run inference directly in the terminal:
./build/bin/llama-cli -hf michelinolinolino/gemma4-4b-sci
Use Docker
docker model run hf.co/michelinolinolino/gemma4-4b-sci
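Whichever install route you use, llama-server exposes an OpenAI-compatible API, so any OpenAI client can call it. A minimal Python sketch, assuming the server's default port 8080 and the openai package (pip install openai); the api_key is a dummy value since the local server does not check it:

from openai import OpenAI

# Point the client at the local llama-server instead of api.openai.com
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="michelinolinolino/gemma4-4b-sci",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)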
- LM Studio
- Jan
- vLLM
How to use michelinolinolino/gemma4-4b-sci with vLLM:
Install from pip and serve the model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "michelinolinolino/gemma4-4b-sci"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "michelinolinolino/gemma4-4b-sci",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'
Use Docker
docker model run hf.co/michelinolinolino/gemma4-4b-sci
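Besides the server and Docker routes, vLLM can also run the model in-process through its offline inference API. A minimal sketch; LLM.chat is available in recent vLLM releases, and the sampling settings below are illustrative, not values from this model card:

from vllm import LLM, SamplingParams

# Load once; vLLM handles batching and KV-cache management internally
llm = LLM(model="michelinolinolino/gemma4-4b-sci")
params = SamplingParams(temperature=0.7, max_tokens=256)  # illustrative settings
outputs = llm.chat(
    [{"role": "user", "content": "What is the capital of France?"}],
    params,
)
print(outputs[0].outputs[0].text)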
- Ollama
How to use michelinolinolino/gemma4-4b-sci with Ollama:
ollama run hf.co/michelinolinolino/gemma4-4b-sci
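Ollama also has an official Python package (pip install ollama) that talks to the local server; a minimal sketch, assuming ollama serve (or the desktop app) is already running:

import ollama

# Chat through the local Ollama server (default http://localhost:11434)
response = ollama.chat(
    model="hf.co/michelinolinolino/gemma4-4b-sci",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response["message"]["content"])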
- Unsloth Studio
How to use michelinolinolino/gemma4-4b-sci with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for michelinolinolino/gemma4-4b-sci to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for michelinolinolino/gemma4-4b-sci to start chatting
Use Hugging Face Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for michelinolinolino/gemma4-4b-sci to start chatting
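Since the card states the model was fine-tuned with Unsloth, it can in principle also be loaded in Python via Unsloth's FastLanguageModel API. A minimal sketch; max_seq_length and the 4-bit flag here are illustrative assumptions, and whether this loads depends on Unsloth's support for the base architecture:

from unsloth import FastLanguageModel

# Load the checkpoint in 4-bit for low-VRAM inference (settings are assumptions)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="michelinolinolino/gemma4-4b-sci",
    max_seq_length=4096,  # illustrative; adjust to your context needs
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast generation path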
- Docker Model Runner
How to use michelinolinolino/gemma4-4b-sci with Docker Model Runner:
docker model run hf.co/michelinolinolino/gemma4-4b-sci
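Docker Model Runner exposes an OpenAI-compatible endpoint as well; a minimal sketch, assuming host-side TCP access is enabled in Docker Desktop (the port 12434 and /engines/v1 path follow Docker's documentation and may differ in your setup):

from openai import OpenAI

# Endpoint details are assumptions based on Docker Model Runner docs;
# enable TCP host access in Docker Desktop settings first
client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="hf.co/michelinolinolino/gemma4-4b-sci",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)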
- Lemonade
How to use michelinolinolino/gemma4-4b-sci with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull michelinolinolino/gemma4-4b-sci
Run and chat with the model
lemonade run user.gemma4-4b-sci-{{QUANT_TAG}}
List all available models
lemonade list
gemma4-4b-sci
Early-stage research experiment. Trained for 1 epoch on 30K examples. Expect hallucinations and factual errors.
gemma4-4b-sci is a scientific-domain fine-tune of Gemma 4 E4B, trained with QLoRA on 30,000 examples from OpenSciLM/OS_Train_Data and SciRIFF. It is inspired by OpenScholar, but it is a generation-only model without a retrieval pipeline.
Model Description
- Developed by: Michele Banfi
- Base model: unsloth/gemma-4-E4B-it
- Method: QLoRA (4-bit) + SFT via Unsloth, language layers only (vision encoder frozen)
- Training: 1 epoch, 30K examples (15K OS_Train_Data + 15K SciRIFF), NVIDIA RTX 5090
- License: Gemma Terms of Use
Model Sources
- Repository: https://github.com/michelebanfi/gemma-4-finetuning
- Evaluation: ScholarQABench
- Ollama: ollama run hf.co/michelinolinolino/gemma4-4b-sci
Quick Start
Run with Ollama:
ollama run hf.co/michelinolinolino/gemma4-4b-sci

Or load the model directly with transformers:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load in bfloat16 and place layers automatically across available devices
model = AutoModelForCausalLM.from_pretrained(
    "michelinolinolino/gemma4-4b-sci", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("michelinolinolino/gemma4-4b-sci")
messages = [{"role": "user", "content": "Explain the role of CRISPR-Cas9 in gene editing."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
# Generate a reply and decode only the newly generated tokens
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))
Evaluation
ScholarQABench, draft results from the 1-epoch run. Gold paper contexts were provided, for a fair comparison with OpenScholar-8B.
| Task (# examples) | Metric | gemma4-4b-sci | OpenScholar-8B |
|---|---|---|---|
| SciFact (208) | Accuracy | 77.9% | 76.4% |
| PubMedQA (843) | Accuracy | 81.5% | 76.0% |
| QASA (1375) | ROUGE-L | 20.9 | 23.0 |
| SciFact | Citation F1 | 0.0 | 68.9 |
| PubMedQA | Citation F1 | 0.0 | 43.6 |
| QASA | Citation F1 | 4.3 | 56.3 |
After a single epoch, correctness matches or exceeds OpenScholar-8B, a model with twice the parameters. The citation gap follows from the missing retrieval pipeline: without retrieved passages, the model has no sources to cite.
Citation
@article{asai2024openscholar,
title = {OpenScholar: Synthesizing Scientific Literature with Retrieval-Augmented LMs},
author = {Asai, Akari and others},
journal = {Nature},
year = {2024},
url = {https://allenai.org/blog/nature-openscilm}
}