Crane MedGemma 1.5 IT
A clinical decision-support model fine-tuned from MedGemma 1.5 4B-IT on the Uganda Clinical Guidelines 2023, with Direct Preference Optimization (DPO) for improved format compliance and triage safety.
This is the primary production checkpoint — best overall performance across format compliance, triage safety, and clinical content quality. For the conservative SFT-only checkpoint, see v2.1-instruct-medgemma-bf16.
This model is a clinical thinking aid. It does not provide diagnoses, prescriptions, or automated clinical decisions.
Intended Use
Primary use case: Structured triage assessment for frontline health workers at Health Centre II–IV facilities in Uganda. Designed for the production Android app where compact XML output is parsed for clinical decision support at the point of care.
Output formats: XML, positional array, and prose — selected via system prompt. XML is the primary production format (~50% fewer tokens than prose, optimized for on-device latency).
Deployment target: On-device inference on Android via quantized GGUF. This bf16 checkpoint is the full-precision reference.
In-Scope
- Structured triage assessments with triage level, condition, confidence, suggestions, next steps, and red flags
- Differential diagnosis reasoning for conditions in the Uganda Clinical Guidelines
- Danger sign identification and referral triage
- Investigation recommendations
- Format-switching between XML, array, and prose based on system prompt
- Refusal of out-of-scope queries (treatment, dosing)
Out-of-Scope
- Treatment and dosing recommendations — the model refuses these queries by design
- Diagnostic conclusions or prescriptions
- Multi-turn conversations
- Languages other than English
- Conditions not covered by the Uganda Clinical Guidelines 2023
Model Details
| Attribute | Details |
|---|---|
| Base model | google/medgemma-1.5-4b-it (Gemma 3 4B architecture) |
| Parameters | 4.3B |
| Precision | BF16 |
| Training method | QLoRA SFT + DPO (Direct Preference Optimization) |
| Training data | Decision-support Q&A pairs from UCG 2023 + format-switching instruction data + DPO preference pairs |
| Scope | Decision-support only — treatment and dosing excluded by design |
Training Approach
This model was trained in three stages:
1. Decision-support SFT: QLoRA fine-tuning on clinical Q&A pairs covering 7 categories (differential diagnosis, diagnosis, referral, investigation, danger signs, special populations, refusal). Treatment and dosing were excluded after evaluation confirmed a capacity ceiling on factual drug recall at the 4B parameter scale.
2. Instruction-following SFT: Continued fine-tuning from the SFT checkpoint to teach format-switching, i.e. emitting XML, array, or prose triage packets depending on the system prompt.
3. DPO alignment: Preference optimization using a structured taxonomy of preference pairs targeting format compliance, refusal consistency, and triage safety. DPO was stacked on the conservative SFT checkpoint (v2.1) to preserve its strong clinical reasoning baseline.
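The DPO stage can be illustrated with the standard DPO loss for a single preference pair, L = -log σ(β[(log π(y_w|x) - log π_ref(y_w|x)) - (log π(y_l|x) - log π_ref(y_l|x))]), where y_w is the preferred (chosen) response and y_l the rejected one. The sketch below is a minimal pure-Python version under stated assumptions: the inputs are summed response log-probabilities computed beforehand, the reference model is the frozen SFT checkpoint, and β = 0.1 is an illustrative default. The card does not state the β value, the preference-pair format, or the training library used, and `dpo_loss` is a hypothetical helper name.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (hypothetical helper).

    Each *_logp is the summed log-probability of a full response given the
    prompt, under either the policy being trained or the frozen reference.
    beta=0.1 is an assumed value, not one reported for this model.
    """
    # Log-ratio of policy vs. reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Implicit reward margin: how much more the policy has moved toward the
    # chosen response than toward the rejected one.
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == softplus(-margin), computed in a numerically
    # stable form that avoids overflow for large |margin|.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

When the policy and reference agree, the margin is zero and the loss is ln 2; shifting probability toward the chosen response lowers it, while β controls how strongly the policy is held close to the reference.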
What the Model Refuses
The model refuses treatment and dosing questions across all output formats. This is a deliberate safety boundary — drug name confusion at the 4B parameter scale made treatment responses unreliable.
Crane MedGemma 1.5 IT vs v2.1
| Metric | v2.1 (SFT-only) | This model (SFT + DPO) |
|---|---|---|
| Training | SFT only | SFT + DPO |
| Best for | Prose Q&A, refusal compliance | XML triage packets, triage safety |
| Prose refusal | 5.00/5 (perfect) | 4.33/5 |
| XML parse rate | 98.2% | 99.4% |
| Array parse rate | 92.2% | 97.6% |
| Triage safety | 2.63/5 | 3.08/5 (best) |
| Ship gates passed | 5/12 | 6/12 |
Evaluation
Evaluated across three benchmarks using Gemini as an automated evaluator.
Ship Gate Analysis
| Gate | Target | Result |
|---|---|---|
| Prose content quality (210 samples) | >= 3.43/5 | 3.41 |
| XML parse rate | >= 95% | 99.4% |
| Array parse rate | >= 95% | 97.6% |
| Prose parse rate | >= 95% | 100% |
| XML content quality (held-out) | >= 3.43/5 | 3.02 |
| Triage safety (50 presentations) | — | 3.08 (best) |
| XML refusal | >= 4.5/5 | 4.45 |
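6 of 12 ship gates passed; the table above lists 7 of the 12. Held-out content quality caps at ~3.0/5, which is attributed to a capacity ceiling at the 4B parameter scale. Retrieval-augmented generation (RAG) is expected to close this gap.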
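The gate logic in the table can be expressed as a simple threshold check. This sketch encodes only the seven gates listed above (the other five of the twelve are not listed in this card), assumes each target is an inclusive lower bound, and skips the triage-safety row because it has no target. The `Gate` type and `passed_gates` helper are hypothetical names for illustration, not part of any released tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gate:
    name: str
    target: Optional[float]  # assumed inclusive lower bound; None = no target
    result: float


# Values copied from the ship gate table (percentages as plain numbers).
GATES = [
    Gate("Prose content quality", 3.43, 3.41),
    Gate("XML parse rate", 95.0, 99.4),
    Gate("Array parse rate", 95.0, 97.6),
    Gate("Prose parse rate", 95.0, 100.0),
    Gate("XML content quality (held-out)", 3.43, 3.02),
    Gate("Triage safety", None, 3.08),
    Gate("XML refusal", 4.5, 4.45),
]


def passed_gates(gates: List[Gate]) -> List[str]:
    """Return names of gates whose result meets an explicit target."""
    return [g.name for g in gates if g.target is not None and g.result >= g.target]


if __name__ == "__main__":
    for g in GATES:
        if g.target is None:
            status = "n/a (no target)"
        else:
            status = "PASS" if g.result >= g.target else "FAIL"
        print(f"{g.name}: {g.result} (target {g.target}) -> {status}")
```

Against the listed rows this yields three passes (the three parse-rate gates); prose content quality and XML refusal miss their targets by 0.02 and 0.05 respectively.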
Strengths
- Best triage safety: 3.08/5 on clinical presentation prompts — highest across all checkpoints
- Best format compliance: 99.4% XML parse, 97.6% array parse
- Strong XML refusal: 4.45/5 — closest to the 4.5 gate target
- DPO improved format without losing content: Clinical quality maintained (3.41/5 prose) while format compliance increased
Known Limitations
- 4B parameter capacity ceiling: Held-out content quality caps at ~3.0/5. Factual recall for unseen conditions is limited at this parameter scale.
- Treatment and dosing excluded: By design. The model refuses these queries.
- Refusal regression from DPO: Prose refusal dropped from 5.00 (SFT) to 4.33 after DPO. The DPO preference data slightly loosened the refusal boundary.
- Special populations weakness: Scores 1.22/5, attributed to limited population-specific detail in the source guidelines.
- Single-turn only: No conversational follow-up capability.
How to Use
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CraneAILabs/crane-medgemma-1.5-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the `accelerate` package
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16", device_map="auto")

# XML triage output: the system prompt selects the output format
messages = [
    {"role": "system", "content": "Respond in XML: <r><t>level</t><c>condition</c><cf>confidence</cf><sg>action1|action2</sg><ns>follow-up1|follow-up2</ns><rf>danger1|danger2</rf></r>"},
    {"role": "user", "content": "A 28-year-old pregnant woman at 32 weeks presents with severe headache, blurred vision, and blood pressure 160/110."}
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
# Set do_sample=True explicitly so that temperature is actually applied
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
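Because the app parses the XML packet, a client has to turn the model's reply into structured fields. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the reply follows the tag schema given in the system prompt above (`<r>` root; `t`, `c`, `cf` as scalars; `sg`, `ns`, `rf` as pipe-delimited lists). The `TriagePacket` class, `parse_triage_xml`, and `parse_rate` are hypothetical helpers, not part of the released app. Replies that are malformed or missing a required scalar are treated as parse failures and return `None`; the exact rule used for the card's parse-rate metric is not stated, so `parse_rate` is only an approximation of it.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TriagePacket:
    level: str
    condition: str
    confidence: str  # kept as text; the card does not specify its format
    suggestions: List[str] = field(default_factory=list)
    next_steps: List[str] = field(default_factory=list)
    red_flags: List[str] = field(default_factory=list)


def _split(text: Optional[str]) -> List[str]:
    # Pipe-delimited list fields, e.g. "action1|action2"; empty items dropped.
    return [part.strip() for part in (text or "").split("|") if part.strip()]


def parse_triage_xml(reply: str) -> Optional[TriagePacket]:
    """Parse an <r>...</r> triage packet; return None if it cannot be parsed."""
    # Tolerate extra text around the packet by slicing out the <r> element.
    start, end = reply.find("<r>"), reply.rfind("</r>")
    if start == -1 or end == -1:
        return None
    try:
        root = ET.fromstring(reply[start:end + len("</r>")])
    except ET.ParseError:
        return None
    # Treat a packet without level, condition, and confidence as invalid.
    if any(root.find(tag) is None for tag in ("t", "c", "cf")):
        return None
    return TriagePacket(
        level=(root.findtext("t") or "").strip(),
        condition=(root.findtext("c") or "").strip(),
        confidence=(root.findtext("cf") or "").strip(),
        suggestions=_split(root.findtext("sg")),
        next_steps=_split(root.findtext("ns")),
        red_flags=_split(root.findtext("rf")),
    )


def parse_rate(replies: List[str]) -> float:
    """Fraction of replies that parse into a TriagePacket."""
    if not replies:
        return 0.0
    return sum(parse_triage_xml(r) is not None for r in replies) / len(replies)
```

In use, pass the decoded string from the generation example to `parse_triage_xml` and fall back to showing the raw text when it returns `None`.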
Ethical Considerations
- Not a replacement for clinical judgment. All outputs are advisory. The clinician makes all decisions.
- No patient data was used in training. All data derives from published government clinical guidelines.
- Offline-first deployment — no patient data leaves the device.
- Scope boundaries are safety boundaries. Treatment/dosing refusal prevents harm from drug name confusion at this model scale.
Citation
If you use this model, please cite:
```bibtex
@misc{crane-medgemma-2026,
  title={Crane MedGemma 1.5 IT: Clinical Decision-Support for Uganda Clinical Guidelines},
  author={Crane AI Labs},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/CraneAILabs/crane-medgemma-1.5-it}
}
```
License
This model is subject to the MedGemma Terms of Use. Additional fine-tuning artifacts are proprietary to Crane AI Labs.