Instructions to use CraneAILabs/crane-medgemma-1.5-it with libraries and local apps.

Libraries

How to use CraneAILabs/crane-medgemma-1.5-it with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

# The messages below include an image, so the image-text-to-text task is required
pipe = pipeline("image-text-to-text", model="CraneAILabs/crane-medgemma-1.5-it")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("CraneAILabs/crane-medgemma-1.5-it")
model = AutoModelForImageTextToText.from_pretrained("CraneAILabs/crane-medgemma-1.5-it")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use CraneAILabs/crane-medgemma-1.5-it with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "CraneAILabs/crane-medgemma-1.5-it"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CraneAILabs/crane-medgemma-1.5-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
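
The same request can also be issued from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on localhost:8000. The helper names `build_chat_request` and `send_chat_request` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "CraneAILabs/crane-medgemma-1.5-it"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the first choice's text (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]
```

With a server running, `send_chat_request(build_chat_request("What is the capital of France?"))` returns the generated text. Passing `base_url="http://localhost:30000"` should target an SGLang server started on that port instead.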

Use Docker

docker model run hf.co/CraneAILabs/crane-medgemma-1.5-it

SGLang

How to use CraneAILabs/crane-medgemma-1.5-it with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "CraneAILabs/crane-medgemma-1.5-it" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CraneAILabs/crane-medgemma-1.5-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "CraneAILabs/crane-medgemma-1.5-it" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CraneAILabs/crane-medgemma-1.5-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use CraneAILabs/crane-medgemma-1.5-it with Docker Model Runner:
```
docker model run hf.co/CraneAILabs/crane-medgemma-1.5-it
```

Crane MedGemma 1.5 IT

A clinical decision-support model fine-tuned from MedGemma 1.5 4B-IT on the Uganda Clinical Guidelines 2023, with Direct Preference Optimization (DPO) for improved format compliance and triage safety.

This is the primary production checkpoint — best overall performance across format compliance, triage safety, and clinical content quality. For the conservative SFT-only checkpoint, see v2.1-instruct-medgemma-bf16.

This model is a clinical thinking aid. It does not provide diagnoses, prescriptions, or automated clinical decisions.

Intended Use

Primary use case: Structured triage assessment for frontline health workers at Health Centre II–IV facilities in Uganda. Designed for the production Android app where compact XML output is parsed for clinical decision support at the point of care.

Output formats: XML, positional array, and prose — selected via system prompt. XML is the primary production format (~50% fewer tokens than prose, optimized for on-device latency).

Deployment target: On-device inference on Android via quantized GGUF. This bf16 checkpoint is the full-precision reference.

In-Scope

Structured triage assessments with triage level, condition, confidence, suggestions, next steps, and red flags
Differential diagnosis reasoning for conditions in the Uganda Clinical Guidelines
Danger sign identification and referral triage
Investigation recommendations
Format-switching between XML, array, and prose based on system prompt
Refusal of out-of-scope queries (treatment, dosing)

Out-of-Scope

Treatment and dosing recommendations — the model refuses these queries by design
Diagnostic conclusions or prescriptions
Multi-turn conversations
Languages other than English
Conditions not covered by the Uganda Clinical Guidelines 2023

Model Details


| Property | Value |
|---|---|
| Base model | `google/medgemma-1.5-4b-it` (Gemma 3 4B architecture) |
| Parameters | 4.3B |
| Precision | BF16 |
| Training method | QLoRA SFT + DPO (Direct Preference Optimization) |
| Training data | Decision-support Q&A pairs from UCG 2023 + format-switching instruction data + DPO preference pairs |
| Scope | Decision-support only; treatment and dosing excluded by design |

Training Approach

This model was trained in three stages:

1. Decision-support SFT: QLoRA fine-tuning on clinical Q&A pairs covering 7 categories (differential diagnosis, diagnosis, referral, investigation, danger signs, special populations, refusal). Treatment and dosing were excluded after evaluation confirmed a capacity ceiling on factual drug recall at the 4B parameter scale.
2. Instruction-following SFT: Continued fine-tuning from the SFT checkpoint to teach format-switching, that is, emitting XML, array, or prose triage packets depending on the system prompt.
3. DPO alignment: Preference optimization using a structured taxonomy of preference pairs targeting format compliance, refusal consistency, and triage safety. DPO was stacked on the conservative SFT checkpoint (v2.1) to preserve its strong clinical reasoning baseline.
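
For readers unfamiliar with the DPO stage, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. It is a generic illustration, not the training code used for this model. The `beta` value is an assumption, since the card does not state hyperparameters, and the reference model would presumably be the SFT checkpoint that DPO was stacked on.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (beta=0.1 is an assumed value)."""
    # How much more the policy favours each response than the frozen reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written as log(1 + exp(-logits)).
    return math.log1p(math.exp(-logits))
```

The loss equals log 2 when the policy and reference agree, and falls below that value as the policy assigns relatively more probability to the preferred response (for example a well-formed XML packet) than to the rejected one.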

What the Model Refuses

The model refuses treatment and dosing questions across all output formats. This is a deliberate safety boundary — drug name confusion at the 4B parameter scale made treatment responses unreliable.

Crane MedGemma 1.5 IT vs v2.1

| | v2.1 (SFT-only) | This model (SFT + DPO) |
|---|---|---|
| Training | SFT only | SFT + DPO |
| Best for | Prose Q&A, refusal compliance | XML triage packets, triage safety |
| Prose refusal | 5.00/5 (perfect) | 4.33/5 |
| XML parse rate | 98.2% | 99.4% |
| Array parse rate | 92.2% | 97.6% |
| Triage safety | 2.63/5 | 3.08/5 (best) |
| Ship gates passed | 5/12 | 6/12 |

Evaluation

The model was evaluated on three benchmarks, with Gemini as the automated evaluator.

Ship Gate Analysis

| Gate | Target | Result |
|---|---|---|
| Prose content quality (210 samples) | >= 3.43/5 | 3.41 |
| XML parse rate | >= 95% | 99.4% |
| Array parse rate | >= 95% | 97.6% |
| Prose parse rate | >= 95% | 100% |
| XML content quality (held-out) | >= 3.43/5 | 3.02 |
| Triage safety (50 presentations) | n/a | 3.08 (best) |
XML refusal	>= 4.5/5	4.45

6 of 12 ship gates passed. Held-out content quality caps at ~3.0/5 — a 4B parameter capacity ceiling. RAG is expected to close this gap.
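
As a small illustration of how a gate check works, the sketch below compares each result against its target, using only the gates listed above that state a numeric target. The card reports 12 gates in total, so the count produced here covers a subset and is not expected to match the overall figure. The function name `evaluate_gates` is illustrative.

```python
# (gate name, target, result) copied from the gates listed above that state a target.
GATES = [
    ("Prose content quality", 3.43, 3.41),
    ("XML parse rate", 95.0, 99.4),
    ("Array parse rate", 95.0, 97.6),
    ("Prose parse rate", 95.0, 100.0),
    ("XML content quality (held-out)", 3.43, 3.02),
    ("XML refusal", 4.5, 4.45),
]


def evaluate_gates(gates):
    """Return {gate name: bool}, where True means result >= target."""
    return {name: result >= target for name, target, result in gates}


status = evaluate_gates(GATES)
for name, passed in status.items():
    print(f"{name}: {'pass' if passed else 'miss'}")
```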

Strengths

Best triage safety: 3.08/5 on clinical presentation prompts — highest across all checkpoints
Best format compliance: 99.4% XML parse, 97.6% array parse
Strong XML refusal: 4.45/5 — closest to the 4.5 gate target
DPO improved format without losing content: Clinical quality maintained (3.41/5 prose) while format compliance increased

Known Limitations

4B parameter capacity ceiling: Held-out content quality caps at ~3.0/5. Factual recall for unseen conditions is limited at this parameter scale.
Treatment and dosing excluded: By design. The model refuses these queries.
Refusal regression from DPO: Prose refusal dropped from 5.00 (SFT) to 4.33 after DPO. The DPO preference data slightly loosened the refusal boundary.
Special populations weakness: 1.22/5 — limited population-specific detail in the source guidelines.
Single-turn only: No conversational follow-up capability.

How to Use

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CraneAILabs/crane-medgemma-1.5-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16", device_map="auto")

# XML triage output
messages = [
    {"role": "system", "content": "Respond in XML: <r><t>level</t><c>condition</c><cf>confidence</cf><sg>action1|action2</sg><ns>follow-up1|follow-up2</ns><rf>danger1|danger2</rf></r>"},
    {"role": "user", "content": "A 28-year-old pregnant woman at 32 weeks presents with severe headache, blurred vision, and blood pressure 160/110."}
]

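# return_dict=True yields input_ids and attention_mask, passed to generate() as keyword arguments
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))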
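
Downstream code still has to parse the XML packet. The sketch below is one possible parser based on the schema in the system prompt above. The mapping from short tags to field names is inferred from that schema and the list of packet fields in this card, and the sample response is hand-written for illustration rather than real model output.

```python
import xml.etree.ElementTree as ET

# Short tag -> descriptive field name; this mapping is inferred from the prompt schema.
FIELDS = {"t": "triage_level", "c": "condition", "cf": "confidence"}
LIST_FIELDS = {"sg": "suggestions", "ns": "next_steps", "rf": "red_flags"}


def parse_triage_packet(xml_text):
    """Parse a compact <r>...</r> triage packet into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "r":
        raise ValueError(f"unexpected root element: {root.tag!r}")
    packet = {}
    for tag, name in FIELDS.items():
        packet[name] = (root.findtext(tag) or "").strip()
    for tag, name in LIST_FIELDS.items():
        raw = root.findtext(tag) or ""
        # Items inside list fields are separated by "|".
        packet[name] = [item.strip() for item in raw.split("|") if item.strip()]
    return packet


# Hypothetical response, written by hand for illustration only (not model output).
sample = (
    "<r><t>urgent</t><c>condition name</c><cf>high</cf>"
    "<sg>action1|action2</sg><ns>follow-up1</ns><rf>danger1|danger2</rf></r>"
)
packet = parse_triage_packet(sample)
```

Malformed output raises `xml.etree.ElementTree.ParseError`. Since the reported XML parse rate is below 100%, callers may want to catch that exception and fall back to showing the raw text.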

Ethical Considerations

Not a replacement for clinical judgment. All outputs are advisory. The clinician makes all decisions.
No patient data was used in training. All data derives from published government clinical guidelines.
Offline-first deployment — no patient data leaves the device.
Scope boundaries are safety boundaries. Treatment/dosing refusal prevents harm from drug name confusion at this model scale.

Citation

If you use this model, please cite:

@misc{crane-medgemma-2026,
  title={Crane MedGemma 1.5 IT: Clinical Decision-Support for Uganda Clinical Guidelines},
  author={Crane AI Labs},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/CraneAILabs/crane-medgemma-1.5-it}
}

License

This model is subject to the MedGemma Terms of Use. Additional fine-tuning artifacts are proprietary to Crane AI Labs.
