gemma-4-e4b-scientific-reviewer
Fine-tuned Gemma 4 E4B for academic peer review on CS/AI conference papers — generates a structured review (Summary · Strengths · Weaknesses · Technical Soundness · Clarity · Significance) and an accept/reject decision.
- Base model: `unsloth/gemma-4-E4B-it`
- Method: text-level distillation from Gemini 3.1 Pro (LoRA SFT on teacher-generated reviews, r=32, 5 epochs)
- Teacher: Gemini 3.1 Pro Preview (`gemini-3.1-pro-preview`) via Vertex AI
- Training data: 334 (paper, teacher review, decision) triples; prompts sampled from PeerRead ICLR 2017 + OpenReview ICLR 2019-2021, targets generated by Gemini under the same reviewer system prompt
- Benchmark: 95 PeerRead papers (ACL 2017 + CoNLL 2016 + ICLR 2017 test/dev), zero train-set overlap
Performance
The fine-tuned and base models were evaluated with the same benchmark prompt.
| Metric | Fine-tuned | Base (gemma-4-E4B-it) |
|---|---|---|
| Decision F1 | 0.892 | 0.843 |
| Accuracy | 0.811 | 0.737 |
| Precision | 0.860 | 0.848 |
| Recall | 0.925 | 0.838 |
| Predicted accept rate | 90.5% | 83.2% |
| Avg inference time | ~2 s / paper | ~2 s / paper |
Ground-truth accept rate on this benchmark: 84.2%.
Per-venue F1 (fine-tuned): ICLR 2017 0.922 · ACL 2017 0.762 · CoNLL 2016 0.500 (n=3).
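The table values can be cross-checked against each other. The sketch below reconstructs integer confusion counts from the rounded rates reported above (95 papers, 84.2% ground-truth accept rate, predicted accept rate, recall) and recomputes the remaining metrics. The counts are inferred from rounded figures, not taken from the benchmark files, and the function names are illustrative.

```python
def confusion_from_rates(n, true_accept_rate, pred_accept_rate, recall):
    """Recover (tp, fp, fn, tn) from rounded summary rates. Inferred, not measured."""
    positives = round(n * true_accept_rate)   # papers actually accepted
    predicted = round(n * pred_accept_rate)   # papers the model accepted
    tp = round(positives * recall)            # accepted papers the model also accepted
    fp = predicted - tp
    fn = positives - tp
    tn = n - tp - fp - fn
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics with 'accept' as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Rates copied from the Performance table above.
fine_tuned = confusion_from_rates(95, 0.842, 0.905, 0.925)
base = confusion_from_rates(95, 0.842, 0.832, 0.838)

print("fine-tuned (tp, fp, fn, tn):", fine_tuned)  # (74, 12, 6, 3)
print("base       (tp, fp, fn, tn):", base)        # (67, 12, 13, 3)
print({k: round(v, 3) for k, v in metrics(*fine_tuned).items()})
print({k: round(v, 3) for k, v in metrics(*base).items()})
```

Under this reconstruction the recomputed precision, F1, and accuracy agree with the table to within rounding, and both models would correctly reject 3 of the 15 rejected papers.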
Calibration
A key property of the distilled model is calibrated decisions. The teacher (Gemini 3.1 Pro) accepts ~74% of papers on the training prompts, and the student inherits this calibration: 90.5% predicted accept rate versus 84.2% ground truth — a 6.3 pp gap on a test set that is already skewed toward acceptance.
Usage
Serve with vLLM:
```bash
python -m vllm.entrypoints.openai.api_server \
  --model NhatCuong22/gemma-4-e4b-scientific-reviewer \
  --dtype bfloat16 \
  --max-model-len 12288 \
  --gpu-memory-utilization 0.88 \
  --max-num-seqs 16 \
  --enable-prefix-caching
```
Call with the benchmark prompt:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

SYSTEM = (
    "You are an experienced academic peer reviewer for top AI/ML conferences "
    "(ICLR, NeurIPS, ACL, CVPR). Produce structured and substantive reviews.\n\n"
    "RULES:\n"
    "1. Be SPECIFIC - reference concrete details from the paper.\n"
    "2. Be FAIR - list strengths AND weaknesses honestly.\n"
    "3. Your final recommendation should reflect real conference accept rates (~30-50%). "
    "Reserve rejection for papers with major methodological issues."
)

USER_TMPL = """Review the following paper. Format:
## Summary
<1-2 paragraphs on contribution>
## Strengths
- <specific strength 1>
## Weaknesses
- <specific weakness 1>
## Technical Soundness
<paragraph>
## Clarity and Writing
<paragraph>
## Significance
<paragraph>
## Final Recommendation
End with EXACTLY ONE line:
I recommend this paper for ACCEPTANCE.
I recommend this paper for REJECTION.
---
Title: {title}
Abstract: {abstract}
Paper content:
{body}
"""

resp = client.chat.completions.create(
    model="NhatCuong22/gemma-4-e4b-scientific-reviewer",
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": USER_TMPL.format(title=..., abstract=..., body=...)},
    ],
    max_tokens=1400,
    temperature=0.2,
    top_p=0.9,
)
print(resp.choices[0].message.content)
```
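Because the prompt asks the model to end with a fixed verdict sentence, the accept/reject decision can be read back out of the generated review with a regular expression. The following is a minimal sketch, assuming the same verdict pattern that the training-data filter uses; `parse_decision` is a hypothetical helper, and taking the last match (in case the sentence is echoed earlier in the review) is an assumption rather than documented behavior.

```python
import re

# Verdict sentence requested by the benchmark prompt; the trailing period is optional.
DECISION_RE = re.compile(r"I recommend this paper for (ACCEPTANCE|REJECTION)\.?")


def parse_decision(review_text):
    """Return True for accept, False for reject, None if no verdict sentence is found."""
    matches = DECISION_RE.findall(review_text)
    if not matches:
        return None
    # Assumption: the final occurrence is the model's actual recommendation.
    return matches[-1] == "ACCEPTANCE"


example = "## Significance\nSolid incremental work.\n## Final Recommendation\nI recommend this paper for ACCEPTANCE."
print(parse_decision(example))                # True
print(parse_decision("No verdict was given."))  # None
```

A `None` result is worth tracking separately from a rejection, since it signals a truncated or malformed generation rather than a decision.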
Method: Gemini distillation
- Prompt pool: 434 unique papers from PeerRead ICLR 2017 + OpenReview ICLR 2019-2021 with their system/user prompts.
- Teacher generation: Gemini 3.1 Pro Preview produced one review per prompt on Vertex AI (temperature 0.3, thinking budget 1024). Total teacher tokens: ~6.5M in / ~1.6M out.
- Filtering: keep only rows whose review matches the regex `I recommend this paper for (ACCEPTANCE|REJECTION)\.?`; dedupe by review-text hash. Yields 334 train + 100 val.
- Student SFT: LoRA r=32, α=64 on Gemma 4 E4B via Unsloth; 5 epochs, cosine LR 2e-5, batch 8, bf16. Final train loss 2.628. Merged to 16-bit.
The student inherits the teacher's reviewing style and calibrated decision distribution — whereas training directly on human PeerRead reviews tends to collapse to "accept everything" because the test set is 84% accept.
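The filtering step in the list above can be sketched as follows. This is an illustration under stated assumptions, not the script used for this model: the row schema (`paper_id`, `review`) is hypothetical, and SHA-256 stands in for whichever hash was actually used to dedupe review text.

```python
import hashlib
import re

VERDICT_RE = re.compile(r"I recommend this paper for (ACCEPTANCE|REJECTION)\.?")


def filter_and_dedupe(rows):
    """Keep rows with a parseable verdict and drop exact duplicate review texts."""
    seen_hashes = set()
    kept = []
    for row in rows:
        review = row["review"]
        match = VERDICT_RE.search(review)
        if match is None:
            continue  # teacher output without a usable verdict sentence
        digest = hashlib.sha256(review.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # identical review text already kept
        seen_hashes.add(digest)
        kept.append({**row, "decision": match.group(1)})
    return kept


rows = [
    {"paper_id": "a", "review": "Good paper. I recommend this paper for ACCEPTANCE."},
    {"paper_id": "b", "review": "Good paper. I recommend this paper for ACCEPTANCE."},
    {"paper_id": "c", "review": "The review was cut off before the verd"},
    {"paper_id": "d", "review": "Flawed evaluation. I recommend this paper for REJECTION."},
]
clean = filter_and_dedupe(rows)
print([(r["paper_id"], r["decision"]) for r in clean])
# [('a', 'ACCEPTANCE'), ('d', 'REJECTION')]
```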
Training Setup
| Setting | Value |
|---|---|
| Base model | unsloth/gemma-4-E4B-it (4-bit via Unsloth) |
| Method | LoRA (r=32, α=64, dropout=0, all linear projections) |
| Teacher | gemini-3.1-pro-preview (Vertex AI) |
| Training examples | 334 pairs (72.1% accept) + 100 val |
| Epochs | 5 |
| Steps | 210 |
| Effective batch | 8 (per-device 1 × grad-accum 8) |
| Max seq len | 8192 |
| LR | 2e-5 cosine, warmup 3% |
| Optimizer | AdamW 8-bit |
| Precision | bf16 |
| Seed | 42 |
| Hardware | 1 × RTX 4080 SUPER 32 GB |
| Wall-clock | 95 min training + 13 min merge |
| Training loss | 9.52 → 2.63 |
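The step count in the table follows from the other settings. As a quick check, assuming the final partial batch of each epoch still counts as one optimizer step and that the adapter uses the standard LoRA scaling factor α/r:

```python
import math

# Values copied from the Training Setup table.
train_examples = 334
per_device_batch = 1
grad_accum = 8
epochs = 5
lora_r, lora_alpha = 32, 64

effective_batch = per_device_batch * grad_accum                 # 8
steps_per_epoch = math.ceil(train_examples / effective_batch)   # 42 (last batch is partial)
total_steps = steps_per_epoch * epochs                          # 210
lora_scale = lora_alpha / lora_r                                # 2.0

print(effective_batch, steps_per_epoch, total_steps, lora_scale)
```

This reproduces the 210 steps and the effective batch of 8 reported above.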
Artifacts
- `model.safetensors`: merged 16-bit weights
- `adapter_model.safetensors`: LoRA adapter (for applying on top of the base)
- `benchmark/peerread_ft_full.jsonl`: per-paper generations on the 95-paper test set
- `benchmark/peerread_ft_metrics.json`: aggregated metrics
- `benchmark/peerread_base_full.jsonl`: base-model predictions on the same test set
- `benchmark/peerread_base_metrics.json`: base-model metrics
Known limitations
- Benchmark is skewed toward acceptance (84.2%), so a model that always accepts scores a deceptively high F1. The calibration plot is the more robust signal.
- Training prompts are dominated by ICLR; performance on non-ICLR venues (ACL, CoNLL) is lower.
- Teacher and student share the Gemma/Gemini SentencePiece tokenizer family but are different architectures, so this is soft-label-free SFT distillation (text-level), not logit-level KD.
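To make the first limitation concrete, the F1 of a trivial always-accept classifier can be worked out from the 84.2% ground-truth accept rate alone: its recall is 1.0 and its precision equals the accept rate. The snippet is only an arithmetic illustration using the figures reported in this card.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


ground_truth_accept_rate = 0.842  # from the Performance section

# An always-accept model flags every paper as accepted.
always_accept_precision = ground_truth_accept_rate
always_accept_recall = 1.0
always_accept_f1 = f1_score(always_accept_precision, always_accept_recall)

print(round(always_accept_f1, 3))  # 0.914
```

On this benchmark that trivial baseline sits above the reported Decision F1 of both the fine-tuned (0.892) and base (0.843) models, which is why F1 should be read together with the predicted accept rate.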
License
Gemma license, inherited from the base model.