
Libraries

How to use NhatCuong22/gemma-4-e4b-scientific-reviewer with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

# The message below contains an image, so the image-text-to-text task is required
pipe = pipeline("image-text-to-text", model="NhatCuong22/gemma-4-e4b-scientific-reviewer")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
print(pipe(text=messages))
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("NhatCuong22/gemma-4-e4b-scientific-reviewer")
model = AutoModelForImageTextToText.from_pretrained("NhatCuong22/gemma-4-e4b-scientific-reviewer")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

Local Apps

vLLM

How to use NhatCuong22/gemma-4-e4b-scientific-reviewer with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "NhatCuong22/gemma-4-e4b-scientific-reviewer"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "NhatCuong22/gemma-4-e4b-scientific-reviewer",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker

docker model run hf.co/NhatCuong22/gemma-4-e4b-scientific-reviewer

SGLang

How to use NhatCuong22/gemma-4-e4b-scientific-reviewer with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "NhatCuong22/gemma-4-e4b-scientific-reviewer" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "NhatCuong22/gemma-4-e4b-scientific-reviewer",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "NhatCuong22/gemma-4-e4b-scientific-reviewer" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "NhatCuong22/gemma-4-e4b-scientific-reviewer",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use NhatCuong22/gemma-4-e4b-scientific-reviewer with Docker Model Runner:
```
docker model run hf.co/NhatCuong22/gemma-4-e4b-scientific-reviewer
```

gemma-4-e4b-scientific-reviewer

Gemma 4 E4B fine-tuned for academic peer review of CS/AI conference papers. Given a paper, it generates a structured review (Summary · Strengths · Weaknesses · Technical Soundness · Clarity · Significance) and an accept/reject decision.

- Base model: `unsloth/gemma-4-E4B-it`
- Method: sequence-level (off-policy) distillation from Gemini 3.1 Pro, i.e. LoRA SFT on teacher-written reviews (r=32, 5 epochs)
- Teacher: Gemini 3.1 Pro Preview (`gemini-3.1-pro-preview`) via Vertex AI
- Training data: 334 (paper, teacher review, decision) triples; prompts sampled from PeerRead ICLR 2017 + OpenReview ICLR 2019-2021, targets generated by Gemini under the same reviewer system prompt
- Benchmark: PeerRead, 95 papers (ACL 2017 + CoNLL 2016 + ICLR 2017 test/dev), zero overlap with the training set

Performance

Both models are evaluated with the same benchmark prompt.

| Metric | Fine-tuned | Base (`gemma-4-E4B-it`) |
|---|---|---|
| Decision F1 | 0.892 | 0.843 |
| Accuracy | 0.811 | 0.737 |
| Precision | 0.860 | 0.848 |
| Recall | 0.925 | 0.838 |
| Predicted accept rate | 90.5% | 83.2% |
| Avg. inference time | ~2 s / paper | ~2 s / paper |

Ground-truth accept rate on this benchmark: 84.2%.

Per-venue F1 (fine-tuned): ICLR 2017 0.922 · ACL 2017 0.762 · CoNLL 2016 0.500 (n=3).
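The reported rates pin down the underlying confusion counts on the 95-paper benchmark. The sketch below recovers them and compares both models with a trivial always-accept baseline. It assumes n = 95 and that every reported rate is a rounded ratio of integer counts; the counts themselves are not published in this card, and the helper names are illustrative.

```python
# Sketch: recover confusion counts from the reported rates.
# Assumptions: n = 95 papers, and each rate is a rounded ratio of integer counts.
N = 95

def counts_from_rates(gt_accept_rate, pred_accept_rate, recall):
    pos = round(gt_accept_rate * N)          # papers actually accepted
    pred_pos = round(pred_accept_rate * N)   # papers the model accepts
    tp = round(recall * pos)                 # accepted papers the model also accepts
    fp = pred_pos - tp
    fn = pos - tp
    tn = N - tp - fp - fn
    return tp, fp, fn, tn

def metrics(tp, fp, fn, tn):
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

fine_tuned = counts_from_rates(0.842, 0.905, 0.925)  # (74, 12, 6, 3)
base = counts_from_rates(0.842, 0.832, 0.838)        # (67, 12, 13, 3)
always_accept = (80, 15, 0, 0)                       # accept every paper

for name, c in [("fine-tuned", fine_tuned), ("base", base), ("always-accept", always_accept)]:
    m = metrics(*c)
    print(f"{name:14s} counts={c} " + " ".join(f"{k}={v:.3f}" for k, v in m.items()))
```

Under these assumptions both models correctly reject only 3 of the 15 rejected papers, and always-accept reaches F1 ≈ 0.914, so the headline F1 should be read alongside the predicted accept rate.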

Calibration

The distillation is meant to transfer the teacher's decision calibration: Gemini 3.1 Pro accepts ~74% of papers on the training prompts. On the benchmark, the fine-tuned model predicts acceptance for 90.5% of papers against an 84.2% ground-truth rate, overshooting by 6.3 pp on a test set that is already skewed toward acceptance; the base model's 83.2% lies closer to the ground truth.

Usage

Serve with vLLM:

```bash
python -m vllm.entrypoints.openai.api_server \
  --model NhatCuong22/gemma-4-e4b-scientific-reviewer \
  --dtype bfloat16 \
  --max-model-len 12288 \
  --gpu-memory-utilization 0.88 \
  --max-num-seqs 16 \
  --enable-prefix-caching
```

Call with the benchmark prompt:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

SYSTEM = (
    "You are an experienced academic peer reviewer for top AI/ML conferences "
    "(ICLR, NeurIPS, ACL, CVPR). Produce structured and substantive reviews.\n\n"
    "RULES:\n"
    "1. Be SPECIFIC - reference concrete details from the paper.\n"
    "2. Be FAIR - list strengths AND weaknesses honestly.\n"
    "3. Your final recommendation should reflect real conference accept rates (~30-50%). "
    "Reserve rejection for papers with major methodological issues."
)

USER_TMPL = """Review the following paper. Format:

## Summary
<1-2 paragraphs on contribution>

## Strengths
- <specific strength 1>

## Weaknesses
- <specific weakness 1>

## Technical Soundness
<paragraph>

## Clarity and Writing
<paragraph>

## Significance
<paragraph>

## Final Recommendation
End with EXACTLY ONE line:
    I recommend this paper for ACCEPTANCE.
    I recommend this paper for REJECTION.

---

Title: {title}
Abstract: {abstract}
Paper content:
{body}
"""

resp = client.chat.completions.create(
    model="NhatCuong22/gemma-4-e4b-scientific-reviewer",
    messages=[
        {"role": "system", "content": SYSTEM},
        # Replace each ... with the paper's title, abstract, and body text (strings)
        {"role": "user", "content": USER_TMPL.format(title=..., abstract=..., body=...)},
    ],
    max_tokens=1400,
    temperature=0.2,
    top_p=0.9,
)
print(resp.choices[0].message.content)
```
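The prompt asks the model to end with exactly one recommendation line. A minimal sketch for turning a generated review into a label, reusing the recommendation regex from the filtering step of the Method section; the function name `parse_decision` and the choice to score the last match are assumptions, not the benchmark's published scorer.

```python
import re

# Recommendation pattern from the filtering step. Scoring the LAST match is an
# assumption, so a review that quotes the instruction earlier is judged by its final line.
DECISION_RE = re.compile(r"I recommend this paper for (ACCEPTANCE|REJECTION)\.?")

def parse_decision(review):
    """Return 'accept', 'reject', or None if no recommendation line is found."""
    matches = DECISION_RE.findall(review)  # list of captured group strings
    if not matches:
        return None
    return "accept" if matches[-1] == "ACCEPTANCE" else "reject"

review = "## Final Recommendation\nI recommend this paper for REJECTION."
print(parse_decision(review))  # reject
```

Applied to the example above, `parse_decision(resp.choices[0].message.content)` gives the label to compare against the ground-truth decision.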

Method: Gemini distillation

- Prompt pool: 434 unique papers from PeerRead ICLR 2017 + OpenReview ICLR 2019-2021, each wrapped in the reviewer system/user prompts.
- Teacher generation: Gemini 3.1 Pro Preview produced one review per prompt on Vertex AI (temperature 0.3, thinking budget 1024). Total teacher tokens: ~6.5M in / ~1.6M out.
- Filtering: keep only rows where the regex `I recommend this paper for (ACCEPTANCE|REJECTION)\.?` parses cleanly, then dedupe by review-text hash. This yields 334 train + 100 val examples.
- Student SFT: LoRA r=32, α=64 on Gemma 4 E4B via Unsloth; 5 epochs, cosine LR 2e-5, batch 8, bf16. Final train loss 2.628. Merged to 16-bit.
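The filtering step above can be sketched as follows. The row layout (a dict with a `review` string), the use of SHA-256 as the review-text hash, and the name `filter_and_dedupe` are hypothetical; the card does not say which hash or normalization was used.

```python
import hashlib
import re

# Recommendation pattern from the filtering step.
DECISION_RE = re.compile(r"I recommend this paper for (ACCEPTANCE|REJECTION)\.?")

def filter_and_dedupe(rows):
    """Keep rows whose review contains a decision line; drop exact duplicate reviews.

    Assumes each row is a dict with a 'review' string (hypothetical schema).
    """
    seen = set()
    kept = []
    for row in rows:
        match = DECISION_RE.search(row["review"])
        if match is None:
            continue  # no clean decision line: discard the row
        digest = hashlib.sha256(row["review"].encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # identical review text was already kept
        seen.add(digest)
        kept.append({**row, "decision": match.group(1)})
    return kept

rows = [
    {"paper_id": "a", "review": "...\nI recommend this paper for ACCEPTANCE."},
    {"paper_id": "b", "review": "...\nI recommend this paper for ACCEPTANCE."},
    {"paper_id": "c", "review": "...\nNo clear verdict."},
    {"paper_id": "d", "review": "...\nI recommend this paper for REJECTION"},
]
print([(r["paper_id"], r["decision"]) for r in filter_and_dedupe(rows)])
# [('a', 'ACCEPTANCE'), ('d', 'REJECTION')]
```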

The student inherits the teacher's reviewing style and decision distribution. By contrast, training directly on human PeerRead reviews tends to collapse to "accept everything", because those reviews come from the same accept-heavy distribution as this benchmark (84% accept).

Training Setup

| Setting | Value |
|---|---|
| Base model | `unsloth/gemma-4-E4B-it` (4-bit via Unsloth) |
| Method | LoRA (r=32, α=64, dropout=0, all linear projections) |
| Teacher | `gemini-3.1-pro-preview` (Vertex AI) |
| Training examples | 334 pairs (72.1% accept) + 100 val |
| Epochs | 5 |
| Steps | 210 |
| Effective batch | 8 (per-device 1 × grad-accum 8) |
| Max seq len | 8192 |
| LR | 2e-5 cosine, warmup 3% |
| Optimizer | AdamW 8-bit |
| Precision | bf16 |
| Seed | 42 |
| Hardware | 1 × RTX 4080 SUPER 32 GB |
| Wall-clock | 95 min training + 13 min merge |
| Training loss | 9.52 → 2.63 |
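The step and warmup counts in the table can be checked from the other settings. This sketch assumes that a partial final batch still counts as one optimizer step per epoch (which reproduces the reported 210 steps), that warmup steps are rounded up as Hugging Face `TrainingArguments` does for `warmup_ratio`, and that the cosine phase decays to zero. The card only states "2e-5 cosine, warmup 3%", so `lr_at` is an illustrative half-cosine schedule, not the exact Unsloth/TRL implementation.

```python
import math

# Values from the Training Setup table.
NUM_EXAMPLES = 334
EPOCHS = 5
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 8
PEAK_LR = 2e-5
WARMUP_RATIO = 0.03

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)  # assumption: partial last batch is a step
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)         # assumption: warmup rounded up

def lr_at(step):
    """Linear warmup to PEAK_LR, then half-cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)  # 8 42 210 7
print(f"{lr_at(warmup_steps):.2e} {lr_at(total_steps // 2):.2e} {lr_at(total_steps):.1e}")
```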

Artifacts

- `model.safetensors`: merged 16-bit weights
- `adapter_model.safetensors`: LoRA adapter (for applying on top of the base)
- `benchmark/peerread_ft_full.jsonl`: per-paper generations on the 95-paper test set
- `benchmark/peerread_ft_metrics.json`: aggregated metrics
- `benchmark/peerread_base_full.jsonl`: base-model predictions on the same test set
- `benchmark/peerread_base_metrics.json`: base-model metrics

Known limitations

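- The benchmark is skewed toward acceptance (84.2%), so a model that always accepts scores a deceptively high F1: on this set, always-accept would reach F1 ≈ 0.914 and accuracy 0.842, above both models in the Performance table. The calibration plot is the more robust signal.
- All training prompts come from ICLR (PeerRead ICLR 2017, OpenReview ICLR 2019-2021); performance on non-ICLR venues is lower (F1 0.762 on ACL 2017, 0.500 on CoNLL 2016 with n=3).
- Teacher and student share the Gemma/Gemini SentencePiece tokenizer family but are different models, and the student was trained only on the teacher's generated text, so this is text-level SFT distillation without soft labels, not logit-level KD.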
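The text-level vs. logit-level distinction can be written out. With one teacher review $y = (y_1, \dots, y_T)$ sampled per prompt $x$ and student $p_\theta$, text-level distillation minimizes ordinary next-token cross-entropy on the teacher's text, whereas logit-level KD needs the teacher's full next-token distribution $p_T$ at every position. This is the standard textbook form, shown for contrast; temperature-scaled variants are omitted.

$$
\mathcal{L}_{\text{SFT}}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(y_t \mid x, y_{<t}\right),
\qquad
\mathcal{L}_{\text{KD}}(\theta) = \sum_{t=1}^{T} \mathrm{KL}\left(p_T(\cdot \mid x, y_{<t}) \,\big\|\, p_\theta(\cdot \mid x, y_{<t})\right)
$$

Only $\mathcal{L}_{\text{SFT}}$ can be computed from generated text alone.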

License

Gemma license, inherited from the base model.


Model tree for NhatCuong22/gemma-4-e4b-scientific-reviewer

`google/gemma-4-E4B` (base model) → `google/gemma-4-E4B-it` (fine-tuned) → `unsloth/gemma-4-E4B-it` (fine-tuned) → this model