🚀 Gemma-4-E4B-it-PARL (Autonomous Research Agent)

This model is a fine-tuned version of google/gemma-4-E4B-it, optimized for Autonomous Multi-Hop Reasoning and Deep Web Research. It was developed for a hackathon hosted by lablab.ai and sponsored by AMD.

🧠 Model Description

We utilized Group Relative Policy Optimization (GRPO) and a Parallel-Agent Reinforcement Learning (PARL) architecture to transform the base Gemma-4 model into an autonomous agent capable of solving complex, multi-step tasks.
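The PARL workflow described above can be pictured as a coordinator fanning a research task out to parallel worker agents and synthesizing their findings. The sketch below is purely illustrative: the function names, the thread-based parallelism, and the report format are assumptions, not the model's actual (unpublished) agent stack.

```python
# Minimal sketch of a PARL-style orchestration loop: a coordinator
# delegates subtasks to parallel worker agents, then merges the results.
# All names and the threading approach here are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor


def worker(subtask: str) -> str:
    # Stand-in for a sub-agent performing one research hop
    # (in practice: a model call plus web scraping / code execution)
    return f"findings for: {subtask}"


def coordinate(task: str, subtasks: list[str]) -> dict:
    # Fan out: run each subtask on its own worker in parallel
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        findings = list(pool.map(worker, subtasks))
    # Synthesis step: the coordinator merges findings into one report
    return {"task": task, "report": "\n".join(findings)}


report = coordinate("ROCm history", ["timeline", "key releases", "adoption"])
```

In the real system the synthesis step would be another model call producing the final HTML report; the dictionary here only stands in for that output.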

  • Developed by: Pimnara Adulchantarasorn, Phanida Toaluea, Nattanant Vonghan, Rapeepong
  • Base Model: google/gemma-4-E4B-it (Multimodal)
  • Model Size: 8B parameters (BF16, Safetensors)
  • Training Infrastructure: AMD MI300X (192 GB HBM) via AMD Developer Cloud
  • License: Gemma License

⚡ Key Technical Highlights

  1. Long-Context Fine-Tuning (60k+ Tokens): The model is trained to process and retain large volumes of information retrieved from live web scraping across contexts exceeding 60k tokens without losing coherence.
  2. PARL (Parallel-Agent Reinforcement Learning): Trained to orchestrate hierarchical agent workflows, allowing it to delegate tasks, execute Python code, and synthesize findings into comprehensive HTML reports.
  3. Multimodal Preservation: During the GRPO training pipeline, the native Vision Encoder was intentionally frozen. This ensures the agent retains its full vision-language capabilities while its text-reasoning skills are aggressively optimized.
  4. High-Throughput RL: Leveraged the 192 GB of memory on the AMD MI300X to scale parallel generation rollouts to K=16 per prompt, significantly accelerating reward convergence.
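The K=16 rollouts above feed GRPO's characteristic reward normalization: each rollout's reward is scored relative to the statistics of its own group rather than against a learned value function. The sketch below shows only that normalization step, with made-up reward values; it is not code from the training run.

```python
# Sketch of the group-relative advantage used in GRPO-style training.
# Reward values below are illustrative, not from the actual run.
import statistics

K = 16  # parallel rollouts per prompt, as run on the MI300X


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each rollout's reward against its group's mean and std."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]


# One group of K scored rollouts for a single prompt (hypothetical scores)
rewards = [0.2, 0.9, 0.5, 0.7] + [0.4] * (K - 4)
advantages = group_relative_advantages(rewards)
```

Rollouts that beat their group's mean get positive advantages and are reinforced; the rest are suppressed, which is what drives reward convergence without a separate critic network.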

💻 How to Use

Because this model preserves Gemma-4's multimodal architecture, you must use AutoProcessor alongside AutoModelForCausalLM.

```python
import torch
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = "Phonsiri/Gemma-4-E4B-it-PARL"

# Load the processor and the model
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Example chat template with native thinking enabled
messages = [
    {"role": "system", "content": "<|think|> You are a highly capable autonomous research agent."},
    {"role": "user", "content": "Write a detailed report on the evolution of AMD's ROCm ecosystem."}
]

text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True
)

inputs = processor(text=text, return_tensors="pt").to(model.device)

# Generate the response (do_sample=True is required for temperature to apply)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens; special tokens are kept so the
# thinking trace remains visible in the output
input_len = inputs["input_ids"].shape[-1]
response = processor.decode(outputs[0][input_len:], skip_special_tokens=False)

print(response)
```