🚀 Gemma-4-E4B-it-PARL (Autonomous Research Agent)
This model is a fine-tuned version of google/gemma-4-E4B-it, optimized for autonomous multi-hop reasoning and deep web research. It was developed as part of a hackathon hosted by lablab.ai and sponsored by AMD.
🧠 Model Description
We used Group Relative Policy Optimization (GRPO) together with a Parallel-Agent Reinforcement Learning (PARL) architecture to turn the base Gemma-4 model into an autonomous agent capable of solving complex, multi-step research tasks.
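For context, GRPO-style training scores a group of sampled completions per prompt and normalizes each reward against the group's statistics rather than a learned value function. The snippet below is a schematic of that group-relative advantage computation, not our training code; the toy reward values are purely illustrative.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Normalize each completion's reward against its group (GRPO-style).

    rewards: shape (num_prompts, K) -- K sampled completions per prompt.
    Returns advantages of the same shape, used to weight the policy-gradient loss.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Toy example: 2 prompts, K=4 rollouts each
rewards = torch.tensor([[0.1, 0.9, 0.4, 0.6],
                        [0.0, 0.0, 1.0, 0.5]])
print(group_relative_advantages(rewards))
```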
- Developed by: Pimnara Adulchantarasorn, Phanida Toaluea, Nattanant Vonghan, Rapeepong
- Base Model: google/gemma-4-E4B-it (Multimodal)
- Training Infrastructure: AMD MI300X (192 GB VRAM) via AMD Developer Cloud
- License: Gemma License
⚡ Key Technical Highlights
- Long-Context Fine-Tuning (60k+ Tokens): The model is trained to process and retain massive amounts of information retrieved from live web scraping without losing context.
- PARL (Parallel-Agent Reinforcement Learning): Trained to orchestrate hierarchical agent workflows, allowing it to delegate tasks, execute Python code, and synthesize findings into comprehensive HTML reports.
- Multimodal Preservation: During the GRPO training pipeline, the native Vision Encoder was intentionally frozen. This ensures the agent retains its full vision-language capabilities while its text-reasoning skills are aggressively optimized.
- High-Throughput RL: Leveraged the 192 GB of VRAM on the AMD MI300X to scale parallel generation rollouts to K=16, significantly accelerating reward convergence (see the training sketch after this list).
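Below is a minimal sketch (not the actual training script) of the two highlights above, assuming TRL's `GRPOTrainer`: the vision/audio parameters are frozen by name so multimodal capabilities are preserved, and `num_generations=16` configures the K=16 parallel rollouts. The parameter-name filter, dataset, and reward function are illustrative placeholders.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM
from trl import GRPOConfig, GRPOTrainer

# Load the base model and freeze the vision (and audio) encoder parameters so
# that only the text/reasoning weights are updated by GRPO. The "vision"/"audio"
# name filter is an assumption; adjust it to the actual module names.
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-E4B-it",
    torch_dtype="bfloat16",
    trust_remote_code=True,
)
for name, param in model.named_parameters():
    if "vision" in name or "audio" in name:
        param.requires_grad = False

# Tiny illustrative prompt set; the real training data came from research tasks.
train_dataset = Dataset.from_dict({
    "prompt": [
        "Plan a multi-hop research workflow on the history of ROCm.",
        "Compare MI300X and H100 memory architectures and cite sources.",
    ]
})

def reward_fn(completions, **kwargs):
    # Placeholder reward: favour longer, report-like completions.
    return [min(len(c) / 4096, 1.0) for c in completions]

config = GRPOConfig(
    output_dir="gemma-parl-grpo",
    num_generations=16,              # K=16 parallel rollouts per prompt
    per_device_train_batch_size=16,  # must be divisible by num_generations
    max_completion_length=4096,
    bf16=True,
)

trainer = GRPOTrainer(
    model=model,
    args=config,
    reward_funcs=reward_fn,
    train_dataset=train_dataset,
)
trainer.train()
```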
💻 How to Use
Because this model preserves Gemma-4's multimodal architecture, you must use `AutoProcessor` alongside `AutoModelForCausalLM`.
```python
import torch
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = "Phonsiri/Gemma-4-E4B-it-PARL"

# Load the processor and the model
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Example chat template with native thinking enabled
messages = [
    {"role": "system", "content": "<|think|> You are a highly capable autonomous research agent."},
    {"role": "user", "content": "Write a detailed report on the evolution of AMD's ROCm ecosystem."},
]
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = processor(text=text, return_tensors="pt").to(model.device)

# Generate the response (do_sample=True so the temperature setting takes effect)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=4096,
        do_sample=True,
        temperature=0.7,
    )

# Decode only the newly generated tokens
input_len = inputs["input_ids"].shape[-1]
response = processor.decode(outputs[0][input_len:], skip_special_tokens=False)
print(response)
```
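Because the vision encoder is preserved, the same processor and model can also take image inputs. This is a minimal sketch, assuming a recent transformers version whose Gemma chat template accepts image entries; the image path is a placeholder.

```python
# Reuses the `processor` and `model` loaded above.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "benchmark_chart.png"},  # placeholder image path
            {"type": "text", "text": "Summarize the trends shown in this chart."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512)

input_len = inputs["input_ids"].shape[-1]
print(processor.decode(outputs[0][input_len:], skip_special_tokens=True))
```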