Model Card: a2ui_qwen3_30b_grpo_0410_v1
1. Overview
This model version is a GRPO-trained A2UI model initialized from the 30B multilingual DPO checkpoint a2ui_qwen3_30b_dpo_0410_v2. It was optimized on annotated 0409 Langfuse RL data with GPT-5.4 as the judge, and the final fixed cookbook adapter is released on Hugging Face.
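As a rough illustration of how judge scores typically drive a GRPO update, the sketch below normalizes the scores within one group of sampled responses into group-relative advantages. It follows the commonly described GRPO formulation (subtract the group mean, divide by the group standard deviation); whether this run used exactly this normalization is an assumption, and the function name, the `eps` constant, and the score values are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(scores, eps=1e-6):
    """Normalize one group's judge scores to zero mean and roughly unit scale.

    Assumption: this is the textbook GRPO normalization, not necessarily
    the exact variant used to train this checkpoint.
    """
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation of the group
    # eps keeps the division defined when every response gets the same score.
    return [(s - mu) / (sigma + eps) for s in scores]


# Hypothetical judge scores for one prompt's group of 8 sampled responses.
judge_scores = [0.2, 0.9, 0.5, 0.5, 0.7, 0.1, 0.9, 0.4]
advantages = group_relative_advantages(judge_scores)
```

Responses scored above the group mean receive positive advantages and are reinforced; those below receive negative advantages. A group in which every response gets the same score yields all-zero advantages and contributes no learning signal.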
2. Model Metadata
| Field | Value |
|---|---|
| model_version_id | a2ui_qwen3_30b_grpo_0410_v1 |
| algorithm | grpo |
| base_model | Qwen/Qwen3-30B-A3B-Instruct-2507 |
| precision | bf16 |
3. Lineage
| Field | Value |
|---|---|
| parent_model_version_id | a2ui_qwen3_30b_dpo_0410_v2 |
4. Training Data
| Field | Value |
|---|---|
| dataset_version_id | a2ui_data_grpo_0409_v1 |
5. Training Configuration
| Field | Value |
|---|---|
| epochs_or_steps | steps=85 |
| batch_size | 32 |
| group_size | 8 |
| lr | 3e-5 |
| max_ctx | 8192 |
| max_resp | 4096 |
| lora_rank | 16 |
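The values in the table can be collected into a small config object to make the derived budgets explicit. This is a minimal sketch, not the training code: the class and method names are hypothetical, and two readings of the table are assumed, namely that `batch_size` counts prompts per step (each sampled `group_size` times) and that `max_ctx` covers prompt plus response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrpoRunConfig:
    # Values copied from the Training Configuration table.
    steps: int = 85
    batch_size: int = 32
    group_size: int = 8
    lr: float = 3e-5
    max_ctx: int = 8192
    max_resp: int = 4096
    lora_rank: int = 16

    def responses_per_step(self) -> int:
        # Assumption: batch_size counts prompts, each sampled group_size times.
        return self.batch_size * self.group_size

    def total_responses(self) -> int:
        # Total sampled responses over the whole run under the same assumption.
        return self.steps * self.responses_per_step()

    def max_prompt_tokens(self) -> int:
        # Assumption: max_ctx is the combined prompt + response token budget.
        return self.max_ctx - self.max_resp


cfg = GrpoRunConfig()
print(cfg.responses_per_step(), cfg.total_responses(), cfg.max_prompt_tokens())
```

Under these assumptions the run samples 256 responses per step and 21,760 in total, with up to 4,096 tokens left for the prompt. If `batch_size` instead counts responses, each step would cover 4 prompts and the totals shrink accordingly.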
6. Change Summary
Warm-started from the 30B multilingual DPO v11_2 pretty-json checkpoint and optimized on annotated 0409 Langfuse RL data with a GPT-5.4 judge. The final fixed cookbook adapter is uploaded to Fancylalala/a2ui_qwen3_30b_grpo_0410_v1.