Model Card: a2ui_qwen3_30b_grpo_0410_v1

1. Overview

This model version is an A2UI model trained with GRPO and initialized from the 30B multilingual DPO 0410 v2 checkpoint. It was optimized on the annotated 0409 Langfuse RL data with a GPT-5.4 judge, and the final fixed cookbook adapter was released on Hugging Face.
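For readers unfamiliar with GRPO, the core idea can be sketched as follows: several responses are sampled for the same prompt, each is scored (here by a judge), and every response's advantage is its score relative to the rest of its group. This is a minimal sketch of one common formulation (mean-centering plus division by the group standard deviation). The card does not say which variant was used for this model, and the judge scores below are made-up illustrative values.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Center and scale rewards within one group of responses to the same prompt.

    Assumption: std-normalized variant; some implementations only subtract the mean.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every response gets the same score
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical judge scores for one group of 8 responses to a single prompt
judge_scores = [0.2, 0.9, 0.5, 0.5, 0.7, 0.1, 0.9, 0.6]
advantages = group_relative_advantages(judge_scores)

for score, adv in zip(judge_scores, advantages):
    print(f"score={score:.1f} advantage={adv:+.3f}")
```

Responses scored above the group mean receive a positive advantage and are reinforced; those below the mean receive a negative one. A group in which every response gets the same score yields all-zero advantages and contributes no learning signal.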

2. Model Metadata

| Field | Value |
| --- | --- |
| model_version_id | a2ui_qwen3_30b_grpo_0410_v1 |
| algorithm | grpo |
| base_model | Qwen/Qwen3-30B-A3B-Instruct-2507 |
| precision | bf16 |

3. Lineage

| Field | Value |
| --- | --- |
| parent_model_version_id | a2ui_qwen3_30b_dpo_0410_v2 |

4. Training Data

| Field | Value |
| --- | --- |
| dataset_version_id | a2ui_data_grpo_0409_v1 |

5. Training Configuration

| Field | Value |
| --- | --- |
| epochs_or_steps | steps=85 |
| batch_size | 32 |
| group_size | 8 |
| lr | 3e-5 |
| max_ctx | 8192 |
| max_resp | 4096 |
| lora_rank | 16 |
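To give a sense of scale, the configuration values can be combined into a rough rollout and token budget. This is a back-of-the-envelope sketch under two assumptions that the card does not confirm: that batch_size counts prompts per step with group_size responses sampled per prompt, and that max_ctx bounds prompt plus response together.

```python
# Values copied from the training configuration
config = {
    "steps": 85,
    "batch_size": 32,
    "group_size": 8,
    "max_ctx": 8192,
    "max_resp": 4096,
}

# Assumption: batch_size = prompts per step, group_size = responses per prompt
rollouts_per_step = config["batch_size"] * config["group_size"]
total_rollouts = rollouts_per_step * config["steps"]

# Assumption: max_ctx covers prompt + response, so the prompt gets what is left
prompt_budget = config["max_ctx"] - config["max_resp"]

print(f"rollouts per step: {rollouts_per_step}")   # 256
print(f"total rollouts:    {total_rollouts}")      # 21760
print(f"prompt budget:     {prompt_budget} tokens")  # 4096
```

If batch_size instead counts total responses per step, the per-step figure would be 32 rather than 256, so these numbers should be read as an upper-end estimate.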

6. Change Summary

Warm-start from the 30B multilingual DPO v11_2 pretty-json checkpoint and optimize on the annotated 0409 Langfuse RL data with a GPT-5.4 judge. The final fixed cookbook adapter is uploaded to Fancylalala/a2ui_qwen3_30b_grpo_0410_v1.
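A possible way to load the released adapter is sketched below. It assumes the repository holds a PEFT-compatible LoRA adapter that applies directly on top of the base model listed in the metadata; the card does not state the adapter format, so treat this as a starting point rather than a verified recipe. The helper name load_model is hypothetical.

```python
BASE_MODEL_ID = "Qwen/Qwen3-30B-A3B-Instruct-2507"
ADAPTER_ID = "Fancylalala/a2ui_qwen3_30b_grpo_0410_v1"


def load_model():
    """Load the base model in bf16 and attach the adapter.

    Assumption: the adapter repo is in PEFT format and targets this base model.
    """
    # Heavy imports are kept local so the constants can be reused without them
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID,
        torch_dtype=torch.bfloat16,  # matches the bf16 precision in the metadata
        device_map="auto",           # requires the accelerate package
    )
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    return tokenizer, model.eval()
```

Calling `tokenizer, model = load_model()` downloads the full 30B base weights, so it needs substantial disk space and accelerator memory.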
