Model Card: a2ui_qwen3_30b_grpo_0410_v1

1. Overview

This model version is a GRPO-trained A2UI model initialized from the 30B multilingual DPO 0410 v2 checkpoint. It was optimized on the annotated 0409 Langfuse RL data, with a GPT-5.4 judge providing the reward signal, and the final fixed cookbook adapter was released on Hugging Face.
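GRPO samples several responses per prompt and normalizes each response's reward against the other responses in its group, so no learned value model is needed. The sketch below shows that group-relative normalization under stated assumptions: one scalar judge score per response and the `group_size` of 8 listed under Training Configuration. The scores and the function name are hypothetical, and the card does not document the judge's reward scale or the exact normalization used in this run (population standard deviation is assumed here).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for one prompt's group of sampled responses.

    Each advantage is the response's reward minus the group mean, divided by
    the group standard deviation (population std: an assumption). eps keeps
    the division defined when every response received the same score.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical judge scores for one prompt, group_size = 8.
scores = [0.9, 0.4, 0.7, 0.4, 1.0, 0.2, 0.6, 0.8]
adv = group_relative_advantages(scores)
print([round(a, 3) for a in adv])
```

Responses the judge rated above the group mean get positive advantages and are reinforced; a group in which every response scored the same yields zero advantage and contributes no learning signal.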

2. Model Metadata

Field	Value
model_version_id	`a2ui_qwen3_30b_grpo_0410_v1`
algorithm	`grpo`
base_model	`Qwen/Qwen3-30B-A3B-Instruct-2507`
precision	`bf16`

3. Lineage

Field	Value
parent_model_version_id	`a2ui_qwen3_30b_dpo_0410_v2`

4. Training Data

Field	Value
dataset_version_id	`a2ui_data_grpo_0409_v1`

5. Training Configuration

Field	Value
epochs_or_steps	`steps=85`
batch_size	`32`
group_size	`8`
lr	`3e-5`
max_ctx	`8192`
max_resp	`4096`
lora_rank	`16`
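The table gives `lora_rank` 16 but not which modules were adapted or their shapes. As a rough illustration of what a rank-16 adapter costs, the sketch below applies the standard LoRA parameter count, rank × (d_out + d_in) per adapted matrix, to a hypothetical 4096 × 4096 projection. The matrix size and helper names are assumptions for illustration only, not the actual dimensions of Qwen3-30B-A3B.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters LoRA trains for one frozen d_out x d_in weight matrix W.

    LoRA learns B (d_out x rank) and A (rank x d_in); the effective weight is
    W + scale * (B @ A), so only rank * (d_out + d_in) values are trained.
    """
    return rank * (d_out + d_in)


def full_params(d_out: int, d_in: int) -> int:
    """Parameters in the full (frozen) matrix, for comparison."""
    return d_out * d_in


def lora_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Size of the LoRA factors relative to the full matrix."""
    return lora_trainable_params(d_out, d_in, rank) / full_params(d_out, d_in)


# Hypothetical 4096 x 4096 projection with lora_rank = 16.
RANK = 16
print(lora_trainable_params(4096, 4096, RANK))  # 131072
print(full_params(4096, 4096))                  # 16777216
print(lora_fraction(4096, 4096, RANK))          # 0.0078125
```

Under these assumptions each adapted matrix trains well under 1% of its frozen parameters, which is why the released artifact is a small adapter rather than a full bf16 checkpoint.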

6. Change Summary

Warm-started from the 30B multilingual DPO v11_2 pretty-json checkpoint and optimized on the annotated 0409 Langfuse RL data with a GPT-5.4 judge; the final fixed cookbook adapter was uploaded to Fancylalala/a2ui_qwen3_30b_grpo_0410_v1.
