Sangsang/feedback_asymmetric_fixed_ema_Llama-3.1-8B-Instruct_bw0p5_fw0p5_ema0p999_ep30_v2 Text Generation • Updated Apr 11 • 5
Sangsang/grpo_Qwen3-1.7B_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated Apr 5 • 1
Sangsang/grpo_Qwen3-4B_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated Apr 4 • 1 • 1
Sangsang/feedback_asymmetric_fixed_ema_DeepSeek-R1-Distill-Qwen-7B_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated Apr 3 • 1
Sangsang/feedback_asymmetric_fixed_ema_DeepSeek-R1-Distill-Llama-8B_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated Apr 3 • 1
Sangsang/feedback_disallowed_ema_Qwen3-4B_reverse_kl_ema0p999_ep30 Text Generation • Updated Apr 2 • 1
Sangsang/feedback_asymmetric_fixed_ema_Qwen3-4B_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated Apr 2 • 1
Sangsang/grpo_Qwen3-4B-Instruct-2507_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated Apr 2 • 1
Sangsang/feedback_asymmetric_fixed_ema_Qwen3-4B-Instruct-2507_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated Apr 2 • 1
Sangsang/feedback_disallowed_ema_Qwen3-4B-Instruct-2507_reverse_kl_ema0p999_ep30 Text Generation • Updated Apr 2 • 1
Sangsang/feedback_both_ema_Qwen3-4B-Instruct-2507_reverse_kl_ema0p999_ep30 Text Generation • Updated Apr 2 • 1
Sangsang/feedback_allowed_ema_Qwen3-4B-Instruct-2507_reverse_kl_ema0p999_ep30 Text Generation • Updated Apr 2 • 1
Sangsang/feedback_asymmetric_fixed_ema_Qwen2.5-7B-Instruct_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated Apr 2 • 3
Sangsang/feedback_disallowed_ema_Qwen2.5-7B-Instruct_reverse_kl_ema0p999_ep30 Text Generation • Updated Apr 2 • 3
Sangsang/feedback_both_ema_Qwen2.5-7B-Instruct_reverse_kl_ema0p999_ep30 Text Generation • Updated Apr 2 • 3
Sangsang/feedback_allowed_ema_Qwen2.5-7B-Instruct_reverse_kl_ema0p999_ep30 Text Generation • Updated Apr 2 • 3
Sangsang/feedback_asymmetric_fixed_ema_Llama-3.1-8B-Instruct_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated Apr 1 • 6
Sangsang/feedback_both_ema_Llama-3.1-8B-Instruct_reverse_kl_ema0p999_ep30 Text Generation • Updated Apr 1 • 6
Sangsang/feedback_disallowed_ema_Llama-3.1-8B-Instruct_reverse_kl_ema0p999_ep30 Text Generation • Updated Apr 1 • 6
Sangsang/feedback_allowed_ema_Llama-3.1-8B-Instruct_reverse_kl_ema0p999_ep30 Text Generation • Updated Apr 1 • 6