Qwen3-1.7B Math PRM

Training recipe:

  • Base model: Qwen/Qwen3-1.7B
  • Training data: raw step-level labels from Mai0313/prm800k (prm800k/data/phase2_train.jsonl)
  • Evaluation: Qwen/ProcessBench
  • Format: Qwen PRM-style; an <extra_0> marker is appended after each reasoning step, and the step's score is read at that marker's position
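
The marker format in the last bullet can be sketched in plain Python. This is a minimal illustration, not the training script's code: the helper names, the newline between problem and steps, and the toy marker id 9 are assumptions made here, and any chat template the script may apply is not shown.

```python
STEP_MARKER = "<extra_0>"  # step separator named in the recipe above


def format_steps(problem: str, steps: list[str]) -> str:
    """Append the marker after every reasoning step (layout is an assumption)."""
    body = "".join(step.strip() + STEP_MARKER for step in steps)
    return problem.strip() + "\n" + body


def marker_positions(token_ids: list[int], marker_id: int) -> list[int]:
    """Return the indices of the tokens that equal the marker token id."""
    return [i for i, tok in enumerate(token_ids) if tok == marker_id]


def step_scores(per_token_scores: list[float], token_ids: list[int],
                marker_id: int) -> list[float]:
    """Keep only the scores that sit at marker positions, one per step."""
    return [per_token_scores[i] for i in marker_positions(token_ids, marker_id)]


if __name__ == "__main__":
    text = format_steps("What is 2 + 2?", ["2 + 2 is 4.", "So the answer is 4."])
    print(text)
    # Toy token ids; 9 stands in for the real <extra_0> id (hypothetical).
    print(step_scores([0.1, 0.9, 0.2, 0.4], [5, 9, 7, 9], marker_id=9))
```

With a real tokenizer, the marker id would be looked up from the vocabulary rather than hard-coded, and the per-token scores would come from the model's output at each position.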

This repo contains the training script for a discriminative process reward model that scores individual reasoning steps.
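
For evaluation on ProcessBench, per-step scores are typically turned into a prediction of the earliest erroneous step, with -1 meaning no error was found. The sketch below assumes a simple thresholding rule; the function name and the 0.5 threshold are illustrative choices, not values taken from this repo.

```python
def first_error_step(scores: list[float], threshold: float = 0.5) -> int:
    """Return the index of the first step scored below the threshold, or -1.

    The threshold value is an assumption; in practice it would be tuned.
    """
    for index, score in enumerate(scores):
        if score < threshold:
            return index
    return -1


if __name__ == "__main__":
    print(first_error_step([0.9, 0.8, 0.2, 0.7]))  # third step flagged
    print(first_error_step([0.9, 0.8, 0.95]))      # no step flagged
```

A prediction counts as correct when it matches the benchmark's labeled index for that solution.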
