AbhilekhMeda/qwen3-1.7b-math-prm


Qwen3-1.7B Math PRM

Training recipe:

Base model: Qwen/Qwen3-1.7B
Training data: raw step-level labels from Mai0313/prm800k (prm800k/data/phase2_train.jsonl)
Evaluation: Qwen/ProcessBench
Format: Qwen PRM style. An <extra_0> marker is appended after each reasoning step, and the step's score is read at the marker position.
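The marker format in the recipe can be illustrated with a minimal pure-Python sketch. It is not the repo's training or inference code: the helper names (`build_prm_input`, `marker_positions`, `step_scores`), the newline between problem and steps, and the toy token ids are assumptions made for illustration. In practice the marker id would come from the tokenizer and the per-token scores from the model's classification head.

```python
STEP_MARKER = "<extra_0>"  # marker named in the recipe above


def build_prm_input(problem: str, steps: list[str]) -> str:
    """Append the step marker after every reasoning step (layout is an assumption)."""
    body = "".join(step.strip() + STEP_MARKER for step in steps)
    return problem.strip() + "\n" + body


def marker_positions(token_ids: list[int], marker_id: int) -> list[int]:
    """Return the indices of all tokens equal to the marker id."""
    return [i for i, tok in enumerate(token_ids) if tok == marker_id]


def step_scores(per_token_scores: list[float],
                token_ids: list[int],
                marker_id: int) -> list[float]:
    """Keep only the scores that sit at marker positions, one per step."""
    return [per_token_scores[i] for i in marker_positions(token_ids, marker_id)]


# Toy example: id 99 stands in for the marker token (hypothetical value).
text = build_prm_input("Q", ["a", "b"])
scores = step_scores([0.1, 0.9, 0.2, 0.4], [5, 99, 7, 99], marker_id=99)
```

Reading scores only at marker positions means each reasoning step gets exactly one score, regardless of how many tokens the step contains.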

This repo contains the training script for a discriminative process reward model that scores individual reasoning steps.
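Per-step scores can be turned into a ProcessBench-style prediction, the index of the earliest erroneous step or -1 when every step looks correct, roughly as follows. This is a sketch under stated assumptions: the 0.5 threshold and the function names are hypothetical and are not taken from this repo. The F1 helper follows the ProcessBench convention of taking the harmonic mean of accuracy on erroneous samples and accuracy on fully correct samples.

```python
def first_error_step(step_scores: list[float], threshold: float = 0.5) -> int:
    """Return the 0-based index of the first step scored below the threshold, else -1."""
    for i, score in enumerate(step_scores):
        if score < threshold:
            return i
    return -1


def processbench_f1(preds: list[int], labels: list[int]) -> float:
    """Harmonic mean of accuracy on erroneous (label != -1) and correct (label == -1) samples."""
    err = [p == l for p, l in zip(preds, labels) if l != -1]
    cor = [p == l for p, l in zip(preds, labels) if l == -1]
    if not err or not cor:
        return 0.0  # sketch choice: undefined subsets count as zero
    acc_err = sum(err) / len(err)
    acc_cor = sum(cor) / len(cor)
    if acc_err + acc_cor == 0:
        return 0.0
    return 2 * acc_err * acc_cor / (acc_err + acc_cor)


# Toy example with made-up scores and labels.
pred = first_error_step([0.9, 0.8, 0.2, 0.7])
f1 = processbench_f1([1, -1, 0, -1], [1, -1, 2, -1])
```

The threshold would normally be tuned on held-out data rather than fixed at 0.5.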

