V_1: Unifying Generation and Self-Verification for Parallel Reasoners
Abstract
Unifying generation and verification in a single framework built on pairwise self-verification enhances test-time scaling for complex reasoning, improving performance on code-generation and mathematical-reasoning benchmarks.
Test-time scaling for complex reasoning tasks shows that leveraging inference-time compute, by methods such as independently sampling and aggregating multiple solutions, results in significantly better task outcomes. However, a critical bottleneck is verification: sampling is only effective if correct solutions can be reliably identified among candidates. While existing approaches typically evaluate candidates independently via scalar scoring, we demonstrate that models are substantially stronger at pairwise self-verification. Leveraging this insight, we introduce V_1, a framework that unifies generation and verification through efficient pairwise ranking. V_1 comprises two components: V_1-Infer, an uncertainty-guided algorithm using a tournament-based ranking that dynamically allocates self-verification compute to candidate pairs whose relative correctness is most uncertain; and V_1-PairRL, an RL framework that jointly trains a single model as both generator and pairwise self-verifier, ensuring the verifier adapts to the generator's evolving distribution. On code generation (LiveCodeBench, CodeContests, SWE-Bench) and math reasoning (AIME, HMMT) benchmarks, V_1-Infer improves Pass@1 by up to 10% over pointwise verification and outperforms recent test-time scaling methods while being significantly more efficient. Furthermore, V_1-PairRL achieves 7--9% test-time scaling gains over standard RL and pointwise joint training, and improves base Pass@1 by up to 8.7% over standard RL in a code-generation setting.
Community
Can LLMs self-verify their own solutions?
Modern LLMs are increasingly used as parallel reasoners, where multiple candidate solutions are sampled and then filtered or aggregated. In this setting, the key bottleneck is often not generation but verification—identifying which candidate is actually correct.
In this work, we show that pairwise self-verification is a surprisingly strong primitive. Instead of scoring candidates independently, models compare solutions against each other, which provides a much stronger verification signal.
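As a minimal sketch of the pairwise primitive (not the paper's prompt or implementation), a pairwise judgment can be made more robust to the position bias of LLM judges by querying both orderings and only trusting a consistent verdict; `judge` here is a hypothetical stand-in for the model's self-verification call:

```python
def pairwise_verify(a, b, judge):
    """Compare two candidate solutions with a pairwise judge.

    `judge(x, y)` returns True if it prefers `x` over `y`. LLM judges are
    known to exhibit position bias, so we query both presentation orders
    and only trust a verdict when the two calls agree; otherwise we
    report "uncertain" (None). Hypothetical interface, for illustration.
    """
    first = judge(a, b)        # a presented first: does a win?
    second = not judge(b, a)   # b presented first, inverted to "does a win?"
    if first == second:
        return first           # consistent verdict: True iff a wins
    return None                # inconsistent verdicts: treat as uncertain
```

With a toy length-based judge, a clear win in both orders yields a verdict, while an order-dependent outcome is flagged as uncertain rather than forced.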
We introduce V_1, a framework that unifies generation and self-verification:
• V_1-Infer: an efficient Swiss-system-style tournament that ranks candidate solutions through pairwise comparisons, concentrating compute on pairs whose relative correctness is most uncertain.
• V_1-PairRL: a reinforcement learning framework that co-trains a single model as both a generator and a pairwise self-verifier, so that generation and verification improve together.
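The Swiss-system idea can be sketched as follows. This is an illustrative toy, not the paper's algorithm: `compare` is a hypothetical stand-in for the model's pairwise self-verification call, and the round count is arbitrary. Pairing candidates with similar running scores approximates the "focus on uncertain pairs" heuristic, since same-score candidates are the ones whose relative order is least resolved:

```python
import random

def swiss_tournament_rank(candidates, compare, rounds=3, seed=0):
    """Rank candidates via Swiss-style pairwise comparisons.

    `compare(a, b)` returns True if `a` is judged better than `b`
    (in V_1-Infer this would be a model self-verification call; here
    it is an abstract callable). Each round, candidates with similar
    scores are paired, so comparisons concentrate where the relative
    ordering is most uncertain.
    """
    rng = random.Random(seed)
    scores = {c: 0 for c in candidates}
    for _ in range(rounds):
        # Sort by score (random tiebreak), then pair adjacent entries.
        order = sorted(candidates, key=lambda c: (-scores[c], rng.random()))
        for i in range(0, len(order) - 1, 2):  # odd N: last gets a bye
            a, b = order[i], order[i + 1]
            if compare(a, b):
                scores[a] += 1
            else:
                scores[b] += 1
    return sorted(candidates, key=lambda c: -scores[c])
```

With O(N) comparisons per round, a few rounds suffice to surface the strongest candidate, versus O(N^2) for a full round-robin.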
Across code and math benchmarks we observe substantial test-time scaling improvements (up to +19.6% Pass@1 on code and +17.9% on math). Pairwise verification also composes well with aggregation-based methods such as Recursive Self-Aggregation, improving convergence and reducing latency.
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Agentic Proposing: Enhancing Large Language Model Reasoning via Compositional Skill Synthesis (2026)
- Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification (2026)
- Adaptive Test-Time Compute Allocation via Learned Heuristics over Categorical Structure (2026)
- JudgeRLVR: Judge First, Generate Second for Efficient Reasoning (2026)
- ReSyn: Autonomously Scaling Synthetic Environments for Reasoning Models (2026)
- Learning to Self-Verify Makes Language Models Better Reasoners (2026)
- Efficient Paths and Dense Rewards: Probabilistic Flow Reasoning for Large Language Models (2026)