arxiv:2606.12476

Quickest Detection of Hallucination Onset: Delay Bounds and Learned CUSUM Statistics

Published on Jun 10

Submitted by Igor Itkin on Jun 15

Authors:

Igor Itkin

Abstract

Token-level hallucination detection is reformulated as a quickest change detection problem, revealing fundamental limits on detection delay and demonstrating superior performance through causal recurrent modeling.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Token-level hallucination detectors are evaluated as classifiers, by AUC over all tokens, yet a streaming monitor is judged by its reaction time: the number of tokens that pass between the onset of a hallucination and the alarm. We formulate hallucination onset detection as a quickest change detection problem. A first-order Markov model of the latent faithful/hallucinated state, validated on RAGTruth, places the task inside classical change-point theory and yields Lorden's lower bound on detection delay: about 1.3 tokens at a false-alarm rate of 0.01. We then show that a causal recurrent labeler acts as a CUSUM with a learned increment; at a matched false-alarm rate it detects in 11-13 tokens, against 31 for a linear per-token baseline, and a controlled decomposition attributes most of this advantage to a better per-token score rather than to temporal accumulation. An information-rate optimality theorem of Donsker-Varadhan type explains the remaining order-of-magnitude gap: the learned score realizes only 1/4.5 of the divergence the features carry, a deficit that recalibration cannot remove, with the remainder a finite-horizon effect. Classification metrics conceal this delay structure; sequential analysis makes it measurable.
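
For readers unfamiliar with CUSUM, the following is a minimal sketch of the classical recursion the abstract refers to, not the paper's learned labeler. The function name `cusum_alarm`, the hand-picked increments, the threshold of 2.5, and the delay convention (alarm index minus onset index) are all illustrative assumptions; in the paper the increment is produced by a causal recurrent model.

```python
from typing import Iterable, Optional


def cusum_alarm(increments: Iterable[float], threshold: float) -> Optional[int]:
    """Return the 0-based index of the first token whose CUSUM statistic
    reaches the threshold, or None if no alarm is raised."""
    stat = 0.0
    for t, inc in enumerate(increments):
        # Classical CUSUM recursion: accumulate evidence, but never below zero.
        stat = max(0.0, stat + inc)
        if stat >= threshold:
            return t
    return None


# Hypothetical per-token increments: negative while the text is faithful,
# positive once the hallucination starts at index 5 (made-up values).
scores = [-0.5, -0.2, -0.4, -0.1, -0.3, 0.9, 1.1, 0.8, 1.2, 1.0]
onset = 5

alarm = cusum_alarm(scores, threshold=2.5)
# Delay convention assumed here: alarm index minus onset index.
delay = None if alarm is None else alarm - onset
print(alarm, delay)
```

Raising the threshold lowers the false-alarm rate at the cost of a longer delay, which is the trade-off the delay bounds quantify.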

Community

BukaByaka

Paper author Paper submitter 4 days ago

Token-level hallucination detectors are evaluated as classifiers, by AUC over all tokens, yet a streaming monitor is judged by its reaction time: the number of tokens that pass between the onset of a hallucination and the alarm. We formulate hallucination onset detection as a quickest change detection problem. A first-order Markov model of the latent faithful/hallucinated state, validated on RAGTruth, places the task inside classical change-point theory and yields Lorden's lower bound on detection delay: about 1.3 tokens at a false-alarm rate of 0.01. We then show that a causal recurrent labeler acts as a CUSUM with a learned increment. Among the onsets it catches it detects in 11-13 tokens, against 31 for a linear per-token baseline, though at this false-alarm budget every detector catches under a third of onsets and the recall-honest delay is 56-66 tokens: low-false-alarm onset detection is hard. A controlled decomposition attributes the speed advantage mostly to a better per-token score rather than to temporal accumulation. An information-rate optimality theorem of Donsker-Varadhan type explains the remaining order-of-magnitude gap: the learned score realizes only 1/4.5 of the divergence the features carry, a deficit that recalibration cannot remove, with the remainder a finite-horizon effect. Classification metrics conceal this delay structure; sequential analysis makes it measurable.
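
One way to see why a score's shape can cap its information rate is the Donsker-Varadhan variational formula, KL(P||Q) = sup_f { E_P[f] - log E_Q[exp(f)] }, restricted to functions of the form f = a * score. The toy below is a sketch under that reading and is not the paper's theorem or data: the Gaussian pre-change and post-change laws, the threshold score, the grid over `a`, and the name `realized_rate` are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean):
    """Density of a unit-variance Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def realized_rate(score, mu, lo=-10.0, hi=10.0, n=2001):
    """Donsker-Varadhan lower bound on KL(N(mu,1) || N(0,1)) restricted to
    f = a * score(x), maximised over a grid of scales a in [0, 6]."""
    step = (hi - lo) / (n - 1)
    xs = [lo + i * step for i in range(n)]
    s = [score(x) for x in xs]
    w_post = [normal_pdf(x, mu) * step for x in xs]   # post-change weights
    w_pre = [normal_pdf(x, 0.0) * step for x in xs]   # pre-change weights
    mean_post = sum(si * w for si, w in zip(s, w_post))
    best = 0.0  # a = 0 always yields a rate of zero
    for k in range(61):
        a = 0.1 * k
        log_mgf = math.log(sum(math.exp(a * si) * w for si, w in zip(s, w_pre)))
        best = max(best, a * mean_post - log_mgf)
    return best


mu = 2.0
kl = mu ** 2 / 2.0  # exact KL between N(mu,1) and N(0,1), in nats

# A score that is affine in the true log-likelihood ratio recovers the KL.
rate_affine = realized_rate(lambda x: x, mu)
# A thresholded (non-affine) score recovers strictly less, whatever the scale.
rate_thresholded = realized_rate(lambda x: 1.0 if x > 1.0 else 0.0, mu)
print(kl, rate_affine, rate_thresholded)
```

Because the maximisation runs over the scale `a`, rescaling the score leaves the result unchanged, which mirrors the abstract's point that recalibration cannot remove the deficit.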

noahml

4 days ago

This is a really interesting take on hallucination. Most papers just treat this as a static classification task, but framing it as a quickest change detection problem—like spotting a sensor shift—actually makes a lot of sense for streaming output.

I’m curious about that gap between the theoretical lower bound of 1.3 tokens and the 11-13 tokens the model achieves. Do you think that 1/4.5 divergence efficiency is something we can ever bridge with better training?

I made a podcast on it with ResearchPod, it makes it easy to get the key concepts on the go:
https://researchpod.app/episode/89f4a384-3eff-4449-8bac-1f99f307b3e5

BukaByaka

Paper author 2 days ago (edited 2 days ago)

Thanks, Noah — this is exactly the question that kept me up, and I'm glad the QCD framing landed.
Short version: better training on the same features won't close it; better features will.

Here's how we read the gap. The 1.3-token floor is set by the KL divergence the features carry between the faithful and hallucinated regimes (≈3.5 nats). What a detector actually achieves is governed by the realized information rate of its score, which we can measure directly, and the learned score recovers only about 1/4.5 of that divergence. That 4.5× is the multiplicative delay penalty. Two things about it:

It's a property of the score's shape, not its scale. I checked: it's invariant to recalibration and barely moves under monotone reshaping. So "train longer / calibrate better / add depth" doesn't touch it. In our information-rate theorem the realized rate equals the KL only when the score is affine in the true log-likelihood ratio; ours isn't, and the deficit is close to irreducible for these features.
The rest of the gap — from ≈6 tokens (1.3 × 4.5) to the 11–13 we observe — is a finite-horizon effect: the increments are strongly correlated and detection fires faster than the score mixes, so the asymptotic rate overshoots.
So the lever isn't a bigger model, it's features that separate the two regimes more sharply (push those 3.5 nats up). The label oracle sits at ≈1.0 token (4.6 nats) — roughly where perfect separation would land. That's the ceiling worth chasing.
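
The figures quoted above can be reproduced with a few lines of arithmetic, assuming the floor takes the first-order asymptotic form log(1/alpha) / KL with alpha = 0.01. That form is an assumption that happens to match the numbers in this thread; the paper's exact statement of Lorden's bound may carry additional terms. The function name `asymptotic_delay` is illustrative.

```python
import math


def asymptotic_delay(false_alarm_rate, kl_nats):
    """First-order delay floor: log(1/alpha) divided by the KL divergence.
    Assumed form; the paper's bound may include further terms."""
    return math.log(1.0 / false_alarm_rate) / kl_nats


alpha = 0.01                                     # false-alarm rate from the abstract
floor_features = asymptotic_delay(alpha, 3.5)    # features carry about 3.5 nats
floor_oracle = asymptotic_delay(alpha, 4.6)      # label oracle, about 4.6 nats
penalized = floor_features * 4.5                 # apply the 4.5x information-rate penalty

print(round(floor_features, 2), round(floor_oracle, 2), round(penalized, 2))
```

Under this reading, the step from roughly 6 tokens to the observed 11-13 is the part that the asymptotic formula does not capture.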

One honest caveat I keep in the paper: at this false-alarm budget every detector still catches under a third of onsets at the first token, so the recall-honest delay is much larger. Closing the 4.5× speeds up the onsets you do catch; it doesn't fix the miss rate, which is a separate axis.

And thanks for making the ResearchPod episode — that's a generous thing to do. I'll give it a listen.
