arxiv:2606.16019

Scaling Human and G2P Supervision for Robust Phonetic Transcription

Published on Jun 14

Authors:

Abstract

Research reveals that Grapheme-to-Phoneme models provide limited benefits beyond 20-30 hours of human annotation, with ASR pretraining offering superior performance for robust phonetic transcription across diverse speech types.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Expert phonetic annotation is costly, especially for non-standard dialects and atypical speech. A common alternative is to use Grapheme-to-Phoneme (G2P) models to auto-generate phonetic labels from text transcripts at scale. We study how automatic phonetic transcription performance scales with human and G2P supervision in English. Using a curated 80-hour benchmark spanning native, non-native, and post-stroke speech, we identify a supervision quality threshold: G2P supervision helps only when fewer than 20-30 hours of human annotation are available. Beyond this threshold, it provides no significant benefit and can reduce cross-dialect robustness. What does remain effective beyond the threshold is ASR pretraining, with which we achieve a 2.3x reduction in weighted phone feature error rate over prior systems, with strong gains on non-native and aphasic speech. These results suggest that quantity-driven G2P scaling may yield diminishing returns for robust generalization.
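The abstract reports results in weighted phone feature error rate but does not define the metric. As a rough illustration only, the sketch below shows one plausible reading: an edit distance over phone sequences in which a substitution costs the fraction of articulatory features that differ, normalized by reference length. The feature table, the uniform feature weights, the unit insertion/deletion cost, and the names `FEATURES`, `sub_cost`, and `feature_error_rate` are all assumptions made for this example, not the paper's definition.

```python
from typing import Dict, List, Tuple

# Hypothetical toy feature table (NOT from the paper):
# each phone maps to (voiced, nasal, labial, continuant).
FEATURES: Dict[str, Tuple[int, ...]] = {
    "p": (0, 0, 1, 0),
    "b": (1, 0, 1, 0),
    "m": (1, 1, 1, 0),
    "s": (0, 0, 0, 1),
    "z": (1, 0, 0, 1),
}


def sub_cost(a: str, b: str) -> float:
    """Substitution cost: fraction of features that differ (0.0 if identical)."""
    if a == b:
        return 0.0
    fa, fb = FEATURES[a], FEATURES[b]
    return sum(x != y for x, y in zip(fa, fb)) / len(fa)


def feature_error_rate(ref: List[str], hyp: List[str]) -> float:
    """Feature-weighted edit distance between phone sequences, divided by len(ref).

    Assumption: insertions and deletions each cost 1.0.
    """
    if not ref:
        raise ValueError("reference sequence must be non-empty")
    n, m = len(ref), len(hyp)
    # d[i][j] = minimum cost of aligning ref[:i] with hyp[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = float(i)
    for j in range(1, m + 1):
        d[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,  # deletion
                d[i][j - 1] + 1.0,  # insertion
                d[i - 1][j - 1] + sub_cost(ref[i - 1], hyp[j - 1]),
            )
    return d[n][m] / n


if __name__ == "__main__":
    # /p/ vs /b/ differ only in voicing, so the error is small.
    print(feature_error_rate(["p", "b", "m"], ["b", "b", "m"]))
```

Under this reading, confusing /p/ with /b/ is penalized less than confusing /p/ with /z/, which is the usual motivation for feature-based metrics over a plain phone error rate.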


Get this paper in your agent:

hf papers read 2606.16019

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 1

Datasets citing this paper 0



Spaces citing this paper 2

Collections including this paper 1