GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering
Abstract
Generating accurate glyphs for visual text rendering is essential yet challenging. Existing methods typically improve text rendering by training on large collections of high-quality scene-text images, but limited coverage of glyph variations and excessive stylization often compromise glyph accuracy, especially for complex or out-of-domain characters. Some methods use reinforcement learning to alleviate this issue, yet their reward models usually depend on text recognition systems that are insensitive to fine-grained glyph errors, so images with incorrect glyphs may still receive high rewards. Inspired by Direct Preference Optimization (DPO), we propose GlyphPrinter, a preference-based text rendering method that eliminates the reliance on explicit reward models. However, the standard DPO objective only models the overall preference between two samples, which is insufficient for visual text rendering, where glyph errors typically occur in localized regions. To address this issue, we construct the GlyphCorrector dataset with region-level glyph preference annotations and propose Region-Grouped DPO (R-GDPO), a region-based objective that optimizes inter- and intra-sample preferences over annotated regions, substantially improving glyph accuracy. Furthermore, we introduce Regional Reward Guidance, an inference strategy that samples from an optimal distribution with controllable glyph accuracy. Extensive experiments demonstrate that GlyphPrinter outperforms existing methods in glyph accuracy while maintaining a favorable balance between stylization and precision.
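To make the difference between sample-level and region-level preference concrete, the following is a minimal sketch under stated assumptions, not the paper's R-GDPO objective. It starts from the publicly described Diffusion-DPO loss (Wallace et al.), where the implicit reward of an image is how much the trained model lowers its denoising error relative to a frozen reference model, and then restricts each error term to an annotated region mask. The masking step is our hypothetical simplification: the actual R-GDPO also models intra-sample preferences, which this sketch omits. All names (`region_dpo_loss`, `masked_mean`, `mask_w`, `mask_l`) are illustrative, and per-pixel errors are given as flat lists instead of tensors; the timestep weighting of Diffusion-DPO is folded into `beta`.

```python
import math


def masked_mean(values, mask):
    """Mean of per-pixel values where mask is truthy; 0.0 if the mask is empty."""
    selected = [v for v, m in zip(values, mask) if m]
    return sum(selected) / len(selected) if selected else 0.0


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def region_dpo_loss(err_w_theta, err_w_ref, err_l_theta, err_l_ref,
                    mask_w, mask_l, beta=1.0):
    """Diffusion-DPO-style loss restricted to annotated regions (hypothetical sketch).

    err_*_theta / err_*_ref: per-pixel squared denoising errors of the trained
    model and the frozen reference model on the preferred (w) and rejected (l)
    sample. Only pixels inside mask_w / mask_l contribute to the loss.
    """
    # Negative difference = trained model fits this region better than the reference.
    diff_w = masked_mean(err_w_theta, mask_w) - masked_mean(err_w_ref, mask_w)
    diff_l = masked_mean(err_l_theta, mask_l) - masked_mean(err_l_ref, mask_l)
    # Reward fitting the preferred region better and the rejected region worse.
    return -log_sigmoid(-beta * (diff_w - diff_l))


if __name__ == "__main__":
    mask = [1, 1, 0, 0]  # hypothetical: first two pixels form an annotated glyph region
    ref = [0.5, 0.5, 0.5, 0.5]
    # Identical models give log(2); improving the preferred region lowers the loss.
    print(region_dpo_loss(ref, ref, ref, ref, mask, mask))
    print(region_dpo_loss([0.2, 0.2, 9.0, 9.0], ref, ref, ref, mask, mask, beta=5.0))
```

In the second call, the large errors at the two unmasked pixels do not affect the loss, which illustrates why a region-restricted objective can focus the gradient on localized glyph errors instead of the whole image.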
Community
GlyphPrinter is a preference-based text rendering framework designed to eliminate the reliance on explicit reward models for visual text generation. It targets common failure cases of existing text-to-image (T2I) models, such as distorted strokes and incorrect glyphs, especially when rendering complex Chinese characters, multilingual text, or out-of-domain symbols.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering (2026)
- GlyphBanana: Advancing Precise Text Rendering Through Agentic Workflows (2026)
- WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing (2026)
- PosterOmni: Generalized Artistic Poster Creation via Task Distillation and Unified Reward Feedback (2026)
- Improving Text-to-Image Generation with Intrinsic Self-Confidence Rewards (2026)
- Enhancing Spatial Understanding in Image Generation via Reward Modeling (2026)
- Diff-Aid: Inference-time Adaptive Interaction Denoising for Rectified Text-to-Image Generation (2026)