Beyond Pixels: Visual Metaphor Transfer via Schema-Driven Agentic Reasoning
Abstract
Visual Metaphor Transfer asks creative AI systems to decompose the abstract conceptual relationships in a reference image and reapply them to new subjects; this paper addresses the task with a multi-agent framework grounded in cognitive theory.
A visual metaphor constitutes a high-order form of human creativity, employing cross-domain semantic fusion to transform abstract concepts into impactful visual rhetoric. Despite the remarkable progress of generative AI, existing models remain largely confined to pixel-level instruction alignment and surface-level appearance preservation, failing to capture the underlying abstract logic necessary for genuine metaphorical generation. To bridge this gap, we introduce the task of Visual Metaphor Transfer (VMT), which challenges models to autonomously decouple the "creative essence" from a reference image and re-materialize that abstract logic onto a user-specified target subject. We propose a cognitive-inspired, multi-agent framework that operationalizes Conceptual Blending Theory (CBT) through a novel Schema Grammar ("G"). This structured representation decouples relational invariants from specific visual entities, providing a rigorous foundation for cross-domain logic re-instantiation. Our pipeline executes VMT through a collaborative system of specialized agents: a perception agent that distills the reference into a schema, a transfer agent that maintains generic-space invariance to discover apt carriers, a generation agent for high-fidelity synthesis, and a hierarchical diagnostic agent that mimics a professional critic, performing closed-loop backtracking to identify and rectify errors across abstract logic, component selection, and prompt encoding. Extensive experiments and human evaluations demonstrate that our method significantly outperforms SOTA baselines in metaphor consistency, analogy appropriateness, and visual creativity, paving the way for automated, high-impact creative applications in advertising and media. Source code will be made publicly available.
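The abstract describes the Schema Grammar "G" only at a high level. As an illustration of the core idea (a representation that decouples relational invariants from the concrete visual entities that fill them), here is a minimal, hypothetical sketch; every type, field, and value below is an assumption for exposition, not the authors' API:

```python
from dataclasses import dataclass, field

# Hypothetical sketch only: the paper describes Schema Grammar "G" at a high
# level, so every name below is an illustrative assumption.

@dataclass(frozen=True)
class Relation:
    """A relational invariant, e.g. 'dissolves_into', independent of any entity."""
    predicate: str
    roles: tuple[str, ...]  # ordered role labels, e.g. ("subject", "medium")

@dataclass
class Schema:
    """Decouples relational invariants from the visual entities that fill them."""
    relations: list[Relation]              # the transferable "creative essence"
    bindings: dict[str, str] = field(default_factory=dict)  # role -> entity

    def rebind(self, new_bindings: dict[str, str]) -> "Schema":
        """Keep the relational logic fixed; re-instantiate it on a new subject."""
        return Schema(self.relations, {**self.bindings, **new_bindings})

# What a perception agent might distill from a reference image:
reference = Schema(
    relations=[Relation("dissolves_into", ("subject", "medium"))],
    bindings={"subject": "glacier", "medium": "rising sea"},
)

# A transfer agent preserves the generic space and swaps in an apt carrier:
target = reference.rebind({"subject": "coral reef", "medium": "bleached sand"})
```

The point of such a representation is that `rebind` changes only the bindings, never the relations, which is exactly the generic-space invariance the transfer agent is said to maintain.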
Community
This paper introduces Visual Metaphor Transfer (VMT), a new task that goes beyond pixel-level editing to model abstract, cross-domain creative logic in visual generation. Inspired by Conceptual Blending Theory, the authors propose a schema-based, multi-agent framework that explicitly decouples metaphorical essence from visual appearance and re-instantiates it on new subjects. Extensive human studies show clear gains in metaphor consistency and creative quality, highlighting strong potential for high-impact applications in advertising and media.
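The hierarchical diagnostic agent is likewise described only at the level of its three checkpoints (abstract logic, component selection, prompt encoding). Below is a minimal control-flow sketch of the closed-loop backtracking under that reading; the placeholder functions stand in for the VLM/LLM calls a real implementation would make, and none of the names come from the paper:

```python
import random
from collections import namedtuple

Verdict = namedtuple("Verdict", ["ok", "feedback"])

# The three error strata named in the abstract, ordered deepest to shallowest.
LEVELS = ["abstract_logic", "component_selection", "prompt_encoding"]

# Placeholder agents; a real system would wrap model calls here.
def perceive(image, feedback=None): return {"relation": "dissolves_into"}
def transfer(schema, subject, feedback=None): return {"carrier": subject}
def encode(schema, carrier, feedback=None): return f"{schema['relation']}({carrier['carrier']})"
def generate(prompt): return f"<image: {prompt}>"
def diagnose(image, schema, level): return Verdict(random.random() > 0.3, level)

def run_vmt(reference_image, target_subject, max_rounds=3):
    schema = perceive(reference_image)                  # perception agent
    carrier = transfer(schema, target_subject)          # transfer agent
    image = generate(encode(schema, carrier))           # generation agent
    for _ in range(max_rounds):
        failed = [lvl for lvl in LEVELS if not diagnose(image, schema, lvl).ok]
        if not failed:
            break
        # Backtrack to the deepest failing stage; everything downstream reruns.
        if "abstract_logic" in failed:
            schema = perceive(reference_image, feedback=failed)
            carrier = transfer(schema, target_subject)
        elif "component_selection" in failed:
            carrier = transfer(schema, target_subject, feedback=failed)
        image = generate(encode(schema, carrier, feedback=failed))
    return image

print(run_vmt("<reference image>", "coral reef"))
```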
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- Unified Thinker: A General Reasoning Modular Core for Image Generation (2026)
- Reasoning-Driven Amodal Completion: Collaborative Agents and Perceptual Evaluation (2025)
- Agentic Retoucher for Text-To-Image Generation (2026)
- A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning (2025)
- CoAgent: Collaborative Planning and Consistency Agent for Coherent Video Generation (2025)
- Re-Align: Structured Reasoning-guided Alignment for In-Context Image Generation and Editing (2026)
- AgentComp: From Agentic Reasoning to Compositional Mastery in Text-to-Image Models (2025)