arxiv:2602.14492

Query as Anchor: Scenario-Adaptive User Representation via Large Language Model

Published on Feb 16 · Submitted by Jiahao Yuan on Feb 17

Abstract

Query-as-Anchor is a framework that transforms user modeling from static encoding into dynamic, query-aware synthesis, using large language models together with specialized architectures and training methods.

AI-generated summary

Industrial-scale user representation learning requires balancing robust universality with acute task-sensitivity. However, existing paradigms primarily yield static, task-agnostic embeddings that struggle to reconcile the divergent requirements of downstream scenarios within unified vector spaces. Furthermore, heterogeneous multi-source data introduces inherent noise and modality conflicts, degrading representation quality. We propose Query-as-Anchor, a framework shifting user modeling from static encoding to dynamic, query-aware synthesis. To empower Large Language Models (LLMs) with deep user understanding, we first construct UserU, an industrial-scale pre-training dataset that aligns multi-modal behavioral sequences with user understanding semantics, and our Q-Anchor Embedding architecture integrates hierarchical coarse-to-fine encoders into dual-tower LLMs via joint contrastive-autoregressive optimization for query-aware user representation. To bridge the gap between general pre-training and specialized business logic, we further introduce Cluster-based Soft Prompt Tuning to enforce discriminative latent structures, effectively aligning model attention with scenario-specific modalities. For deployment, anchoring queries at sequence termini enables KV-cache-accelerated inference with negligible incremental latency. Evaluations on 10 Alipay industrial benchmarks show consistent SOTA performance, strong scalability, and efficient deployment. Large-scale online A/B testing in Alipay's production system across two real-world scenarios further validates its practical effectiveness. Our code is prepared for public release and will be available at: https://github.com/JhCircle/Q-Anchor.

Community

Paper author and submitter

Query as Anchor: Scenario-Adaptive User Representation via Large Language Model

Q-Anchor is a query-conditioned user representation framework that transforms static user embeddings into dynamic, scenario-adaptive representations using Large Language Models (LLMs).

Instead of producing fixed task-agnostic embeddings, Q-Anchor introduces Query-as-Anchor, a mechanism that re-anchors the same user behavior profile under different downstream objectives via natural language queries. This enables a single model to serve multiple business scenarios without retraining.
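
To make the anchoring idea concrete, here is a minimal, self-contained sketch (not the released Q-Anchor code): a toy dual-tower encoder produces one embedding per (user history, query) pair, so the same history yields different vectors under a risk query and a marketing query. The class and method names (QueryAnchorEncoder, embed_user) and the GRU towers are illustrative stand-ins for the paper's LLM-based architecture.

```python
# Illustrative sketch of the Query-as-Anchor idea; names and architecture are
# hypothetical stand-ins, not the released Q-Anchor API.
import torch
import torch.nn as nn
import torch.nn.functional as F


class QueryAnchorEncoder(nn.Module):
    """Toy query-conditioned user encoder: one user history, many query anchors."""

    def __init__(self, vocab_size: int = 1000, dim: int = 64):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, dim)
        self.user_tower = nn.GRU(dim, dim, batch_first=True)   # stands in for the behavior encoder
        self.query_tower = nn.GRU(dim, dim, batch_first=True)  # stands in for the query encoder
        self.fuse = nn.Linear(2 * dim, dim)

    def embed_user(self, behavior_ids: torch.Tensor, query_ids: torch.Tensor) -> torch.Tensor:
        # Encode the shared behavior history and the scenario query separately,
        # then fuse them so the query "anchors" the final representation.
        _, h_user = self.user_tower(self.token_emb(behavior_ids))
        _, h_query = self.query_tower(self.token_emb(query_ids))
        fused = self.fuse(torch.cat([h_user[-1], h_query[-1]], dim=-1))
        return F.normalize(fused, dim=-1)


# Same user history, two different scenario queries -> two different embeddings.
model = QueryAnchorEncoder()
history = torch.randint(0, 1000, (1, 32))         # one user's behavior sequence
risk_query = torch.randint(0, 1000, (1, 8))       # e.g. "is this user a credit risk?"
marketing_query = torch.randint(0, 1000, (1, 8))  # e.g. "will this user redeem a coupon?"
risk_vec = model.embed_user(history, risk_query)
marketing_vec = model.embed_user(history, marketing_query)
print(risk_vec.shape, marketing_vec.shape)  # both (1, 64), but different vectors
```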

🔑 Key Features

  • Dynamic Query-Aware Embeddings
    Generates scenario-specific user representations conditioned on natural language queries.

  • Hierarchical Multi-Modal Encoder
    Integrates heterogeneous behavioral logs (transactions, app usage, search, navigation, tabular features) into a coarse-to-fine structure aligned with the LLM's latent space.

  • UserU Pretraining Dataset (100M+ samples)
    Combines two supervision sources to inject temporal dynamics and semantic understanding:

    • Future behavior prediction
    • Reflection-verified, LLM-synthesized user QA pairs

  • Joint Contrastive + Generative Training
    Aligns user embeddings with semantic targets while preserving token-level grounding (a toy loss sketch follows this list).

  • Lightweight Soft Prompt Tuning
    Enables efficient scenario specialization without modifying backbone weights.

  • KV-Cache Optimized Inference
    User prefixes are encoded once and reused across multiple queries, enabling low-latency multi-scenario deployment (see the caching sketch after this list).
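
As referenced in the Joint Contrastive + Generative Training item above, here is a hedged sketch of what a combined objective could look like, assuming an InfoNCE-style contrastive term between pooled user+query embeddings and their semantic targets, plus a standard next-token cross-entropy term. The weighting, temperature, and the function name joint_loss are illustrative, not the paper's exact formulation.

```python
# Hedged sketch of a joint contrastive + generative objective, in the spirit of
# "joint contrastive-autoregressive optimization"; not the authors' training code.
import torch
import torch.nn.functional as F


def joint_loss(user_emb, target_emb, lm_logits, lm_labels,
               temperature: float = 0.07, alpha: float = 0.5):
    """user_emb, target_emb: (B, D) pooled embeddings of user+query and semantic target.
    lm_logits: (B, T, V) next-token logits; lm_labels: (B, T) token ids (-100 = ignore)."""
    # InfoNCE-style contrastive term: each user embedding should match its own target
    # and be pushed away from the other targets in the batch.
    u = F.normalize(user_emb, dim=-1)
    t = F.normalize(target_emb, dim=-1)
    sim = u @ t.T / temperature                      # (B, B) similarity matrix
    labels = torch.arange(u.size(0), device=u.device)
    contrastive = F.cross_entropy(sim, labels)

    # Autoregressive term: next-token cross-entropy keeps token-level grounding.
    generative = F.cross_entropy(
        lm_logits.reshape(-1, lm_logits.size(-1)),
        lm_labels.reshape(-1),
        ignore_index=-100,
    )
    return alpha * contrastive + (1.0 - alpha) * generative
```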
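
And for the KV-Cache Optimized Inference item, a sketch of the deployment pattern using the generic Hugging Face transformers API: the long user-behavior prefix is run once, its key/value cache is kept, and each short scenario query reuses a copy of that cache. The model name, the example texts, and the choice to pool the final query token's hidden state are assumptions for illustration only.

```python
# Hedged sketch of prefix KV-cache reuse across scenario queries; model name and
# pooling choice are placeholders, not the production system described in the paper.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-0.5B"  # any decoder-only causal LM works for this illustration
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

user_prefix = "User behavior log: paid utility bill; opened travel mini-app; searched 'visa'."
queries = [
    "Is this user likely to book a flight soon?",
    "Is this user likely to respond to a coupon?",
]

with torch.no_grad():
    prefix_ids = tok(user_prefix, return_tensors="pt").input_ids
    prefix_out = model(prefix_ids, use_cache=True)   # run the expensive prefix only once
    prefix_cache = prefix_out.past_key_values

    for q in queries:
        q_ids = tok(" Query: " + q, return_tensors="pt").input_ids
        cache = copy.deepcopy(prefix_cache)          # keep the shared prefix cache pristine
        attn = torch.ones(1, prefix_ids.size(1) + q_ids.size(1), dtype=torch.long)
        out = model(q_ids, past_key_values=cache, attention_mask=attn,
                    use_cache=True, output_hidden_states=True)
        # Last hidden state of the final query token serves as the query-anchored embedding.
        emb = out.hidden_states[-1][:, -1, :]
        print(q, "->", emb.shape)
```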

📊 Performance

Evaluated on 10 large-scale industrial benchmarks (Engagement, Risk, Marketing):

  • SOTA AUC & KS across all domains
  • +9.8% AUC improvement over strong general embedding baselines
  • Consistent gains validated via large-scale online A/B testing

Q-Anchor bridges the gap between sparse behavioral logs and LLM-level semantic understanding, enabling scalable, interpretable, and transferable user embeddings for industrial applications.
