IntentRL-Ambig-Text2SQL-4B

This model is trained to handle ambiguous text-to-SQL requests by explicitly reasoning about user intent and producing multiple interpretation–answer pairs rather than silently committing to a single interpretation.

It is based on Qwen/Qwen3-4B-Instruct-2507 and fine-tuned with reinforcement learning (DAPO/GRPO). The custom reward favors recall (covering more valid interpretations) on ambiguous questions and precision on unambiguous ones.
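The recall/precision idea can be sketched roughly as follows. This is only an illustration and not the reward used in training: the function name `intent_reward`, the `is_ambiguous` flag, and the choice to compare hashable execution results are all assumptions made here.

```python
from typing import Hashable, Sequence


def intent_reward(
    predicted: Sequence[Hashable],
    gold: Sequence[Hashable],
    is_ambiguous: bool,
) -> float:
    """Hypothetical reward: recall for ambiguous questions, precision otherwise.

    `predicted` and `gold` each hold one hashable item per SQL query,
    for example a canonicalized execution result (an assumption of this sketch).
    """
    pred_set, gold_set = set(predicted), set(gold)
    if not pred_set or not gold_set:
        return 0.0
    matched = len(pred_set & gold_set)
    if is_ambiguous:
        # Recall: fraction of valid interpretations that were covered.
        return matched / len(gold_set)
    # Precision: fraction of produced answers that are valid.
    return matched / len(pred_set)
```

Under this sketch, an ambiguous question is scored by how many valid interpretations the output covers, while an unambiguous one is penalized for every extra, unsupported answer.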

Example

Given a schema and an ambiguous question:

Schema: CREATE TABLE Jobs (JobID INTEGER PRIMARY KEY, Min_Years INTEGER, Pref_Years INTEGER, Position TEXT, Salary REAL);

Question: Show the required experience for the best-paid role.

The model produces multiple interpretation–answer pairs:

Minimum years of experience required → SELECT Min_Years ...
Preferred years of experience → SELECT Pref_Years ...
Both minimum and preferred years → SELECT Min_Years, Pref_Years ...
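
To see why the three readings matter, the sketch below runs them against the schema with SQLite. The queries above are truncated in the source, so the `ORDER BY Salary DESC LIMIT 1` completion is an assumption made here, not the model's actual output, and the two rows are made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Jobs (JobID INTEGER PRIMARY KEY, Min_Years INTEGER, "
    "Pref_Years INTEGER, Position TEXT, Salary REAL)"
)
# Made-up rows, for illustration only.
conn.executemany(
    "INSERT INTO Jobs VALUES (?, ?, ?, ?, ?)",
    [(1, 2, 4, "Analyst", 60000.0), (2, 5, 8, "Staff Engineer", 150000.0)],
)

# Assumed completion of the truncated queries: pick the highest-salary row.
best_paid = "FROM Jobs ORDER BY Salary DESC LIMIT 1"
interpretations = {
    "minimum": f"SELECT Min_Years {best_paid}",
    "preferred": f"SELECT Pref_Years {best_paid}",
    "both": f"SELECT Min_Years, Pref_Years {best_paid}",
}

# Execute each interpretation and collect its result rows.
results = {name: conn.execute(sql).fetchall() for name, sql in interpretations.items()}
for name, rows in results.items():
    print(name, rows)
```

Each interpretation returns a different result on the same data, which is why committing silently to one of them can give the user the wrong answer.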

Paper

Reasoning about Intent for Ambiguous Requests

Authors: Irina Saparina, Mirella Lapata

Training Details

Base model: Qwen3-4B-Instruct-2507
Method: RL with DAPO/GRPO and a custom recall/precision reward
Training data: Ambrosia text-to-SQL benchmark
Ambiguous examples are upsampled to balance training
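
A minimal sketch of the upsampling step, assuming a 1:1 target ratio and an `is_ambiguous` field on each example. Neither is stated in the source, and the helper name `upsample_ambiguous` is hypothetical.

```python
import random
from typing import Dict, List


def upsample_ambiguous(examples: List[Dict], seed: int = 0) -> List[Dict]:
    """Duplicate ambiguous examples at random until they match the unambiguous count."""
    rng = random.Random(seed)
    ambiguous = [ex for ex in examples if ex["is_ambiguous"]]
    unambiguous = [ex for ex in examples if not ex["is_ambiguous"]]
    deficit = len(unambiguous) - len(ambiguous)
    # Sample with replacement only when ambiguous examples are the minority.
    extra = rng.choices(ambiguous, k=deficit) if ambiguous and deficit > 0 else []
    balanced = ambiguous + unambiguous + extra
    rng.shuffle(balanced)
    return balanced
```

Sampling with replacement keeps the original examples intact and only adds duplicates, so the unambiguous portion of the data is unchanged.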

Code

Training and evaluation code: https://github.com/saparina/intentRL

Citation

@misc{saparina2025reasoningintentambiguousrequests,
      title={Reasoning about Intent for Ambiguous Requests},
      author={Irina Saparina and Mirella Lapata},
      year={2025},
      eprint={2511.10453},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2511.10453},
}