arxiv:2603.26233

Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents

Published on Mar 27

Submitted by Nicholas Edwards on Apr 3

University of Vienna

Authors:

Sebastian Schuster

Abstract

AI-generated summary: A multi-agent system using uncertainty-aware design improves LLM agent performance on underspecified software development tasks by detecting ambiguity and proactively seeking clarification.

As Large Language Model (LLM) agents are increasingly deployed in open-ended domains like software engineering, they frequently encounter underspecified instructions that lack crucial context. While human developers naturally resolve underspecification by asking clarifying questions, current agents are largely optimized for autonomous execution. In this work, we systematically evaluate the clarification-seeking abilities of LLM agents on an underspecified variant of SWE-bench Verified. We propose an uncertainty-aware multi-agent scaffold that explicitly decouples underspecification detection from code execution. Our results demonstrate that this multi-agent system using OpenHands + Claude Sonnet 4.5 achieves a 69.40% task resolve rate, significantly outperforming a standard single-agent setup (61.20%) and closing the performance gap with agents operating on fully specified instructions. Furthermore, we find that the multi-agent system exhibits well-calibrated uncertainty, conserving queries on simple tasks while proactively seeking information on more complex issues. These findings indicate that current models can be turned into proactive collaborators, where agents independently recognize when to ask questions to elicit missing information in real-world, underspecified tasks.
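The idea of decoupling underspecification detection from code execution can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Assessment`, `toy_detector`, and `run_task` are hypothetical, and the keyword heuristic merely stands in for what would presumably be an LLM call in a real scaffold. It only shows the control flow in which one component decides whether to ask and a separate component does the work.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Assessment:
    """Result of the detection step (hypothetical structure)."""
    needs_clarification: bool
    questions: List[str]


def toy_detector(issue: str) -> Assessment:
    """Stand-in heuristic, NOT the paper's detector.

    A real system would likely query an LLM; here we only check whether
    the issue text mentions expected behavior and an error.
    """
    text = issue.lower()
    questions = []
    if "expected" not in text:
        questions.append("What is the expected behavior?")
    if "error" not in text and "traceback" not in text:
        questions.append("Is there an error message or traceback?")
    return Assessment(bool(questions), questions)


def run_task(
    issue: str,
    detector: Callable[[str], Assessment],
    executor: Callable[[str], str],
    ask_user: Callable[[str], str],
) -> str:
    """Run detection first, ask only if needed, then hand off to the executor."""
    assessment = detector(issue)
    context = issue
    if assessment.needs_clarification:
        # Collect one answer per question and append them to the task context.
        qa = [f"Q: {q}\nA: {ask_user(q)}" for q in assessment.questions]
        context = issue + "\n\nClarifications:\n" + "\n".join(qa)
    return executor(context)


if __name__ == "__main__":
    # The executor here just echoes its input; a real one would edit code.
    result = run_task(
        "Fix the parser.",
        toy_detector,
        executor=lambda ctx: ctx,
        ask_user=lambda q: "(simulated user answer)",
    )
    print(result)
```

Keeping the detector separate means the executor never has to decide whether to ask; it only sees the issue plus whatever clarifications were gathered.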


Community

nedwards99

Paper submitter about 10 hours ago

We investigate whether LLM agents can independently decide when to ask clarifying questions on underspecified coding tasks. Our uncertainty-aware multi-agent scaffold (UA-Multi) achieves 69.40% on an underspecified variant of SWE-bench Verified — nearly matching an agent given the fully specified issue (70.80%). The system is also well-calibrated: it asks more on harder tasks and refrains on easier ones where it already has enough context to proceed.
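To make "nearly matching" concrete, the three resolve rates quoted on this page (61.20% single-agent, 69.40% UA-Multi, 70.80% fully specified) can be combined into a rough measure of how much of the underspecification gap is recovered. The helper `gap_recovered` is a hypothetical name and this metric is an illustration, not one reported in the paper.

```python
def gap_recovered(baseline: float, system: float, upper: float) -> float:
    """Fraction of the baseline-to-upper gap that the system closes."""
    return (system - baseline) / (upper - baseline)


# Resolve rates (percent) as quoted on this page.
single_agent = 61.20      # single-agent setup on underspecified issues
ua_multi = 69.40          # uncertainty-aware multi-agent scaffold
fully_specified = 70.80   # agent given the fully specified issue

improvement = ua_multi - single_agent      # percentage points gained
remaining = fully_specified - ua_multi     # percentage points still missing
fraction = gap_recovered(single_agent, ua_multi, fully_specified)

print(f"Improvement over single agent: {improvement:.2f} points")  # 8.20
print(f"Remaining gap to fully specified: {remaining:.2f} points")  # 1.40
print(f"Share of gap recovered: {fraction:.1%}")                    # 85.4%
```

Under this reading, the scaffold gains 8.20 points over the single-agent setup and sits 1.40 points below the fully specified condition.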

Code: https://github.com/nedwards99/ask-or-assume


