Instructions for using KDEGroup/UI-AGILE-3B with libraries and local apps.

Libraries

How to use KDEGroup/UI-AGILE-3B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="KDEGroup/UI-AGILE-3B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
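# Run the pipeline and print the result (the bare call would discard it in a script)
result = pipe(text=messages)
print(result)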

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("KDEGroup/UI-AGILE-3B")
model = AutoModelForImageTextToText.from_pretrained("KDEGroup/UI-AGILE-3B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
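# Tokenize the chat and move the tensors to the model's device
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[-1]
print(processor.decode(outputs[0][prompt_length:], skip_special_tokens=True))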

Local Apps

vLLM

How to use KDEGroup/UI-AGILE-3B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server (this command blocks; leave it running):
vllm serve "KDEGroup/UI-AGILE-3B"
# From a second terminal, call the server using curl (OpenAI-compatible API, port 8000 by default):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "KDEGroup/UI-AGILE-3B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
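
The same request can be sent from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on `http://localhost:8000`; the helper names `build_chat_request` and `send_chat_request` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, image_url, model="KDEGroup/UI-AGILE-3B"):
    """Build an OpenAI-style chat payload with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def send_chat_request(payload, base_url="http://localhost:8000"):
    """POST the payload to /v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


payload = build_chat_request(
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
# Uncomment once the server is running:
# print(send_chat_request(payload))
```

For the SGLang server described further down, the same helper should work with `base_url="http://localhost:30000"`, since both servers expose an OpenAI-compatible endpoint.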

Use Docker

docker model run hf.co/KDEGroup/UI-AGILE-3B

SGLang

How to use KDEGroup/UI-AGILE-3B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "KDEGroup/UI-AGILE-3B" \
    --host 0.0.0.0 \
    --port 30000
# From a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "KDEGroup/UI-AGILE-3B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "KDEGroup/UI-AGILE-3B" \
        --host 0.0.0.0 \
        --port 30000
# From a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "KDEGroup/UI-AGILE-3B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use KDEGroup/UI-AGILE-3B with Docker Model Runner:
```
docker model run hf.co/KDEGroup/UI-AGILE-3B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding

[📖 Paper] [🤗 Checkpoints] [🤗 Data] [🤗 Daily Paper] [🚀 Github]

🔥 Overview

UI-AGILE is a framework for improving Graphical User Interface (GUI) agents at both the training and inference stages. It targets three problems common to GUI agents built on Multimodal Large Language Models (MLLMs): trade-offs in reasoning design, ineffective rewards, and visual noise.

Key Features

Training Enhancements:
- Continuous Reward Function: Incentivizes high-precision grounding.
- "Simple Thinking" Reward: Balances planning depth with execution speed and grounding accuracy.
- Cropping-based Resampling: Mitigates the sparse reward problem and improves learning on complex tasks.

Inference Enhancements:
- Decomposed Grounding with Selection: Dramatically improves grounding accuracy on high-resolution displays by breaking the image into smaller, manageable parts.
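
To make the idea of decomposed grounding concrete, here is a minimal sketch of the bookkeeping involved: split a screenshot into overlapping tiles, collect one candidate point per tile, and map the chosen point back to full-image coordinates. The grid size, the overlap ratio, the `Tile` type, and the numeric candidate scores are all assumptions made for illustration; this page does not describe how UI-AGILE chooses tiles or selects among candidates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    left: int
    top: int
    width: int
    height: int


def split_into_tiles(image_width, image_height, cols=2, rows=2, overlap=0.1):
    """Split a screenshot into a rows x cols grid of overlapping tiles."""
    base_w = image_width / cols
    base_h = image_height / rows
    pad_w = int(base_w * overlap)  # extra pixels so targets on a seam are not cut
    pad_h = int(base_h * overlap)
    tiles = []
    for r in range(rows):
        for c in range(cols):
            left = max(0, int(c * base_w) - pad_w)
            top = max(0, int(r * base_h) - pad_h)
            right = min(image_width, int((c + 1) * base_w) + pad_w)
            bottom = min(image_height, int((r + 1) * base_h) + pad_h)
            tiles.append(Tile(left, top, right - left, bottom - top))
    return tiles


def to_full_image_coords(tile, x, y):
    """Translate a point predicted inside a tile back to full-image pixels."""
    return (tile.left + x, tile.top + y)


def select_best(candidates):
    """Pick the highest-scoring candidate from (tile, (x, y), score) triples.

    The scores are hypothetical placeholders for whatever selection step is used.
    """
    tile, (x, y), _ = max(candidates, key=lambda candidate: candidate[2])
    return to_full_image_coords(tile, x, y)


tiles = split_into_tiles(3840, 2160)
candidates = [
    (tiles[0], (100, 50), 0.2),   # hypothetical prediction in the top-left tile
    (tiles[3], (300, 400), 0.9),  # hypothetical prediction in the bottom-right tile
]
print(select_best(candidates))  # prints (2028, 1372)
```

In practice each tile would be cropped from the screenshot and passed to the model separately, so that every crop is small enough for fine details to survive image resizing.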

UI-AGILE-7B achieves state-of-the-art grounding performance on benchmarks like ScreenSpot-Pro and ScreenSpot-v2 while maintaining strong general agent capabilities.
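
The continuous reward listed under Training Enhancements can be pictured by contrast with a binary hit-or-miss reward. The sketch below is an assumption for illustration only: the formula is not given on this page, so a Gaussian-shaped falloff from the centre of the target box stands in for it, and the function names and the `sharpness` parameter are hypothetical.

```python
import math


def binary_grounding_reward(point, box):
    """Return 1.0 for any point inside the box and 0.0 otherwise."""
    x, y = point
    left, top, right, bottom = box
    return 1.0 if left <= x <= right and top <= y <= bottom else 0.0


def continuous_grounding_reward(point, box, sharpness=4.0):
    """Reward that peaks at the box centre and decays towards the edges.

    Assumed form: exp(-sharpness * d^2), where d is the distance from the
    centre normalised by the half-width and half-height of the box.
    Points outside the box still receive 0.0.
    """
    if binary_grounding_reward(point, box) == 0.0:
        return 0.0
    x, y = point
    left, top, right, bottom = box
    half_w = (right - left) / 2
    half_h = (bottom - top) / 2
    dx = (x - (left + right) / 2) / half_w if half_w > 0 else 0.0
    dy = (y - (top + bottom) / 2) / half_h if half_h > 0 else 0.0
    return math.exp(-sharpness * (dx * dx + dy * dy))


box = (0, 0, 100, 100)
for point in [(50, 50), (70, 50), (95, 95), (150, 50)]:
    print(point, binary_grounding_reward(point, box),
          round(continuous_grounding_reward(point, box), 4))
```

Under the binary reward, a click near the corner of the box scores the same as a click at the centre. The continuous version separates the two, which is the sense in which such a reward can incentivize high-precision grounding.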

⭐️ Citation

If you find this project useful, please cite:

@misc{lian2025uiagileadvancingguiagents,
      title={UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding}, 
      author={Shuquan Lian and Yuhang Wu and Jia Ma and Zihan Song and Bingqi Chen and Xiawu Zheng and Hui Li},
      year={2025},
      eprint={2507.22025},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2507.22025}, 
}


Safetensors

Model size: 4B params
Tensor type: BF16

Model tree for KDEGroup/UI-AGILE-3B

Quantizations: 2 models

Paper for KDEGroup/UI-AGILE-3B

UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding (arXiv 2507.22025, published Jul 29, 2025)

Evaluation results

Scores on likaixin/ScreenSpot-Pro (source: leaderboard):

- Overall: 45
- Android Studio (macOS): 42.5
- AutoCAD (Windows): 29.4
- Blender (Windows): 36.6
- Davinci (macOS): 45.5
- EViews (Windows): 86
- Excel (macOS): 46.9
- Fruitloops (Windows): 28.1