Benchmark data in "Beyond Ideal Instruction: A Comprehensive Framework for Evaluating LLMs in Realistic Interactions".
Xuan Yang
TorresYang
AI & ML interests
LLM reasoning, agent
Recent Activity
authored a paper 1 day ago
SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models
authored a paper 1 day ago
Beyond Ideal Instruction: A Comprehensive Framework for Evaluating LLMs in Realistic Interactions
updated a collection 8 days ago
RUT-Bench