DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research Paper • 2511.19399 • Published Nov 24, 2025 • 60
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research Paper • 2511.19399 • Published Nov 24, 2025 • 60
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research Paper • 2511.19399 • Published Nov 24, 2025 • 60
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research Paper • 2511.19399 • Published Nov 24, 2025 • 60
RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments Paper • 2511.07317 • Published Nov 10, 2025 • 15
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research Paper • 2511.19399 • Published Nov 24, 2025 • 60
Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs Paper • 2510.18279 • Published Oct 21, 2025 • 4
Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation Paper • 2508.13144 • Published Aug 18, 2025
CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge Paper • 2404.06664 • Published Apr 10, 2024 • 1
CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs Paper • 2410.02677 • Published Oct 3, 2024 • 1
Aligning LLMs to Ask Good Questions A Case Study in Clinical Reasoning Paper • 2502.14860 • Published Feb 20, 2025
Spurious Rewards: Rethinking Training Signals in RLVR Paper • 2506.10947 • Published Jun 12, 2025 • 2
MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning Paper • 2406.00922 • Published Jun 3, 2024