When an LLM is apprehensive about its answers -- and when its uncertainty is justified Paper • 2503.01688 • Published Mar 3, 2025 • 22
Article Preference Tuning LLMs with Direct Preference Optimization Methods Jan 18, 2024 • 81
Reasoning Shift: How Context Silently Shortens LLM Reasoning Paper • 2604.01161 • Published 17 days ago • 31
Article ORBA: Orthogonal Reflection Bounded Ablation — A Geometrically Exact Detour in Directional Activation Editing 25 days ago • 6
Article Take Control of What Your LLM Knows and Does — with the EasyEdit Tool Series Jul 15, 2025 • 9
Article LateOn-Code & ColGrep: LightOn unveils state-of-the-art code retrieval models and code search tooling Feb 12 • 53
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark Paper • 2406.01574 • Published Jun 3, 2024 • 54
Article Smol AI WorldCup: A 5-Axis Benchmark That Reveals What Small Language Models Can Really Do Mar 10 • 38
Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines? Paper • 2602.14111 • Published Feb 15 • 56
Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities Paper • 2602.05281 • Published Feb 5 • 14
Multimodal Evaluation of Russian-language Architectures Paper • 2511.15552 • Published Nov 19, 2025 • 79
GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms Paper • 2511.17592 • Published Nov 17, 2025 • 121
When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA Paper • 2510.04849 • Published Oct 6, 2025 • 117