SoFA: Shielded On-the-fly Alignment via Priority Rule Following Paper • 2402.17358 • Published Feb 27, 2024 • 1
Scalable Oversight for Superhuman AI via Recursive Self-Critiquing Paper • 2502.04675 • Published Feb 7, 2025 • 1
On-Policy Self-Alignment with Fine-grained Knowledge Feedback for Hallucination Mitigation Paper • 2406.12221 • Published Jun 18, 2024
Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree? Paper • 2410.05584 • Published Oct 8, 2024
LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents Paper • 2605.29559 • Published 6 days ago • 14
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces Paper • 2601.11868 • Published Jan 17 • 37
Article Announcing LiteCoder-Terminal: Lightweight Terminal Agents with <1k Synthesized Trajectories Lite-Coder • Dec 18, 2025 • 9
GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks Paper • 2504.12764 • Published Apr 17, 2025 • 42
Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models Paper • 2501.01830 • Published Jan 3, 2025 • 17
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering Paper • 2411.11504 • Published Nov 18, 2024 • 24