BenchPreS: A Benchmark for Context-Aware Personalized Preference Selectivity of Persistent-Memory LLMs Paper • 2603.16557 • Published 5 days ago • 20
Safe and Scalable Web Agent Learning via Recreated Websites Paper • 2603.10505 • Published 12 days ago • 25
Can Large Language Models Keep Up? Benchmarking Online Adaptation to Continual Knowledge Streams Paper • 2603.07392 • Published 15 days ago • 17
Privasis: Synthesizing the Largest "Public" Private Dataset from Scratch Paper • 2602.03183 • Published Feb 3 • 11
Lost in the Noise: How Reasoning Models Fail with Contextual Distractors Paper • 2601.07226 • Published Jan 12 • 33
K-EXAONE Collection First journey to foundation models with frontier-level performance. • 4 items • Updated Jan 9 • 35
Doc-PP: Document Policy Preservation Benchmark for Large Vision-Language Models Paper • 2601.03926 • Published Jan 7 • 1
Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought Paper • 2510.04230 • Published Oct 5, 2025 • 27
Probing-RAG: Self-Probing to Guide Language Models in Selective Document Retrieval Paper • 2410.13339 • Published Oct 17, 2024
How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads Paper • 2505.15865 • Published May 21, 2025 • 5
ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents Paper • 2509.22830 • Published Sep 26, 2025 • 5