IndustryBench-MIPU: Benchmarking Multi-Image Attribute Value Extraction for Industrial Products Paper • 2606.14383 • Published 11 days ago • 4
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling Paper • 2606.18023 • Published 8 days ago • 202
Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks Paper • 2606.12344 • Published 14 days ago • 68
SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories Paper • 2606.01311 • Published 24 days ago • 37
ButterChicken98/sd15_cottonweed15_rag_k2_qwen_recipe_bs48_snr5_noise01_10k Text-to-Image • Updated 20 days ago • 16 • 1
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published 28 days ago • 431
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality? Paper • 2605.22109 • Published May 21 • 171
SEIF: Self-Evolving Reinforcement Learning for Instruction Following Paper • 2605.07465 • Published May 8 • 30
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published Apr 8 • 123
ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers Paper • 2603.24414 • Published Mar 25 • 183
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling Paper • 2603.25746 • Published Mar 26 • 155
SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models Paper • 2603.16859 • Published Mar 17 • 249