FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol Paper • 2603.24943 • Published 10 days ago • 12
RealMaster: Lifting Rendered Scenes into Photorealistic Video Paper • 2603.23462 • Published 11 days ago • 32
RealRestorer: Towards Generalizable Real-World Image Restoration with Large-Scale Image Editing Models Paper • 2603.25502 • Published 9 days ago • 55
SIMART: Decomposing Monolithic Meshes into Sim-ready Articulated Assets via MLLM Paper • 2603.23386 • Published 11 days ago • 40
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning Paper • 2603.23483 • Published 11 days ago • 60
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence Paper • 2603.13398 • Published 24 days ago • 152
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance Paper • 2305.05176 • Published May 9, 2023 • 7
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22, 2024 • 259
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3, 2024 • 74
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward Paper • 2404.01258 • Published Apr 1, 2024 • 12
Improving Text-to-Image Consistency via Automatic Prompt Optimization Paper • 2403.17804 • Published Mar 26, 2024 • 19
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation Paper • 2403.16990 • Published Mar 25, 2024 • 25
WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs Paper • 2403.07944 • Published Mar 10, 2024 • 1
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition Paper • 2403.14148 • Published Mar 21, 2024 • 21
AnimateDiff-Lightning: Cross-Model Diffusion Distillation Paper • 2403.12706 • Published Mar 19, 2024 • 18