VLM R1 Referral Expression 💬 • Agents • Mark regions in images based on text descriptions • 72
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published Dec 5, 2024 • 63
Article ColFlor: Towards BERT-Size Vision-Language Document Retrieval Models • ahmed-masry • Oct 18, 2024 • 21
DocLayout-YOLO Collection Dataset and model for DocLayout-YOLO • 10 items • Updated Jan 14, 2025 • 21