HuggingFaceTB/SmolVLM2-500M-Video-Instruct • Image-Text-to-Text • 0.5B params • Updated Apr 8, 2025 • 88.9k downloads • 116 likes
microsoft/Phi-4-multimodal-instruct • Automatic Speech Recognition • 6B params • Updated Dec 10, 2025 • 179k downloads • 1.56k likes