merve PRO
-
Running • 21
YOLO26
💙 Process images with advanced object detection and segmentation
-
Running • Featured • 53
YOLO26 WebGPU
🏆 Real-time object detection & pose estimation in your browser
-
onnx-community/yolo26x-ONNX
Updated • 526 • 5
openvision/yoloe26-n-seg
Zero-Shot Object Detection • Updated • 143 • 2
-
Wuli-art/Qwen-Image-2512-Turbo-LoRA
Text-to-Image • Updated • 30k • 192
miromind-ai/MiroThinker-v1.5-235B
Text Generation • 235B • Updated • 2.39k • 245
prithivMLmods/Qwen-Image-Edit-2511-Object-Remover
Image-to-Image • Updated • 4.64k • 47
tencent/Youtu-LLM-2B-Base
Text Generation • Updated • 5.55k • 39
-
facebook/sam3
Mask Generation • 0.9B • Updated • 1.65M • 1.48k
Running on Zero • Featured • 101
SAM3 Video Segmentation
🐠 Track and label objects in videos using text prompts or clicks
-
onnx-community/sam3-tracker-ONNX
Mask Generation • Updated • 4.31k • 26
Running • 22
SAM3 Tracker WebGPU
🎯 Segment and extract parts from images by clicking
-
opendatalab/OmniDocBench
Viewer • Updated • 1.36k • 9.73k • 67
nanonets/Nanonets-OCR-s
Image-Text-to-Text • 4B • Updated • 30.7k • 1.58k
echo840/MonkeyOCR
Image-Text-to-Text • Updated • 256 • 514
Running on Zero • MCP • Featured • 140
Multimodal OCR2
💻 nanonets ocr / smoldocling / monkey ocr / typhoon ocr
-
NVLM: Open Frontier-Class Multimodal LLMs
Paper • 2409.11402 • Published • 74
BRAVE: Broadening the visual encoding of vision-language models
Paper • 2404.07204 • Published • 19
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Paper • 2403.18814 • Published • 47
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Paper • 2409.17146 • Published • 121
-
Runtime error • Featured • 100
LOTUS Normal
🌍 Generate high-quality predictions from images
-
Runtime error • 78
LOTUS Depth
🚀 Generate depth maps from images and videos
-
jingheya/lotus-depth-g-v1-0
Depth Estimation • Updated • 10.3k • 27
jingheya/lotus-depth-d-v1-0
Depth Estimation • Updated • 280 • 5
-
facebook/dinov2-large
Image Feature Extraction • 0.3B • Updated • 450k • 101
google/flan-t5-xl
3B • Updated • 131k • 526
google/siglip-large-patch16-384
Zero-Shot Image Classification • 0.7B • Updated • 18k • 11
google/vit-huge-patch14-224-in21k
Image Feature Extraction • 0.6B • Updated • 86.8k • 22
-
facebook/deit-base-distilled-patch16-384
Image Classification • 87.6M • Updated • 9.39k • 7
facebook/convnextv2-base-1k-224
Image Classification • 88.7M • Updated • 1.46k • 4
facebook/deit-base-distilled-patch16-224
Image Classification • Updated • 6.88k • 31
google/vit-base-patch32-384
Image Classification • 88.3M • Updated • 18.2k • 23
-
facebook/maskformer-swin-large-coco
Image Segmentation • 0.2B • Updated • 1.07k • 27
nvidia/segformer-b0-finetuned-ade-512-512
Image Segmentation • 3.75M • Updated • 365k • 179
facebook/detr-resnet-50-dc5-panoptic
Image Segmentation • 43M • Updated • 26 • 3
nvidia/segformer-b5-finetuned-cityscapes-1024-1024
Image Segmentation • Updated • 76.2k • 37
-
Salesforce/blip-image-captioning-large
Image-to-Text • 0.5B • Updated • 1.22M • 1.44k
Salesforce/blip-image-captioning-base
Image-to-Text • Updated • 2M • 840
microsoft/trocr-base-handwritten
Image-to-Text • 0.3B • Updated • 137k • 474
microsoft/git-large-coco
Image-to-Text • 0.4B • Updated • 1.64k • 104
-
Running • 112
Grounding DINO Demo
💻 Cutting edge open-vocabulary object detection app
-
Running • Featured • 95
Owlv2
👀 State-of-the-art Zero-shot Object Detection
-
Runtime error • Featured • 41
BLIP2 with transformers
🌖 BLIP2 (cutting edge image captioning) in 🤗 transformers
-
Build error • Featured • 378
IDEFICS Playground
🐨
-
Running • Featured • 95
Owlv2
👀 State-of-the-art Zero-shot Object Detection
-
Runtime error • Featured • 64
Owl Tracking
⚡ Powerful foundation model for zero-shot object tracking
-
Sleeping • 26
Search and Detect (CLIP/OWL-ViT)
🦉 Search and detect objects in images using text queries
-
Running on Zero • Featured • 109
OWLSAM
😻 State-of-the-art open-vocabulary image segmentation ⚡️
-
Runtime error • Featured • 84
UDOP
🏃 Generate text from document images
-
Runtime error • 40
Pix2struct
📚 Play with all the pix2struct variants in this d
-
Sleeping • 26
Compare Docvqa Models
🦀 Compare different visual question answering
-
Runtime error • Featured • 289
DocQuery — Document Query Engine
🦉
-
Improved Baselines with Visual Instruction Tuning
Paper • 2310.03744 • Published • 39
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Paper • 2403.05525 • Published • 49
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
Paper • 2308.12966 • Published • 11
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
Paper • 2404.01331 • Published • 27
-
google/owlvit-base-patch32
Zero-Shot Object Detection • 0.2B • Updated • 79.7k • 144
google/owlvit-base-patch16
Zero-Shot Object Detection • Updated • 5.81k • 13
google/owlvit-large-patch14
Zero-Shot Object Detection • Updated • 7.11k • 29
google/owlv2-base-patch16
Zero-Shot Object Detection • 0.2B • Updated • 19.1k • 29
-
Running • 192
Vidore Leaderboard
🥇 Browse and compare visual document retrieval models
-
Running on CPU Upgrade • 977
Open VLM Leaderboard
🌎 VLMEvalKit Evaluation Results Collection
-
Running • Featured • 559
Vision Arena (Testing VLMs side-by-side)
🖼 Display image analysis results
-
Running • Featured • 85
SEED-Bench Leaderboard
🏆 Submit model evaluation results to leaderboard
-
google/translategemma-27b-it
Image-Text-to-Text • 29B • Updated • 31.8k • 275
kakaocorp/kanana-2-30b-a3b-mid-2601
Text Generation • 31B • Updated • 124 • 30
black-forest-labs/FLUX.2-klein-base-4B
Image-to-Image • Updated • 19.7k • 72
google/translategemma-12b-it
Image-Text-to-Text • 13B • Updated • 54.7k • 231
-
facebook/metaclip-2-worldwide-s16
Zero-Shot Image Classification • 0.4B • Updated • 38 • 8
facebook/metaclip-2-worldwide-m16
Zero-Shot Image Classification • 0.5B • Updated • 5 • 3
facebook/metaclip-2-worldwide-l14
Zero-Shot Image Classification • 1B • Updated • 118 • 12
facebook/metaclip-2-worldwide-b32
Zero-Shot Image Classification • 0.6B • Updated • 128 • 6
-
deepseek-ai/DeepSeek-V3-0324
Text Generation • Updated • 293k • 3.09k
Qwen/Qwen2.5-Omni-7B
Any-to-Any • 11B • Updated • 163k • 1.85k
google/txgemma-27b-chat
Text Generation • 27B • Updated • 563 • 57
Running • Featured • 365
Qwen2.5 Omni 7B Demo
🏆 Generate text and speech responses from text, audio, images, or video input
-
Running on Zero • 267
Qwen2-VL-7B
🔥 Generate text from an image and question
-
Running • 67
UI-TARS
🌖 Find click coordinates on images based on instructions
-
Running • 97
Qwen2.5-1M Demo
💻 Answer questions about uploaded documents
-
Qwen/Qwen2.5-14B-Instruct-1M
Text Generation • 15B • Updated • 4.7k • 332
-
ibm-granite/granite-3.0-8b-instruct
Text Generation • 8B • Updated • 15.1k • 205
ibm-granite/granite-3.0-2b-instruct
Text Generation • 3B • Updated • 2.79k • 47
CohereLabs/aya-expanse-8b
Text Generation • 8B • Updated • 159k • 419
CohereLabs/aya-expanse-32b
Text Generation • 32B • Updated • 4.67k • 286
-
microsoft/resnet-50
Image Classification • 25.6M • Updated • 137k • 473
google/vit-base-patch16-224-in21k
Image Feature Extraction • 86.4M • Updated • 858k • 393
google/vit-base-patch32-224-in21k
Image Feature Extraction • 88M • Updated • 6.13k • 19
facebook/dinov2-large
Image Feature Extraction • 0.3B • Updated • 450k • 101
-
facebook/detr-resnet-50
Object Detection • 41.6M • Updated • 571k • 925
facebook/detr-resnet-101-dc5
Object Detection • 60.7M • Updated • 1.69k • 19
facebook/detr-resnet-50-dc5
Object Detection • 41.6M • Updated • 1.5k • 6
google/owlvit-base-patch32
Zero-Shot Object Detection • 0.2B • Updated • 79.7k • 144
-
openai/clip-vit-large-patch14
Zero-Shot Image Classification • 0.4B • Updated • 8.16M • 1.95k
openai/clip-vit-base-patch32
Zero-Shot Image Classification • Updated • 15.1M • 848
laion/CLIP-ViT-bigG-14-laion2B-39B-b160k
Zero-Shot Image Classification • Updated • 66.9k • 306
kakaobrain/align-base
Zero-Shot Image Classification • Updated • 11.2k • 29
-
microsoft/xclip-base-patch32
Video Classification • 0.2B • Updated • 183k • 108
facebook/timesformer-base-finetuned-k400
Video Classification • Updated • 27.3k • 42
facebook/timesformer-base-finetuned-k600
Video Classification • Updated • 1.17k • 12
google/vivit-b-16x2
Video Classification • Updated • 899 • 11
-
Running on Zero • Featured • 72
Draw To Search Art
🐠 Draw/upload image and search among WikiART using SigLIP
-
Running on CPU Upgrade • 23
Compare Clip Siglip
🏃 Compare strong zero-shot image classification models
-
Running on Zero • 13
Multilingual Zero Shot Image Clf
🏢 Comparing powerful multilingual zero-shot image clf models
-
BAAI/bunny-phi-2-siglip-lora
Text Generation • Updated • 239 • 48
-
google/owlvit-base-patch32
Zero-Shot Object Detection • 0.2B • Updated • 79.7k • 144
google/owlvit-base-patch16
Zero-Shot Object Detection • Updated • 5.81k • 13
google/owlvit-large-patch14
Zero-Shot Object Detection • Updated • 7.11k • 29
google/owlv2-base-patch16
Zero-Shot Object Detection • 0.2B • Updated • 19.1k • 29
-
Paused • 21
Video Llava
🐨 Generate descriptions by uploading images or videos
-
llava-hf/LLaVA-NeXT-Video-7B-hf
Video-Text-to-Text • 7B • Updated • 57.9k • 122
llava-hf/LLaVA-NeXT-Video-7B-DPO-hf
Video-Text-to-Text • 7B • Updated • 1.03k • 11
llava-hf/LLaVA-NeXT-Video-7B-32K-hf
Image-Text-to-Text • 8B • Updated • 341 • 8