Activity Feed

Recent Activity

alvarobartt posted an update about 2 months ago
Learn how to deploy Microsoft Research VibeVoice ASR on Microsoft Azure Foundry with Hugging Face to generate rich audio transcriptions with Who, When, and What! 💥

> 🕒 60-minute single-pass processing, no chunking or stitching
> 👤 Customized hotwords to guide recognition on domain-specific content
> 📝 Rich transcription: joint ASR + diarization + timestamping in one pass
> 🌍 50+ languages with automatic detection and code-switching support
> 🤗 Deployed on Microsoft Foundry via an OpenAI-compatible Chat Completions API

https://huggingface.co/docs/microsoft-azure/foundry/examples/deploy-vibevoice-asr
alvarobartt posted an update 3 months ago
💥 hf-mem v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the --experimental flag!

uvx hf-mem --model-id ... --experimental will automatically pull the required information from the Hugging Face Hub to include the KV cache estimation, when applicable.

💡 Alternatively, you can also set the --max-model-len, --batch-size and --kv-cache-dtype arguments (à la vLLM) manually if preferred.
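
As a rough illustration of what a KV cache estimate involves, the sketch below uses the common back-of-the-envelope formula for standard multi-head or grouped-query attention: 2 (keys and values) x layers x KV heads x head dimension x context length x batch size x bytes per element. This is not hf-mem's implementation; the function name estimate_kv_cache_bytes, its parameters, and the example model shape are assumptions chosen for illustration.

```python
def estimate_kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    max_model_len: int,
    batch_size: int = 1,
    bytes_per_element: int = 2,  # 2 for fp16/bf16, 1 for fp8
) -> int:
    """Rough KV cache size for standard multi-head / grouped-query attention.

    Hypothetical helper, not part of hf-mem: it ignores sliding-window
    attention, paged-allocation overhead, and other architecture specifics.
    """
    # One key tensor and one value tensor are cached per layer, hence the factor 2.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_element
    return per_token * max_model_len * batch_size


# Illustrative shape only (assumed, not pulled from the Hub):
# 32 layers, 8 KV heads, head_dim 128, 8192-token context, fp16 cache.
size = estimate_kv_cache_bytes(32, 8, 128, max_model_len=8192, batch_size=1)
print(f"{size / 2**30:.2f} GiB")  # prints "1.00 GiB"
```

The estimate scales linearly with context length and batch size, so doubling either one doubles the cache, and switching the cache dtype from fp16 to fp8 halves it.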
merve posted an update 6 months ago
deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane it can parse and re-render charts in HTML
> it concatenates CLIP and SAM features, which gives better grounding
> very efficient in terms of performance per vision token
> covers 100 languages
merve posted an update 7 months ago
large AI labs open-sourced a ton of models last week 🔥
here are a few picks, find even more here: merve/sep-16-releases-68d13ea4c547f02f95842f05
> IBM released a new Docling model with 258M params based on Granite (Apache 2.0) 📝 ibm-granite/granite-docling-258M
> Xiaomi released a 7B audio LM with base and instruct variants (MIT) XiaomiMiMo/mimo-audio-68cc7202692c27dae881cce0
> DecartAI released Lucy Edit, an open Nano Banana 🍌 (NC) decart-ai/Lucy-Edit-Dev
> OpenGVLab released a family of agentic computer use models (3B/7B/32B) along with the dataset 💻 OpenGVLab/scalecua-68c912cf56f7ff4c8e034003
> Meituan Longcat released a thinking version of LongCat-Flash 💭 meituan-longcat/LongCat-Flash-Thinking
merve posted an update 7 months ago
IBM just released a small Swiss Army knife for document models: granite-docling-258M on Hugging Face 🔥

> not only a document converter: it can also do document question answering and understands multiple languages 🤯
> best part: released with Apache 2.0 license, use it with your commercial projects!
> it supports transformers, vLLM and MLX from the get-go! 🤗
> built on SigLIP2 & granite-165M

model: ibm-granite/granite-docling-258M
demo: ibm-granite/granite-docling-258m-demo 💗
lysandre posted an update 7 months ago
We're kick-starting the process of Transformers v5, with @ArthurZ and @cyrilvallez!

v5 should be significant: we're using it as a milestone for performance optimizations, saner defaults, and a much cleaner code base worthy of 2025.

Fun fact: v4.0.0-rc-1 came out on Nov 19, 2020, nearly five years ago!
merve posted an update 7 months ago
a ton of image/video generation models and LLMs from big labs 🔥

> Meta released facebook/mobilellm-r1-68c4597b104fac45f28f448e, smol LLMs for on-device use 💬
> Tencent released tencent/SRPO, a high-res image generation model, and tencent/POINTS-Reader, a cutting-edge OCR model 📝
> ByteDance released bytedance-research/HuMo, video generation from any input ⏯️

find more models, datasets, demos here: merve/sep-11-releases-68c7dbfa26bea8cd921fa0ac
merve posted an update 7 months ago
fan-favorite vision LM Florence-2 is now officially supported in transformers 🤗

find all the models in the florence-community org 🫡
ariG23498 posted an update 7 months ago
New post is live!

This time we cover some major updates to transformers.

🤗
merve posted an update 8 months ago
large AI labs dropped so many open models last week 🔥 don't miss out on them

→ Apple released on-device vision LMs apple/fastvlm-68ac97b9cd5cacefdd04872e & apple/mobileclip2-68ac947dcb035c54bcd20c47
→ OpenGVLab released InternVL3.5, 32 new vision LMs with one based on gpt-oss! (OS) OpenGVLab/internvl35-68ac87bd52ebe953485927fb
→ MSFT released a killer small TTS model (OS) microsoft/VibeVoice-1.5B

find more here: https://huggingface.co/collections/merve/august-29-releases-68b5a3754cfb8abf59e2b486
merve posted an update 8 months ago
first vision language model built off openai/gpt-oss-20b just dropped! 🔥

InternVL3.5 comes with 32 models 🤯 pre-trained, fine-tuned, aligned in various sizes OpenGVLab/internvl35-68ac87bd52ebe953485927fb
comes with gpt-oss or Qwen3 for the LLM part ⤵️
merve posted an update 9 months ago
a GPT-4.1-mini-level model right on your iPhone 🤯

openbmb/MiniCPM-V-4 is only 4B while surpassing GPT-4.1-mini in vision benchmarks 🔥

allows commercial use as well!
merve posted an update 9 months ago
we're all sleeping on this OCR model rednote-hilab/dots.ocr 🔥

dots.ocr is a new 3B model with sota performance, support for 100 languages, and a license that allows commercial use! 🤯

a single e2e model that extracts images and converts tables, formulas, and more into markdown 📝
try it: MohamedRashad/Dots-OCR