Ofer Hasson
hassonofer
AI & ML interests
Computer Vision
Recent Activity
upvoted a collection about 9 hours ago
Perception Encoder
reacted to Anran-MLLM's post about 9 hours ago
Introducing PerceptionDLM, the first multimodal diffusion LLM for parallel region perception!
Most MLLMs are autoregressive, so captioning N regions costs N sequential passes. PerceptionDLM instead describes ALL masked regions in a single denoising process.
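The contrast above can be illustrated with a toy sketch that only counts decoding processes. `PassCounter`, `describe_one`, and `describe_all` are hypothetical stand-ins invented for this illustration, not part of the PerceptionDLM code or any real library, and the sketch says nothing about actual latency (a single denoising process still runs several internal steps).

```python
from typing import List, Tuple


class PassCounter:
    """Hypothetical stand-in for a model; it only counts decoding processes."""

    def __init__(self) -> None:
        self.passes = 0

    def describe_one(self, region: str) -> str:
        # Autoregressive style: one full decoding process per region.
        self.passes += 1
        return f"caption for {region}"

    def describe_all(self, regions: List[str]) -> List[str]:
        # Parallel style: one joint process covers every region at once.
        self.passes += 1
        return [f"caption for {r}" for r in regions]


def count_sequential(regions: List[str]) -> Tuple[List[str], int]:
    model = PassCounter()
    captions = [model.describe_one(r) for r in regions]
    return captions, model.passes


def count_parallel(regions: List[str]) -> Tuple[List[str], int]:
    model = PassCounter()
    captions = model.describe_all(regions)
    return captions, model.passes


regions = ["mask_0", "mask_1", "mask_2", "mask_3"]
seq_captions, seq_passes = count_sequential(regions)
par_captions, par_passes = count_parallel(regions)
print(seq_passes, par_passes)  # prints "4 1"
```

With four regions the sequential path runs four processes and the parallel path runs one, which is the N-versus-1 scaling the post describes.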
Highlights
• Up to 3.4× faster on dense multi-region captioning, with stable per-image latency
• PerceptionDLM-Base beats LLaDA-V on 15/16 multimodal benchmarks (new SOTA among open diffusion VLMs)
• New benchmark: ParaDLC-Bench, which jointly evaluates caption quality and inference efficiency
• Code, models & benchmark all open-sourced
Models
https://huggingface.co/MSALab/PerceptionDLM-Base
https://huggingface.co/MSALab/PerceptionDLM
Benchmark
https://huggingface.co/datasets/MSALab/ParaDLC-Bench
Paper: https://huggingface.co/papers/2606.19534
Code: https://github.com/MSALab-PKU/PerceptionDLM
Diffusion LLMs aren't just for text: they unlock efficient, parallel visual perception.
#multimodal #diffusion #VLM #perception
liked a model about 22 hours ago
MiniMaxAI/MiniMax-M3