Multimodal (text + image + video + audio) embedding models aligned with jina-embeddings-v5-text-*. Two sizes, four task variants each.
- jina-embeddings-v5-omni: Text-Geometry-Preserving Multimodal Embeddings via Frozen-Tower Composition
  Paper • 2605.08384 • Published • 11
- jinaai/jina-embeddings-v5-omni-small
  Feature Extraction • 2B • Updated • 34k • 61
- jinaai/jina-embeddings-v5-omni-nano
  Feature Extraction • 1.0B • Updated • 82.5k • 27
- jinaai/jina-embeddings-v5-omni-nano-text-matching
  Feature Extraction • 0.9B • Updated • 1.39k • 3
