jina-embeddings-v5-omni-nano-mlx

Unified MLX port of jinaai/jina-embeddings-v5-omni-nano.

Combines one base checkpoint with per-task LoRA adapters so that tasks can be switched at runtime: one model, all tasks, minimal memory.

Architecture: Bidirectional LLaMA / EuroBERT text backbone (12 layers, hidden 768) + Qwen3VL vision tower + Qwen2.5-Omni audio tower
Matryoshka dimensions: 32, 64, 128, 256, 512
Max sequence length: 8192 tokens
Embedding dimension: 768
Tasks: retrieval, text-matching, clustering, classification

Structure

jina-embeddings-v5-omni-nano-mlx/
├── model.py              # MLX model implementation
├── utils.py              # load_model() + JinaMultiTaskModel
├── model.safetensors     # Base weights (no LoRA merged)
├── config.json
├── tokenizer.json
└── adapters/
    ├── retrieval/
    ├── text-matching/
    ├── clustering/
    └── classification/
        ├── adapter_config.json
        └── adapter_model.safetensors
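
A quick sanity check after cloning is to confirm that every adapter directory is complete. The sketch below is not part of the repo. `list_adapters` is a hypothetical helper, and it assumes that each task directory under `adapters/` holds the same two files that the tree shows under `classification/`.

```python
from pathlib import Path

# File names taken from the directory tree above; assumed to be present
# in every adapter directory, not only in classification/.
REQUIRED_FILES = ("adapter_config.json", "adapter_model.safetensors")


def list_adapters(repo_dir):
    """Return the sorted names of adapter directories that contain both files."""
    adapters_dir = Path(repo_dir) / "adapters"
    if not adapters_dir.is_dir():
        return []
    return sorted(
        d.name
        for d in adapters_dir.iterdir()
        if d.is_dir() and all((d / name).is_file() for name in REQUIRED_FILES)
    )
```

For a complete clone this should return the four task names listed above; a shorter list points at an interrupted or partial download.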

Usage

# Clone the repo locally first
import subprocess
subprocess.run(
    ["git", "clone", "https://huggingface.co/jinaai/jina-embeddings-v5-omni-nano-mlx", "/tmp/jina-omni-nano-mlx"],
    check=True,  # raise an error if the clone fails instead of continuing silently
)

import sys
sys.path.insert(0, "/tmp/jina-omni-nano-mlx")
from utils import load_model

model = load_model("/tmp/jina-omni-nano-mlx")

# Switch task and encode
model.switch_task("retrieval")
embeddings = model.encode(
    ["What is neural search?", "Neural search uses deep learning"],
    task_type="retrieval.query",
)

# Switch to another task (< 20ms, no model reload)
model.switch_task("text-matching")
embeddings = model.encode(["Hello world"], task_type="text-matching")
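
Once queries and documents are embedded, retrieval usually comes down to ranking documents by cosine similarity to the query. The following is a minimal standard-library sketch, not code from this repo. It assumes the embeddings have been converted to plain lists of floats (for example with a `.tolist()` call, if the returned array type supports it), and `cosine_similarity` and `rank_documents` are hypothetical helper names.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In a retrieval setup the query vector would come from `task_type="retrieval.query"` and the document vectors from `task_type="retrieval.passage"`.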

Task types

Task	`task_type` values
Retrieval (query)	`"retrieval.query"`
Retrieval (document)	`"retrieval.passage"`
Text matching	`"text-matching"`
Clustering	`"clustering"`
Classification	`"classification"`
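
The usage example passes `"retrieval"` to `switch_task` but `"retrieval.query"` to `encode`, which suggests that the adapter name is the part of `task_type` before the dot. The sketch below encodes that reading. `adapter_for` is a hypothetical helper, and the mapping is inferred from the examples in this README rather than taken from the repo's code.

```python
# The task_type values listed in the table above.
VALID_TASK_TYPES = frozenset({
    "retrieval.query",
    "retrieval.passage",
    "text-matching",
    "clustering",
    "classification",
})


def adapter_for(task_type):
    """Return the adapter name to pass to switch_task() for a given task_type."""
    if task_type not in VALID_TASK_TYPES:
        raise ValueError(f"unknown task_type: {task_type!r}")
    # "retrieval.query" and "retrieval.passage" share the "retrieval" adapter;
    # the other task types have no suffix and map to themselves.
    return task_type.split(".", 1)[0]
```

A wrapper could call `model.switch_task(adapter_for(task_type))` before `model.encode(...)` so that the adapter and the task type cannot drift apart.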

Matryoshka truncation

texts = ["What is neural search?"]
embeddings = model.encode(texts, task_type="retrieval.query", truncate_dim=256)
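
Matryoshka truncation conventionally means keeping the leading components of the full vector and rescaling the result to unit length. The sketch below shows that operation on a plain list of floats. It is an assumption that `truncate_dim` behaves this way internally; `truncate_and_normalize` is a hypothetical helper, not a function from this repo.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components of `vec` and rescale them to unit L2 norm."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be between 1 and {len(vec)}, got {dim}")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero prefix cannot be normalized
    return [x / norm for x in head]
```

This can be useful for shrinking already-stored 768-dimensional vectors to one of the supported sizes without re-encoding, provided the stored vectors are the untruncated model output.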

Differences from per-task repos

The per-task repos store merged weights. This unified repo stores:

Base weights once (1.9 GB)
4 small LoRA adapters (~13 MB each)
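
To put those figures in perspective, the arithmetic below compares the unified layout with keeping four separately merged checkpoints. It assumes that each merged per-task checkpoint is about the same size as the base weights, which this README does not state, and it uses 1 GB = 1000 MB.

```python
BASE_GB = 1.9      # base weights, from the list above
ADAPTER_MB = 13    # approximate size of one LoRA adapter, from the list above
NUM_TASKS = 4


def unified_size_gb(base_gb, adapter_mb, num_tasks):
    """Base weights stored once plus one small adapter per task."""
    return base_gb + num_tasks * adapter_mb / 1000


def merged_size_gb(base_gb, num_tasks):
    """One full merged checkpoint per task (assumed equal to the base size)."""
    return base_gb * num_tasks


print(f"unified: {unified_size_gb(BASE_GB, ADAPTER_MB, NUM_TASKS):.2f} GB")  # unified: 1.95 GB
print(f"merged:  {merged_size_gb(BASE_GB, NUM_TASKS):.2f} GB")               # merged:  7.60 GB
```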

Switching tasks takes roughly 20 ms because the adapter is applied by patching the weights in place; the model is not reloaded.
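
In-place patching with LoRA generally means adding the low-rank update (alpha / r) * B @ A to a base weight matrix and subtracting it again before applying a different adapter. The sketch below illustrates that idea on small nested lists. It is not this repo's implementation, and `lora_delta`, `add_in_place`, and `PatchedLinear` are hypothetical names.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_delta(lora_a, lora_b, alpha, rank):
    """Dense update (alpha / rank) * B @ A, with A of shape (rank, in) and B of shape (out, rank)."""
    scale = alpha / rank
    return [[scale * v for v in row] for row in matmul(lora_b, lora_a)]


def add_in_place(weight, delta, sign=1.0):
    """Add sign * delta to weight without allocating a new matrix."""
    for w_row, d_row in zip(weight, delta):
        for j, d in enumerate(d_row):
            w_row[j] += sign * d


class PatchedLinear:
    """Holds a single weight matrix and swaps LoRA updates in and out of it."""

    def __init__(self, base_weight):
        self.weight = [list(row) for row in base_weight]
        self._active_delta = None

    def switch(self, delta):
        # Undo the currently applied adapter, if any, then apply the new one.
        if self._active_delta is not None:
            add_in_place(self.weight, self._active_delta, sign=-1.0)
        add_in_place(self.weight, delta)
        self._active_delta = delta
```

Only one copy of each weight matrix stays in memory, which is the point of this layout. One trade-off of this sketch is that repeated add and subtract steps can accumulate floating-point rounding error, so an implementation may prefer to restore from an untouched base copy instead.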

Related models

jinaai/jina-embeddings-v5-omni-nano - Original PyTorch model
jinaai/jina-embeddings-v5-omni-small-mlx - Larger variant
jinaai/jina-embeddings-v5-text-nano-mlx - Text-only variant

Citation

@article{jina-embeddings-v5,
  title={Jina Embeddings v5: A Frontier Multilingual Embedding Model},
  author={Jina AI},
  year={2025},
  url={https://huggingface.co/jinaai/jina-embeddings-v5-omni-nano}
}

Model size: 0.9B params (Safetensors)
Tensor types: BF16, F32
