jina-embeddings-v5-omni-nano-mlx

Unified MLX port of jinaai/jina-embeddings-v5-omni-nano.

Combines one base checkpoint with per-task LoRA adapters, so a single loaded model can switch between all tasks at runtime without keeping a merged copy of the weights per task in memory.

  • Architecture: Bidirectional LLaMA / EuroBERT text backbone (12 layers, hidden 768) + Qwen3VL vision tower + Qwen2.5-Omni audio tower
  • Matryoshka dimensions: 32, 64, 128, 256, 512
  • Max sequence length: 8192 tokens
  • Embedding dimension: 768
  • Tasks: retrieval, text-matching, clustering, classification

Structure

jina-embeddings-v5-omni-nano-mlx/
├── model.py              # MLX model implementation
├── utils.py              # load_model() + JinaMultiTaskModel
├── model.safetensors     # Base weights (no LoRA merged)
├── config.json
├── tokenizer.json
└── adapters/
    ├── retrieval/
    ├── text-matching/
    ├── clustering/
    └── classification/
        ├── adapter_config.json
        └── adapter_model.safetensors
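Before loading, it can help to confirm that a local clone contains the adapters shown in the tree above. The following is a minimal sketch, not part of the repo: `list_adapters` is a hypothetical helper, and it assumes every adapter directory holds the same two files that the tree lists under `classification/`.

```python
from pathlib import Path


def list_adapters(repo_dir):
    """Return the names of adapter directories that contain both expected files.

    Hypothetical helper: assumes the layout adapters/<task>/adapter_config.json
    and adapters/<task>/adapter_model.safetensors.
    """
    adapters_dir = Path(repo_dir) / "adapters"
    if not adapters_dir.is_dir():
        return []
    names = []
    for sub in adapters_dir.iterdir():
        has_config = (sub / "adapter_config.json").is_file()
        has_weights = (sub / "adapter_model.safetensors").is_file()
        if sub.is_dir() and has_config and has_weights:
            names.append(sub.name)
    return sorted(names)
```

For a complete clone, `list_adapters("/tmp/jina-omni-nano-mlx")` would be expected to return the four task names; an empty or shorter list suggests an incomplete download.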

Usage

# Clone the repo locally first
import subprocess
subprocess.run(["git", "clone", "https://huggingface.co/jinaai/jina-embeddings-v5-omni-nano-mlx", "/tmp/jina-omni-nano-mlx"], check=True)  # check=True raises if the clone fails

import sys
sys.path.insert(0, "/tmp/jina-omni-nano-mlx")
from utils import load_model

model = load_model("/tmp/jina-omni-nano-mlx")

# Switch task and encode
model.switch_task("retrieval")
embeddings = model.encode(
    ["What is neural search?", "Neural search uses deep learning"],
    task_type="retrieval.query",
)

# Switch to another task (about 20 ms, no model reload)
model.switch_task("text-matching")
embeddings = model.encode(["Hello world"], task_type="text-matching")
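Embeddings like these are usually compared with cosine similarity, for example a vector encoded with "retrieval.query" against one encoded with "retrieval.passage". The sketch below uses toy vectors rather than real model output. It assumes each row returned by `model.encode` can be converted to a plain list of floats, which this card does not specify.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for a query embedding and a document embedding.
query_vec = [1.0, 0.0, 1.0]
doc_vec = [1.0, 1.0, 0.0]
score = cosine_similarity(query_vec, doc_vec)  # approximately 0.5
```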

Task types

Task                    task_type value
Retrieval (query)       "retrieval.query"
Retrieval (document)    "retrieval.passage"
Text matching           "text-matching"
Clustering              "clustering"
Classification          "classification"
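The table implies that each `task_type` value belongs to one of the four adapters, with the two retrieval variants sharing the retrieval adapter. A minimal sketch of that mapping follows. `adapter_for` is a hypothetical helper, and the rule that the adapter name is the part before the first dot is an assumption read off the table, not taken from the repo code.

```python
ADAPTER_NAMES = ("retrieval", "text-matching", "clustering", "classification")


def adapter_for(task_type):
    """Map a task_type value to the adapter name expected by switch_task().

    Assumption: the adapter name is the prefix before the first "."
    (so "retrieval.query" and "retrieval.passage" both use "retrieval").
    """
    adapter = task_type.split(".", 1)[0]
    if adapter not in ADAPTER_NAMES:
        raise ValueError(f"unknown task_type: {task_type!r}")
    return adapter
```

With this helper, a call such as `model.switch_task(adapter_for("retrieval.passage"))` keeps the active adapter and the `task_type` passed to `encode` in agreement.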

Matryoshka truncation

texts = ["What is neural search?"]  # any list of strings
embeddings = model.encode(texts, task_type="retrieval.query", truncate_dim=256)
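Matryoshka truncation is commonly described as keeping the first `truncate_dim` components of the full vector and then rescaling to unit length. The sketch below illustrates that idea on plain Python lists. Whether `model.encode` renormalizes after truncating is not stated here, so treat the renormalization step as an assumption; `truncate_embedding` is a hypothetical name.

```python
import math

# Dimensions listed on this card: the Matryoshka sizes plus the full 768.
ALLOWED_DIMS = (32, 64, 128, 256, 512, 768)


def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale to unit L2 norm."""
    if dim not in ALLOWED_DIMS:
        raise ValueError(f"dim must be one of {ALLOWED_DIMS}, got {dim}")
    if len(vec) < dim:
        raise ValueError("vector is shorter than the requested dimension")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale
    return [x / norm for x in head]
```

This can also be applied to stored full-size embeddings, so that vectors do not have to be re-encoded when a smaller index is wanted.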

Differences from per-task repos

The per-task repos store merged weights. This unified repo stores:

  • Base weights once (1.9 GB)
  • 4 small LoRA adapters (~13 MB each)
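The storage saving can be estimated from the figures above. This is rough arithmetic only: it assumes each per-task repo with merged weights is about the same size as the base checkpoint, which this card does not state.

```python
BASE_GB = 1.9      # base weights, from the list above
ADAPTER_MB = 13    # approximate size of one LoRA adapter
NUM_TASKS = 4

# Unified repo: one base checkpoint plus four adapters.
unified_gb = BASE_GB + NUM_TASKS * ADAPTER_MB / 1000

# Assumed alternative: four merged checkpoints, each about the base size.
merged_gb = NUM_TASKS * BASE_GB

print(f"unified: {unified_gb:.2f} GB, four merged copies: {merged_gb:.1f} GB")
```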

Switching tasks takes about 20 ms: the adapter is applied by patching weights in place, with no model reload.
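In-place patching with LoRA generally means adding the low-rank update (alpha / r) · B · A to a base weight matrix and subtracting it again before another adapter is applied. The sketch below shows that scheme on nested Python lists. It illustrates the general technique, not the code in `utils.py`; all function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def lora_delta(lora_a, lora_b, alpha, rank):
    """Low-rank update (alpha / rank) * B @ A, with B: out x r and A: r x in."""
    scale = alpha / rank
    return [[scale * v for v in row] for row in matmul(lora_b, lora_a)]


def patch_in_place(weight, delta, sign=1.0):
    """Add (sign=+1) or remove (sign=-1) a delta without copying the weight."""
    for i, row in enumerate(delta):
        for j, v in enumerate(row):
            weight[i][j] += sign * v


def switch_adapter(weight, old_delta, new_delta):
    """Undo the active adapter, if any, then apply the new one."""
    if old_delta is not None:
        patch_in_place(weight, old_delta, -1.0)
    patch_in_place(weight, new_delta, 1.0)
```

In low-precision formats such as BF16, repeated add and subtract cycles can accumulate rounding error, so an implementation might instead keep an untouched copy of the affected layers and patch from that.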

Citation

@article{jina-embeddings-v5,
  title={Jina Embeddings v5: A Frontier Multilingual Embedding Model},
  author={Jina AI},
  year={2025},
  url={https://huggingface.co/jinaai/jina-embeddings-v5-omni-nano}
}