Vortex-Embed v3: Sentence-Similarity for RAG

Retrieval-optimized 4-bit static embeddings for sentence-similarity and RAG.

Built on VTXAI/Vortex-Embed-4.7M (29528 vocab × 256 dim, 4-bit LF4 packed = 4.7 MB on disk) with a set of training-free retrieval upgrades that lift STS-B Spearman from 0.7462 (baseline LF4) to 0.7560 (v3 with SIF+PC=1).
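
As a rough illustration of what "4-bit packed" means for a table of this shape, the sketch below stores two 4-bit codes per byte and recovers them again. It is a generic nibble-packing example, not the LF4 format itself: the byte layout (low nibble first) and the helper names pack_nibbles / unpack_nibbles are assumptions made for illustration. For a 29528 × 256 table, the packed codes alone come to about 3.78 MB, before any scales or metadata.

```python
import numpy as np

VOCAB, DIM = 29528, 256  # table shape stated in this README

# Two 4-bit codes share one byte, so the codes alone occupy VOCAB * DIM / 2 bytes.
packed_bytes = VOCAB * DIM // 2


def pack_nibbles(codes: np.ndarray) -> np.ndarray:
    """Pack 4-bit codes (values 0..15, even-length last axis) into bytes.

    Hypothetical layout: even positions go to the low nibble,
    odd positions to the high nibble.
    """
    codes = codes.astype(np.uint8)
    return (codes[..., 0::2] | (codes[..., 1::2] << 4)).astype(np.uint8)


def unpack_nibbles(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_nibbles: recover the original 4-bit codes."""
    out = np.empty(packed.shape[:-1] + (packed.shape[-1] * 2,), dtype=np.uint8)
    out[..., 0::2] = packed & 0x0F  # low nibble
    out[..., 1::2] = packed >> 4    # high nibble
    return out
```

A real quantized format also needs scale factors (and possibly a codebook) to map the 4-bit codes back to floats; those details are specific to LF4 and are not reproduced here.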

What changed vs the v1 baseline

All four upgrades are inference-time only: the underlying 4-bit weights are bit-identical to the v1 artifact. They are:

  1. SIF IDF weighting with sif_a=0.01 (sweep-optimized for STS-B).
  2. Top-1 PC removal (sweep-optimized β€” 1 PC is enough for STS-B).
  3. Pure-numpy bucket-boundary segment-sum for fast mean-pool.
  4. CPU-torch scatter (index_add_) for the hot path.
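
The first three items can be sketched in plain numpy as follows. This is a minimal illustration under stated assumptions, not the code in lf4_v3_sentence.py: the function names, the token_prob input (per-token unigram probabilities), and the choice to estimate the principal component from the batch itself are all hypothetical. The torch index_add_ variant from item 4 is not shown.

```python
import numpy as np


def sif_weights(token_prob: np.ndarray, sif_a: float = 0.01) -> np.ndarray:
    """SIF weight a / (a + p(w)): frequent tokens get smaller weights."""
    return sif_a / (sif_a + token_prob)


def pooled_embeddings(table, weights, token_ids, offsets):
    """Weighted mean-pool a flat batch of token ids with a segment sum.

    token_ids: 1-D concatenation of every text's token ids.
    offsets:   start index of each text inside token_ids; must be strictly
               increasing, so every text has at least one token.
    """
    weighted = table[token_ids] * weights[token_ids][:, None]  # (n_tokens, dim)
    # reduceat sums rows offsets[i]:offsets[i+1]; the last segment runs to the end.
    sums = np.add.reduceat(weighted, offsets, axis=0)
    counts = np.diff(np.append(offsets, len(token_ids)))
    return sums / counts[:, None]


def remove_top_pc(embs: np.ndarray) -> np.ndarray:
    """Subtract each row's projection onto the first principal direction."""
    # First right-singular vector of the uncentered embedding matrix.
    _, _, vt = np.linalg.svd(embs, full_matrices=False)
    pc = vt[0]
    return embs - np.outer(embs @ pc, pc)
```

In a deployed model the principal component would more likely be estimated once on a reference corpus and stored, so that a single query can be encoded on its own; estimating it per batch is only for brevity here.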

Benchmark

Model                    Spearman ρ (STS-B)   Encode (ms/text)   Dequant (cold)   RAM     On-disk
LF4 baseline (v1)        0.7462               0.87               231 ms           30 MB   4.7 MB
Vortex-Embed v3 (this)   0.7560               0.08               51 ms            30 MB   4.7 MB

+1.0 pp Spearman (0.7462 to 0.7560) and roughly 11× faster encode (0.87 ms to 0.08 ms per text).
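
For readers who want to reproduce a Spearman figure of this kind, the sketch below computes Spearman ρ as the Pearson correlation of rank vectors, with tied values sharing their average rank. It is a generic numpy implementation, not the evaluation script behind the numbers above; the function names are hypothetical. In an STS-B style evaluation, one input would be the cosine similarities of the encoded sentence pairs and the other the gold similarity scores.

```python
import numpy as np


def average_ranks(x) -> np.ndarray:
    """1-based ranks, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=np.float64)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=np.float64)
    ranks[order] = np.arange(1, len(x) + 1)
    # Average the ranks inside each group of equal values.
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    sums = np.bincount(inverse, weights=ranks)
    return (sums / counts)[inverse]


def spearman(a, b) -> float:
    """Spearman rho: Pearson correlation between the two rank vectors."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])
```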

Usage

from huggingface_hub import snapshot_download
from lf4_v3_sentence import VortexEmbedV3

path = snapshot_download("VTXAI/Vortex-Embed-v3-sentence")
model = VortexEmbedV3.from_pretrained(path)
print(f"vocab={model.vocab_size}, dim={model.dim}, size={model.model_size_mb:.1f} MB")

# Single-text encode
vec = model.encode("find python json parser", normalize=True)  # (256,)

# Batch encode
docs = ["def parse_json(s): return json.loads(s)",
        "class WeatherAPI: pass",
        "import requests"]
doc_embs = model.encode(docs, normalize=True)  # (3, 256)

# RAG retrieval
import numpy as np
# ... chunk corpus, build doc_embs as (n, 256) ...
query = "where do we parse JSON requests"
q_emb = model.encode(query, normalize=True)
scores, indices = model.search(q_emb, doc_embs, top_k=10)
for rank, (s, i) in enumerate(zip(scores[0], indices[0]), 1):
    print(f"#{rank} ({s:.3f}) doc #{i}")
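
For intuition about the retrieval step, a top-k search over unit-normalized embeddings reduces to a matrix product followed by a sort. The function below is a hypothetical stand-in written for illustration; it is an assumption about what a search step of this kind does, not the implementation of model.search. It returns arrays of shape (n_queries, k), which matches the scores[0] / indices[0] indexing used above.

```python
import numpy as np


def top_k_cosine(q_emb: np.ndarray, doc_embs: np.ndarray, top_k: int = 10):
    """Return (scores, indices) of the top_k most similar documents per query.

    Assumes q_emb and doc_embs rows are already L2-normalized, so the dot
    product equals cosine similarity.
    """
    q = np.atleast_2d(q_emb)                    # (n_queries, dim)
    sims = q @ doc_embs.T                       # (n_queries, n_docs)
    k = min(top_k, doc_embs.shape[0])
    idx = np.argsort(-sims, axis=1)[:, :k]      # best-first indices per query
    scores = np.take_along_axis(sims, idx, axis=1)
    return scores, idx
```

For large corpora a full sort per query is wasteful; np.argpartition or an approximate nearest-neighbor index would be the usual next step.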

Files

  • model.safetensors: 4-bit LF4 packed weights (3.7 MB)
  • tokenizer.json: HuggingFace fast tokenizer
  • config.json: model + retrieval config
  • lf4_v3_sentence.py: self-contained model class
  • README.md: this file

License

Apache 2.0
