pplx-embed-v1-late-0.6b: Late-Interaction Embeddings

pplx-embed-v1-late-0.6b is a token-level late-interaction embedding model for retrieval with MaxSim scoring. It is obtained by continued training of pplx-embed-v1-0.6b with ContrastiveLoss, directly optimizing token-level MaxSim.
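
As a rough illustration of that objective (not the authors' training recipe; contrastive_maxsim_loss is a hypothetical helper, and the token embeddings are assumed L2-normalized), a contrastive loss over MaxSim scores can be written as:

import torch
import torch.nn.functional as F

def contrastive_maxsim_loss(query_tokens, candidate_docs, positive_index=0):
    # query_tokens: (num_query_tokens, dim); candidate_docs: list of
    # (num_doc_tokens, dim) tensors, one positive plus in-batch negatives.
    scores = torch.stack([
        # MaxSim: for each query token, keep its best-matching document
        # token, then sum over query tokens.
        (query_tokens @ doc_tokens.T).max(dim=1).values.sum()
        for doc_tokens in candidate_docs
    ])
    # Softmax cross-entropy pushes the positive document to out-score the rest.
    return F.cross_entropy(scores.unsqueeze(0), torch.tensor([positive_index]))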

Usage

Install PyLate:

pip install -U pylate

Index and retrieve documents:

from pylate import indexes, models, retrieve

# Load the model (requires trust_remote_code for the custom architecture).
model = models.ColBERT(
    model_name_or_path="perplexity-ai/pplx-embed-v1-late-0.6b",
    trust_remote_code=True,
)

# Documents to index.
documents_ids = ["1", "2", "3"]
documents = [
    "Scientists explore the universe driven by curiosity.",
    "Children learn through curious exploration.",
    "Historical discoveries began with curious questions.",
]

# Build a PLAID index over the document embeddings.
index = indexes.PLAID(
    index_folder="pylate-index",
    index_name="pplx-embed-v1-late-0.6b",
    override=True,
)
documents_embeddings = model.encode(
    documents,
    batch_size=32,
    is_query=False,
    show_progress_bar=True,
)
index.add_documents(
    documents_ids=documents_ids,
    documents_embeddings=documents_embeddings,
)

# Retrieve the top-k documents for a query.
retriever = retrieve.ColBERT(index=index)
queries_embeddings = model.encode(
    ["What motivates scientific discovery?"],
    batch_size=32,
    is_query=True,
    show_progress_bar=True,
)
scores = retriever.retrieve(queries_embeddings=queries_embeddings, k=3)

print(scores)
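
If you already have a candidate set and do not need a persistent index, PyLate also provides rank.rerank, which scores candidates directly with MaxSim. A short sketch reusing the model, documents, and documents_ids from above; note that rerank expects one list of candidates (and one list of ids) per query:

from pylate import rank

queries_embeddings = model.encode(
    ["What motivates scientific discovery?"],
    is_query=True,
)
documents_embeddings = model.encode(
    [documents],  # one candidate list per query
    is_query=False,
)

reranked = rank.rerank(
    documents_ids=[documents_ids],
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
print(reranked)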

Performance

We evaluate pplx-embed-v1-late-0.6b on two standard retrieval benchmark suites and report the average nDCG@10:

  • BEIR — average over 15 English retrieval tasks.
  • MIRACL — average over 18 languages.

Benchmark           pplx-embed-v1-late-0.6b   Reference
BEIR (15 tasks)     56.61                     colbert-zero: 55.43
MIRACL (18 langs)   66.62                     jina-colbert-v2: 62.28

Technical Details

This model uses late interaction: queries and documents are encoded as token-level vectors and scored with MaxSim rather than pooled into a single vector.
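
Concretely, the MaxSim score of a query against a document is the sum, over query tokens, of each token's maximum similarity to any document token. A minimal NumPy sketch (maxsim is a hypothetical helper, not part of the PyLate API; it assumes the per-token embedding matrices produced by model.encode above):

import numpy as np

def maxsim(query_embedding: np.ndarray, document_embedding: np.ndarray) -> float:
    # query_embedding: (num_query_tokens, dim);
    # document_embedding: (num_doc_tokens, dim).
    token_similarities = query_embedding @ document_embedding.T
    # Best-matching document token per query token, summed over query tokens.
    return float(token_similarities.max(axis=1).sum())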

For background on the base embedding family, see the pplx-embed-v1-0.6b model card and the technical report: https://arxiv.org/abs/2602.11151.
