metadata
language:
- de
tags:
- ColBERT
- PyLate
- sentence-transformers
- sentence-similarity
pipeline_tag: sentence-similarity
library_name: PyLate
datasets:
- samheym/ger-dpr-collection
base_model:
- deepset/gbert-base
Model Overview
GerColBERT is a ColBERT-based retrieval model trained on German text. It is designed for efficient late interaction-based retrieval while maintaining high-quality ranking performance.
Training Configuration
- Base Model: deepset/gbert-base
- Training Dataset: samheym/ger-dpr-collection
- Training Subset: 10% of the triples, randomly selected from the final dataset
- Vector Length: 128
- Maximum Document Length: 256 Tokens
- Batch Size: 50
- Training Steps: 80,000
- Gradient Accumulation: 1 step
- Learning Rate: 5 × 10⁻⁶
- Optimizer: AdamW
- In-Batch Negatives: Included
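To make the "late interaction" idea from the overview concrete, here is a minimal sketch of ColBERT-style MaxSim scoring in plain Python. It is an illustration under stated assumptions, not PyLate's implementation: the helper names `normalize` and `maxsim_score` are hypothetical, and the toy vectors are 3-dimensional rather than the 128-dimensional vectors this model produces. Each query token vector is matched to its most similar document token vector, and those maxima are summed.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_vecs, doc_vecs):
    """Hypothetical helper: sum over query tokens of the best-matching document token similarity."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )


# Toy 3-dimensional token embeddings (assumption: real embeddings are 128-dimensional).
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
doc_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
doc_b = [[0.0, 0.0, 1.0]]

score_a = maxsim_score(query, doc_a)  # both query tokens find an exact match
score_b = maxsim_score(query, doc_b)  # no query token overlaps with the document
```

Because every query token is scored independently against all document tokens, document embeddings can be computed and indexed ahead of time, which is what makes this style of retrieval efficient.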
Usage
First install the PyLate library:
pip install -U pylate
Retrieval
PyLate provides a streamlined interface for indexing and retrieving documents with ColBERT models. Its index is built on the Voyager HNSW index, which stores the document embeddings and enables fast retrieval.
from pylate import indexes, models, retrieve
# Step 1: Load the ColBERT model
model = models.ColBERT(
    model_name_or_path="samheym/GerColBERT",
)