How to use sinequa/vectorizer.raspberry with Transformers:

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("sinequa/vectorizer.raspberry")
model = AutoModelForMaskedLM.from_pretrained("sinequa/vectorizer.raspberry")
```
---
pipeline_tag: sentence-similarity
tags:
- feature-extraction
- sentence-similarity
language:
- de
- en
- es
- fr
- it
- nl
- ja
- pt
- zh
---

# Model Card for `vectorizer.raspberry`

This model is a vectorizer developed by Sinequa. It produces an embedding vector given a passage or a query. The passage vectors are stored in our vector index and the query vector is used at query time to look up relevant passages in the index.

Model name: `vectorizer.raspberry`

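
The lookup described above can be illustrated with a minimal sketch. It assumes cosine similarity as the relevance measure and a plain dictionary as the index; neither is stated in this model card, and the 3-dimensional vectors, the `search` helper, and the passage ids are hypothetical stand-ins for the model's 256-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, index, top_k=2):
    """Return the ids of the top_k passages most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), pid) for pid, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [pid for _, pid in scored[:top_k]]


# Hypothetical toy vectors (assumption); the real model outputs 256 dimensions.
passage_index = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.0, 1.0, 0.0],
    "p3": [0.7, 0.7, 0.0],
}
query_vector = [1.0, 0.0, 0.0]

top_passages = search(query_vector, passage_index)
print(top_passages)
```

A production vector index would typically use an approximate nearest-neighbour structure rather than the exhaustive scan shown here.
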
## Supported Languages

The model was trained and tested in the following languages:

- English
- French
- German
- Spanish
- Italian
- Dutch
- Japanese
- Portuguese
- Chinese (simplified)

Besides these languages, basic support can be expected for an additional 91 languages that were used during the pretraining of the base model (see Appendix A of the XLM-R paper).

## Scores

| Metric                 | Value |
|:-----------------------|------:|
| Relevance (Recall@100) | 0.613 |

Note that the relevance score is computed as an average over 14 retrieval datasets (see
[details below](#evaluation-metrics)).

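
As a reminder of what the metric measures, here is a minimal sketch of Recall@k, assuming the usual definition (the fraction of a query's relevant passages that appear among the top k results, averaged over queries). The function names and the toy data are hypothetical; the evaluation tooling actually used for this model is not described in this card.

```python
def recall_at_k(ranked_ids, relevant_ids, k=100):
    """Fraction of the relevant passages that appear in the top-k results."""
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


def mean_recall_at_k(runs, k=100):
    """Average recall@k over (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Hypothetical toy run evaluated at k=2 (assumption, for illustration only).
runs = [
    (["a", "b", "c"], {"a", "c"}),  # 1 of 2 relevant passages in the top 2
    (["d", "e", "f"], {"e"}),       # 1 of 1 relevant passage in the top 2
]
score = mean_recall_at_k(runs, k=2)
print(score)
```
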
## Inference Times

| GPU        | Quantization type | Batch size 1 | Batch size 32 |
|:-----------|:------------------|-------------:|--------------:|
| NVIDIA A10 | FP16              |         1 ms |          5 ms |
| NVIDIA A10 | FP32              |         2 ms |         18 ms |
| NVIDIA T4  | FP16              |         1 ms |         12 ms |
| NVIDIA T4  | FP32              |         3 ms |         52 ms |
| NVIDIA L4  | FP16              |         2 ms |          5 ms |
| NVIDIA L4  | FP32              |         4 ms |         24 ms |

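
To compare configurations, the batch-size-32 timings can be turned into a rough throughput estimate. This sketch assumes the reported times are per-batch latencies and that batches run back to back with no tokenization or data-transfer overhead; the card does not state either, so the results are upper bounds at best. The helper name is hypothetical.

```python
def passages_per_second(batch_size, latency_ms):
    """Rough throughput, assuming latency_ms is the time to process one full batch."""
    return batch_size * 1000.0 / latency_ms


# Batch-size-32 timings in milliseconds, copied from the table above.
latency_ms_batch32 = {
    ("NVIDIA A10", "FP16"): 5,
    ("NVIDIA A10", "FP32"): 18,
    ("NVIDIA T4", "FP16"): 12,
    ("NVIDIA T4", "FP32"): 52,
    ("NVIDIA L4", "FP16"): 5,
    ("NVIDIA L4", "FP32"): 24,
}

throughput = {
    key: passages_per_second(32, latency) for key, latency in latency_ms_batch32.items()
}

for (gpu, precision), value in throughput.items():
    print(f"{gpu} {precision}: about {value:.0f} passages/s")
```
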
## GPU Memory Usage

| Quantization type | Memory   |
|:------------------|---------:|
| FP16              |  550 MiB |
| FP32              | 1050 MiB |

Note that GPU memory usage only includes how much GPU memory the actual model consumes on an NVIDIA T4 GPU with a batch
size of 32. It does not include the fixed amount of memory that is consumed by the ONNX Runtime upon initialization, which
can be around 0.5 to 1 GiB depending on the GPU used.

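
For capacity planning, the two figures can be combined into a rough total. This is a sketch only: it simply adds the model memory from the table to the 0.5 to 1 GiB ONNX Runtime range quoted above, and the function name is hypothetical.

```python
# Model memory in MiB, copied from the table above (NVIDIA T4, batch size 32).
MODEL_MEMORY_MIB = {"FP16": 550, "FP32": 1050}

# ONNX Runtime initialization overhead of roughly 0.5 to 1 GiB, expressed in MiB.
ONNX_RUNTIME_OVERHEAD_MIB = (512, 1024)


def total_memory_range_mib(precision):
    """Return the (low, high) estimate of total GPU memory in MiB."""
    model = MODEL_MEMORY_MIB[precision]
    low, high = ONNX_RUNTIME_OVERHEAD_MIB
    return model + low, model + high


fp16_range = total_memory_range_mib("FP16")
fp32_range = total_memory_range_mib("FP32")
print(fp16_range, fp32_range)
```
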
## Requirements

- Minimum Sinequa version: 11.10.0
- [CUDA compute capability](https://developer.nvidia.com/cuda-gpus): above 7.5

## Model Details

### Overview

- Number of parameters: 107 million
- Base language model: [mMiniLMv2-L6-H384-distilled-from-XLMR-Large](https://huggingface.co/nreimers/mMiniLMv2-L6-H384-distilled-from-XLMR-Large) ([Paper](https://arxiv.org/abs/2012.15828), [GitHub](https://github.com/microsoft/unilm/tree/master/minilm))
- Insensitive to casing and accents
- Output dimensions: 256 (reduced with an additional dense layer)
- Training procedure: query-passage-negative triplets for datasets that have mined hard negatives, and query-passage pairs for the rest. The number of negatives is augmented with an in-batch negatives strategy.

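
The in-batch negatives idea mentioned in the training procedure can be sketched as follows. The exact loss used to train this model is not stated in this card; the sketch assumes a common formulation, softmax cross-entropy over dot-product scores, in which every other passage in the batch serves as a negative for a given query. The function names, the `scale` parameter, and the toy vectors are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def in_batch_negatives_loss(queries, positives, hard_negatives=None, scale=1.0):
    """Mean softmax cross-entropy where each query must pick its own passage.

    The candidates for every query are all positives in the batch plus,
    when provided, all mined hard negatives in the batch.
    """
    candidates = list(positives) + list(hard_negatives or [])
    total = 0.0
    for i, query in enumerate(queries):
        logits = [scale * dot(query, candidate) for candidate in candidates]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum_exp - logits[i]  # negative log-probability of the true passage
    return total / len(queries)


# Hypothetical 2-dimensional toy batch (assumption, for illustration only).
queries = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
hard_negatives = [[0.5, 0.5], [0.5, 0.5]]

pair_loss = in_batch_negatives_loss(queries, positives)
triplet_loss = in_batch_negatives_loss(queries, positives, hard_negatives)
print(pair_loss, triplet_loss)
```

With query-passage pairs only, each query sees the other positives in the batch as negatives; with triplets, the mined hard negatives are added to the same candidate pool, which makes the task harder and the loss larger for the same embeddings.
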
### Training Data

The model has been trained using all the datasets that are cited in the [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) model. In addition, it has been trained on the datasets cited in [this paper](https://arxiv.org/pdf/2108.13897.pdf) in the 9 aforementioned languages.

### Evaluation Metrics

To determine the relevance score, we averaged the results that we obtained when evaluating on the datasets of the
[BEIR benchmark](https://github.com/beir-cellar/beir). Note that all these datasets are in English.

| Dataset           | Recall@100 |
|:------------------|-----------:|
| Average           |      0.613 |
|                   |            |
| Arguana           |      0.957 |
| CLIMATE-FEVER     |      0.468 |
| DBPedia Entity    |      0.377 |
| FEVER             |      0.820 |
| FiQA-2018         |      0.639 |
| HotpotQA          |      0.560 |
| MS MARCO          |      0.845 |
| NFCorpus          |      0.287 |
| NQ                |      0.756 |
| Quora             |      0.992 |
| SCIDOCS           |      0.456 |
| SciFact           |      0.906 |
| TREC-COVID        |      0.100 |
| Webis-Touche-2020 |      0.413 |

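
The reported average can be reproduced from the per-dataset scores as an unweighted mean, which is an assumption about how the averaging was done (each dataset counts equally, regardless of its number of queries). The values are copied from the table above.

```python
# Per-dataset Recall@100 values copied from the BEIR table above.
recall_at_100 = {
    "Arguana": 0.957,
    "CLIMATE-FEVER": 0.468,
    "DBPedia Entity": 0.377,
    "FEVER": 0.820,
    "FiQA-2018": 0.639,
    "HotpotQA": 0.560,
    "MS MARCO": 0.845,
    "NFCorpus": 0.287,
    "NQ": 0.756,
    "Quora": 0.992,
    "SCIDOCS": 0.456,
    "SciFact": 0.906,
    "TREC-COVID": 0.100,
    "Webis-Touche-2020": 0.413,
}

# Unweighted mean over the 14 datasets.
average = sum(recall_at_100.values()) / len(recall_at_100)
print(f"{average:.3f}")  # prints 0.613
```
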
We evaluated the model on the datasets of the [MIRACL benchmark](https://github.com/project-miracl/miracl) to test its multilingual capabilities. Note that not all training languages are part of the benchmark, so we only report the metrics for the languages it covers.

| Language             | Recall@100 |
|:---------------------|-----------:|
| French               |      0.650 |
| German               |      0.528 |
| Spanish              |      0.602 |
| Japanese             |      0.614 |
| Chinese (simplified) |      0.680 |