Sentence Similarity
sentence-transformers
PyTorch
ONNX
xlm-roberta
feature-extraction
Eval Results
text-embeddings-inference
Instructions to use BAAI/bge-m3 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use BAAI/bge-m3 with sentence-transformers:
from sentence_transformers import SentenceTransformer model = SentenceTransformer("BAAI/bge-m3") sentences = [ "That is a happy person", "That is a happy dog", "That is a very happy person", "Today is a sunny day" ] embeddings = model.encode(sentences) similarities = model.similarity(embeddings, embeddings) print(similarities.shape) # [4, 4] - Inference
- Notebooks
- Google Colab
- Kaggle
Update MIRACL evaluation results (#68)
Browse files- Update MIRACL evaluation results (0c6f0d0ea8f284b9070c3ffaa50677440943f984)
Co-authored-by: Jianlv Chen <hanhainebula@users.noreply.huggingface.co>
README.md
CHANGED
|
@@ -33,6 +33,13 @@ Utilizing the re-ranking model (e.g., [bge-reranker](https://github.com/FlagOpen
|
|
| 33 |
|
| 34 |
|
| 35 |
## News:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 36 |
- 2024/3/20: **Thanks Milvus team!** Now you can use hybrid retrieval of bge-m3 in Milvus: [pymilvus/examples
|
| 37 |
/hello_hybrid_sparse_dense.py](https://github.com/milvus-io/pymilvus/blob/master/examples/hello_hybrid_sparse_dense.py).
|
| 38 |
- 2024/3/8: **Thanks for the [experimental results](https://towardsdatascience.com/openai-vs-open-source-multilingual-embedding-models-e5ccb7c90f05) from @[Yannael](https://huggingface.co/Yannael). In this benchmark, BGE-M3 achieves top performance in both English and other languages, surpassing models such as OpenAI.**
|
|
|
|
| 33 |
|
| 34 |
|
| 35 |
## News:
|
| 36 |
+
- 2024/7/1: **We update the MIRACL evaluation results of BGE-M3**. To reproduce the new results, you can refer to: [bge-m3_miracl_2cr](https://huggingface.co/datasets/hanhainebula/bge-m3_miracl_2cr). We have also updated our [paper](https://arxiv.org/pdf/2402.03216) on arXiv.
|
| 37 |
+
<details>
|
| 38 |
+
<summary> Details </summary>
|
| 39 |
+
|
| 40 |
+
The previous test results were lower because we mistakenly removed the passages that have the same id as the query from the search results. After correcting this mistake, the overall performance of BGE-M3 on MIRACL is higher than the previous results, but the experimental conclusion remains unchanged. The other results are not affected by this mistake. To reproduce the previous lower results, you need to add the `--remove-query` parameter when using `pyserini.search.faiss` or `pyserini.search.lucene` to search the passages.
|
| 41 |
+
|
| 42 |
+
</details>
|
| 43 |
- 2024/3/20: **Thanks Milvus team!** Now you can use hybrid retrieval of bge-m3 in Milvus: [pymilvus/examples
|
| 44 |
/hello_hybrid_sparse_dense.py](https://github.com/milvus-io/pymilvus/blob/master/examples/hello_hybrid_sparse_dense.py).
|
| 45 |
- 2024/3/8: **Thanks for the [experimental results](https://towardsdatascience.com/openai-vs-open-source-multilingual-embedding-models-e5ccb7c90f05) from @[Yannael](https://huggingface.co/Yannael). In this benchmark, BGE-M3 achieves top performance in both English and other languages, surpassing models such as OpenAI.**
|