Source of training queries for BGE-EN-ICL

#13

by ftvalentini - opened Mar 27, 2025

I have some questions about the origin of the BGE-EN-ICL training queries for the following datasets, which have no training queries in BEIR:

quora: 10k test and 5k dev queries in BEIR; bge-full-data has 60,202 queries.
scidocsrr: 1k test queries in BEIR; bge-full-data has 12,654 queries.
arguana: 1,406 test queries in BEIR; bge-full-data has 3,101 queries.

Where do these training queries come from?

Also, for the nli dataset, what is the source dataset?

Thank you so much for making such a valuable dataset available!
