Fine tuning
Thank you for sharing this model and paper.
I'm investigating what it would take to further fine-tune Instructor-XL on a legal domain for retrieval tasks.
I'm trying to assess what would be a good starting training set size, a good loss temperature, and a good number k of negative pairs per positive pair.
I welcome any other heads-ups.
PS. With hindsight I feel a little daft asking about fine-tuning when the model card explicitly says "embeddings tailored to any task and domains [...] by simply providing the task instruction, without any finetuning." Please let me know if it is a stupid idea.
Thank you very much for your interest in INSTRUCTOR!
The instruction is an efficient way to adapt embeddings to a specific domain, but you can also further enhance the model's ability through fine-tuning. To start, you may use all the available training data (probably training for a maximum of 40K steps). For the other hyper-parameters, you may adopt our default settings (e.g., loss_temperature=0.01, k=4, etc.).
Hope this helps! Feel free to add any further questions or comments!
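For anyone reading along, here is a minimal sketch of how a loss temperature and k negatives per positive typically enter a contrastive (InfoNCE-style) loss. It is an illustration in plain Python, not the INSTRUCTOR training code: the function names `cosine` and `info_nce_loss` and the toy 2-d vectors are assumptions made up for this example. Only the values 0.01 and k=4 come from the reply above.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce_loss(query, positive, negatives, temperature=0.01):
    """Cross-entropy over [positive] + negatives, with the positive as target.

    Hypothetical helper for illustration; similarities are divided by the
    temperature before the softmax, so a small temperature sharpens it.
    """
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]

# Toy example: one query, one positive, k=4 negatives (made-up vectors).
query = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0], [0.1, 0.9], [-0.5, 0.5]]

loss = info_nce_loss(query, positive, negatives, temperature=0.01)
print(f"loss with k={len(negatives)} negatives: {loss:.6f}")
```

With a temperature as low as 0.01, even a modest similarity gap between the positive and the negatives drives the loss close to zero, so harder negatives (for example, near-miss legal passages) are what keep the training signal informative.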