Paper: Matryoshka Representation Learning (arXiv:2205.13147)
This is a sentence-transformers model finetuned from Alibaba-NLP/gte-modernbert-base and published on the Hugging Face Hub as amentaphd/gte-modernbert-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Full model architecture:

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
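The Pooling module sets pooling_mode_cls_token to True and every other pooling mode to False, so the sentence embedding is the final hidden state of the first ([CLS]) token rather than an average over all tokens. The pure-Python sketch below contrasts the two strategies on toy vectors. The function names and the tiny 3-dimensional "hidden states" are illustrative assumptions, not library code; the real model produces 768-dimensional states.

```python
from typing import List

Vector = List[float]


def cls_pooling(token_embeddings: List[Vector]) -> Vector:
    """CLS pooling: return the hidden state of the first token unchanged."""
    return list(token_embeddings[0])


def mean_pooling(token_embeddings: List[Vector], attention_mask: List[int]) -> Vector:
    """Mean pooling over non-padding tokens (shown for contrast; disabled in this model)."""
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, x in enumerate(vec):
                totals[i] += x
            count += 1
    return [t / count for t in totals]


# Hypothetical hidden states: 3 real tokens plus 1 padding position, dim = 3.
hidden = [
    [1.0, 0.0, 2.0],  # [CLS]
    [3.0, 3.0, 0.0],
    [2.0, 3.0, 1.0],
    [9.0, 9.0, 9.0],  # padding, masked out below
]
mask = [1, 1, 1, 0]

print(cls_pooling(hidden))         # [1.0, 0.0, 2.0]
print(mean_pooling(hidden, mask))  # [2.0, 2.0, 1.0]
```

With CLS pooling, the padding row and the other tokens never enter the sentence vector directly; the encoder is expected to have summarised them into the first position.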
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("amentaphd/gte-modernbert-base")
# Run inference
sentences = [
    "How is 'associated undertaking' defined, and what criteria determine the significant influence of one undertaking over another in terms of voting rights?",
    "▼B\n\n(6)\n\n‘purchase price’ means the price payable and any incidental expenses minus any incidental reductions in the cost of acquisition;\n\n(7)\n\n‘production cost’ means the purchase price of raw materials, consumables and other costs directly attributable to the item in question. Member States shall permit or require the inclusion of a reasonable proportion of fixed or variable overhead costs indirectly attributable to the item in question, to the extent that they relate to the period of production. Distribution costs shall not be included;\n\n(8)\n\n‘value adjustment’ means the adjustments intended to take account of changes in the values of individual assets established at the balance sheet date, whether the change is final or not;\n\n(9)\n\n‘parent undertaking’ means an undertaking which controls one or more subsidiary undertakings;\n\n(10)\n\n‘subsidiary undertaking’ means an undertaking controlled by a parent undertaking, including any subsidiary undertaking of an ultimate parent undertaking;\n\n(11)\n\n‘group’ means a parent undertaking and all its subsidiary undertakings;\n\n(12)\n\n‘affiliated undertakings’ means any two or more undertakings within a group;\n\n(13)\n\n‘associated undertaking’ means an undertaking in which another undertaking has a participating interest, and over whose operating and financial policies that other undertaking exercises significant influence. An undertaking is presumed to exercise a significant influence over another undertaking where it has 20 % or more of the shareholders' or members' voting rights in that other undertaking;\n\n(14)\n\n‘investment undertakings’ means:\n\n(a)\n\nundertakings the sole object of which is to invest their funds in various securities, real property and other assets, with the sole aim of spreading investment risks and giving their shareholders the benefit of the results of the management of their assets,\n\n(b)\n\nundertakings associated with investment undertakings with fixed capital, if the sole object of those associated undertakings is to acquire fully paid shares issued by those investment undertakings without prejudice to point (h) of Article 22(1) of Directive 2012/30/EU;\n\n(15)",
    'and non-European non-financial corporations not subject to the disclosure obligations laid down in Directive 2013/34/EU. That information may be disclosed only once, based on counterparties’ turnover alignment for the general-purpose lending loans, as in the case of the GAR. The first disclosure reference date of this template is as of 31 December 2024. Institutions are not required to disclose this information before 1 January 2025. ---|---|---',
]
embeddings = model.encode(sentences)  # NumPy array, one row per sentence
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)  # torch.Tensor
print(similarities.shape)
# torch.Size([3, 3])
```
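Because the model was trained with MatryoshkaLoss over the dimensions 768, 512, 256, 128 and 64, its embeddings are meant to remain usable when only a leading prefix of each vector is kept. The pure-Python sketch below shows that step under the assumption that you truncate first and then re-normalise before computing cosine similarity; `truncate_and_normalize`, `cosine`, and the toy 4-dimensional vectors are hypothetical stand-ins for real 768-dimensional output. Recent sentence-transformers versions also accept a `truncate_dim` argument when loading a `SentenceTransformer`; check the documentation for your installed version.

```python
import math
from typing import List


def truncate_and_normalize(vec: List[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale the prefix to unit length."""
    prefix = vec[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical 4-dimensional "embeddings" standing in for model output.
query = [3.0, 4.0, 0.5, -0.5]
passage = [6.0, 8.0, -1.0, 1.0]

full = cosine(query, passage)
short = cosine(truncate_and_normalize(query, 2), truncate_and_normalize(passage, 2))
print(round(full, 4), round(short, 4))  # 0.9608 1.0
```

Truncation changes the scores because the dropped trailing components no longer contribute. This card does not report per-dimension results, so measure the quality trade-off of a shorter prefix on your own data.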
Information-retrieval evaluation, computed with InformationRetrievalEvaluator:

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.691 |
| cosine_accuracy@3 | 0.9109 |
| cosine_accuracy@5 | 0.9461 |
| cosine_accuracy@10 | 0.9743 |
| cosine_precision@1 | 0.691 |
| cosine_precision@3 | 0.3036 |
| cosine_precision@5 | 0.1892 |
| cosine_precision@10 | 0.0974 |
| cosine_recall@1 | 0.691 |
| cosine_recall@3 | 0.9109 |
| cosine_recall@5 | 0.9461 |
| cosine_recall@10 | 0.9743 |
| cosine_ndcg@10 | 0.8472 |
| cosine_mrr@10 | 0.8048 |
| cosine_map@100 | 0.8061 |
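In this table each precision@k equals accuracy@k divided by k (for example 0.9109 / 3 ≈ 0.3036) and each recall@k equals accuracy@k. Those definitions produce exactly that pattern when every query has a single relevant passage, which appears to be the case for this evaluation set. The sketch below computes the metrics under that single-relevant-passage assumption; it is a simplified illustration, not the InformationRetrievalEvaluator implementation, and the example ranks are invented.

```python
import math
from typing import Dict, List, Optional


def ir_metrics(ranks: List[Optional[int]], k_values=(1, 3, 5, 10)) -> Dict[str, float]:
    """Retrieval metrics when each query has exactly one relevant document.

    ranks[i] is the 1-based position of query i's relevant document in the
    ranked results, or None if it was not retrieved at all.
    """
    n = len(ranks)
    out: Dict[str, float] = {}
    for k in k_values:
        hits = sum(1 for r in ranks if r is not None and r <= k)
        out[f"accuracy@{k}"] = hits / n
        out[f"precision@{k}"] = hits / (n * k)  # at most 1 relevant item among k results
        out[f"recall@{k}"] = hits / n           # the single relevant item is found or not
    # With one relevant document the ideal DCG is 1, so nDCG = 1 / log2(rank + 1).
    out["ndcg@10"] = sum(1 / math.log2(r + 1) for r in ranks if r is not None and r <= 10) / n
    out["mrr@10"] = sum(1 / r for r in ranks if r is not None and r <= 10) / n
    # Average precision for a single relevant item reduces to 1 / rank.
    out["map@100"] = sum(1 / r for r in ranks if r is not None and r <= 100) / n
    return out


# Hypothetical ranks of the relevant passage for four queries.
metrics = ir_metrics([1, 2, 4, None])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under this assumption MAP@100 can only match or exceed MRR@10 (it also credits ranks 11 to 100), which is consistent with 0.8061 versus 0.8048 in the table.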
Training data columns: sentence_0 and sentence_1.

| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |

Samples:

| sentence_0 | sentence_1 |
|---|---|
| How is 'energy efficiency' defined in the context of Directive (EU) 2018/2001? | of Directive (EU) 2018/2001; --- --- (8) ‘energy efficiency’ means the ratio of output of performance, service, goods or energy to input of energy; --- --- (9) ‘energy savings’ means an amount of saved energy determined by measuring or estimating consumption, or both,, before and after the implementation of an energy efficiency improvement measure, whilst ensuring normalisation for external conditions that affect energy consumption; --- --- (10) ‘energy efficiency improvement’ means an increase in energy efficiency as a result of any technological, behavioural or economic changes; --- --- (11) ‘energy service’ means the physical benefit, utility or good derived from a combination of energy with energy-efficient technology or with action, |
| What are the sources of information that the external experts will use to create the list of conflict-affected and high-risk areas? | 2. |
| What is the maximum time frame for completing the undertaking according to the technical specifications set out in Annexes II and III after the Directive enters into force? | is undertaken according to the technical specifications set out in Annexes II and III and that it is completed at the latest four years after the date of entry into force of this Directive. |
Loss: MatryoshkaLoss with these parameters:

```json
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [768, 512, 256, 128, 64],
    "matryoshka_weights": [1, 1, 1, 1, 1],
    "n_dims_per_step": -1
}
```
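Conceptually, MultipleNegativesRankingLoss scores each sentence_0 against every sentence_1 in the batch and applies softmax cross-entropy so that its own sentence_1 ranks first, with the other pairs' sentence_1 texts acting as in-batch negatives. MatryoshkaLoss repeats that computation on embeddings truncated to each size in matryoshka_dims and sums the results weighted by matryoshka_weights; n_dims_per_step: -1 means every size is used on every step. The pure-Python sketch below follows that description on toy vectors. The function names are hypothetical, and the similarity scale of 20 is an assumption intended to mirror the library's default.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cos_sim(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(anchors: List[Vector], positives: List[Vector], scale: float = 20.0) -> float:
    """In-batch softmax cross-entropy: anchor i should score highest against positive i.

    Every other positive in the batch serves as a negative for anchor i.
    """
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * cos_sim(a, p) for p in positives]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the true pair
    return total / len(anchors)


def matryoshka_loss(anchors: List[Vector], positives: List[Vector],
                    dims: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted sum of the inner loss on embeddings truncated to each prefix length."""
    return sum(
        w * mnr_loss([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )


# Hypothetical batch of 2 pairs with 4-dimensional embeddings (dims 4 and 2 stand in for 768..64).
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, -0.1, 0.3]]
positives = [[0.9, 0.1, 0.0, 0.2], [0.1, 0.8, 0.2, 0.1]]
print(matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1]))
```

With per_device_train_batch_size: 4, each sentence_0 is contrasted with only three in-batch negatives; larger batches would supply more negatives per step.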
Non-default hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- num_train_epochs: 4
- multi_dataset_batch_sampler: round_robin

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 4
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin

Training logs:

| Epoch | Step | Training Loss | cosine_ndcg@10 |
|---|---|---|---|
| 0.0432 | 500 | 0.358 | - |
| 0.0863 | 1000 | 0.1048 | - |
| 0.1295 | 1500 | 0.0827 | - |
| 0.1726 | 2000 | 0.067 | 0.7969 |
| 0.2158 | 2500 | 0.0491 | - |
| 0.2590 | 3000 | 0.0831 | - |
| 0.3021 | 3500 | 0.062 | - |
| 0.3453 | 4000 | 0.0657 | 0.8050 |
| 0.3884 | 4500 | 0.0522 | - |
| 0.4316 | 5000 | 0.049 | - |
| 0.4748 | 5500 | 0.0426 | - |
| 0.5179 | 6000 | 0.0708 | 0.8215 |
| 0.5611 | 6500 | 0.0236 | - |
| 0.6042 | 7000 | 0.024 | - |
| 0.6474 | 7500 | 0.0256 | - |
| 0.6905 | 8000 | 0.041 | 0.8105 |
| 0.7337 | 8500 | 0.0285 | - |
| 0.7769 | 9000 | 0.0249 | - |
| 0.8200 | 9500 | 0.0368 | - |
| 0.8632 | 10000 | 0.0588 | 0.8118 |
| 0.9063 | 10500 | 0.0386 | - |
| 0.9495 | 11000 | 0.0456 | - |
| 0.9927 | 11500 | 0.0399 | - |
| 1.0 | 11585 | - | 0.8184 |
| 1.0358 | 12000 | 0.0424 | 0.8239 |
| 1.0790 | 12500 | 0.0107 | - |
| 1.1221 | 13000 | 0.0279 | - |
| 1.1653 | 13500 | 0.0236 | - |
| 1.2085 | 14000 | 0.024 | 0.8193 |
| 1.2516 | 14500 | 0.0143 | - |
| 1.2948 | 15000 | 0.0118 | - |
| 1.3379 | 15500 | 0.0078 | - |
| 1.3811 | 16000 | 0.023 | 0.8217 |
| 1.4243 | 16500 | 0.0239 | - |
| 1.4674 | 17000 | 0.0335 | - |
| 1.5106 | 17500 | 0.0119 | - |
| 1.5537 | 18000 | 0.0411 | 0.8292 |
| 1.5969 | 18500 | 0.0168 | - |
| 1.6401 | 19000 | 0.0059 | - |
| 1.6832 | 19500 | 0.0234 | - |
| 1.7264 | 20000 | 0.0184 | 0.8366 |
| 1.7695 | 20500 | 0.0128 | - |
| 1.8127 | 21000 | 0.0166 | - |
| 1.8558 | 21500 | 0.0181 | - |
| 1.8990 | 22000 | 0.0148 | 0.8353 |
| 1.9422 | 22500 | 0.0225 | - |
| 1.9853 | 23000 | 0.0158 | - |
| 2.0 | 23170 | - | 0.8360 |
| 2.0285 | 23500 | 0.0123 | - |
| 2.0716 | 24000 | 0.0173 | 0.8329 |
| 2.1148 | 24500 | 0.0167 | - |
| 2.1580 | 25000 | 0.0125 | - |
| 2.2011 | 25500 | 0.013 | - |
| 2.2443 | 26000 | 0.0079 | 0.8338 |
| 2.2874 | 26500 | 0.007 | - |
| 2.3306 | 27000 | 0.0171 | - |
| 2.3738 | 27500 | 0.0058 | - |
| 2.4169 | 28000 | 0.0048 | 0.8405 |
| 2.4601 | 28500 | 0.005 | - |
| 2.5032 | 29000 | 0.0141 | - |
| 2.5464 | 29500 | 0.0132 | - |
| 2.5896 | 30000 | 0.006 | 0.8461 |
| 2.6327 | 30500 | 0.0095 | - |
| 2.6759 | 31000 | 0.0061 | - |
| 2.7190 | 31500 | 0.0107 | - |
| 2.7622 | 32000 | 0.0157 | 0.8451 |
| 2.8054 | 32500 | 0.005 | - |
| 2.8485 | 33000 | 0.0087 | - |
| 2.8917 | 33500 | 0.0064 | - |
| 2.9348 | 34000 | 0.005 | 0.8449 |
| 2.9780 | 34500 | 0.0115 | - |
| 3.0 | 34755 | - | 0.8451 |
| 3.0211 | 35000 | 0.0079 | - |
| 3.0643 | 35500 | 0.0045 | - |
| 3.1075 | 36000 | 0.0029 | 0.8443 |
| 3.1506 | 36500 | 0.0161 | - |
| 3.1938 | 37000 | 0.0144 | - |
| 3.2369 | 37500 | 0.0076 | - |
| 3.2801 | 38000 | 0.0157 | 0.8500 |
| 3.3233 | 38500 | 0.0039 | - |
| 3.3664 | 39000 | 0.0045 | - |
| 3.4096 | 39500 | 0.0033 | - |
| 3.4527 | 40000 | 0.0064 | 0.8434 |
| 3.4959 | 40500 | 0.0054 | - |
| 3.5391 | 41000 | 0.0061 | - |
| 3.5822 | 41500 | 0.0051 | - |
| 3.6254 | 42000 | 0.0019 | 0.8472 |
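The training log places epoch 1.0 at step 11,585, so with gradient_accumulation_steps: 1 four epochs correspond to 46,340 optimizer steps. With lr_scheduler_type: linear, warmup_steps: 0 and learning_rate: 5e-05, the learning rate should fall linearly from 5e-05 to zero over those steps. The function below is a sketch of the usual linear-warmup-then-linear-decay schedule, assuming the Trainer derives the total step count this way; `linear_schedule_lr` and the constants are illustrative names, not library API.

```python
def linear_schedule_lr(step: int, base_lr: float, total_steps: int, warmup_steps: int = 0) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


STEPS_PER_EPOCH = 11_585                 # step at epoch 1.0 in the training log
EPOCHS = 4                               # num_train_epochs
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS   # 46,340

# Learning rate at the start, at epoch 2.0, at the last logged step, and at the end.
for step in (0, 23_170, 42_000, TOTAL_STEPS):
    print(step, f"{linear_schedule_lr(step, 5e-05, TOTAL_STEPS):.3e}")
```

Under these assumptions the last logged step (42,000) would run at roughly 4.7e-06, about a tenth of the initial rate.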
Citations (BibTeX):

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
Base model: answerdotai/ModernBERT-base