How to use geethakurup/arxiv-finetuned-v2 with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("geethakurup/arxiv-finetuned-v2")
sentences = [
"<S> the generation of hydrodynamic radiation in interactions of pulsed proton and laser beams with matter is explored . </S> <S> the beams were directed into a water target and the resulting acoustic signals were recorded with pressure sensitive sensors . </S> <S> measurements were performed with varying pulse energies , sensor positions , beam diameters and temperatures . </S> <S> the obtained data are matched by simulation results based on the thermo - acoustic model with uncertainties at a level of 10@xmath0 . </S> <S> the results imply that the primary mechanism for sound generation by the energy deposition of particles propagating in water is the local heating of the medium . </S> <S> the heating results in a fast expansion or contraction and a pressure pulse of bipolar shape is emitted into the surrounding medium . </S> <S> an interesting , widely discussed application of this effect could be the detection of ultra - high energetic cosmic neutrinos in future large - scale acoustic neutrino detectors . </S> <S> for this application a validation of the sound generation mechanism to high accuracy , as achieved with the experiments discussed in this article , is of high importance . </S> <S> cosmic neutrinos , acoustic neutrino detection , thermo - acoustic model , ultra - high energy cosmic rays , beam interaction </S>",
"in 1957 g.a . askaryan pointed out that ionisation and cavitation along a track of an ionising particle through a liquid leads to hydrodynamic radiation @xcite . in the 1960s , 1970s and 1980s , theoretical and experimental studies have been performed on the hydrodynamic radiation of beams and particles traversing dense media @xcite . the interest in characterising the properties of the acoustic radiation was , among other reasons , lead by the idea that the effect can be utilised to detect ultra - high energy ( @xmath1 ) cosmic , i.e. astrophysical neutrinos , in dense media like water , ice and salt . in the 1970s this idea was discussed within the dumand optical neutrino detector project @xcite and has been studied in connection with cherenkov neutrino detector projects since . the detection of such neutrinos is considerably more challenging than the search for high - energy neutrinos ( @xmath2 ) as currently pursued by under - ice and under - water cherenkov neutrino telescopes @xcite . due to the low expected fluxes , detector sizes exceeding 100km@xmath3 are needed @xcite . however , the properties of the acoustic method allow for sparsely instrumented arrays with @xmath4100 sensors / km@xmath3 . to study the feasibility of a detection method based on acoustic signals it is necessary to understand the properties of the sound generation by comparing measurements and simulations based on theoretical models . according to the so - called thermo - acoustic model @xcite , the energy deposition of particles traversing liquids leads to a local heating of the medium which can be regarded as instantaneous with respect to the hydrodynamic time scales . due to the temperature change the medium expands or contracts according to its bulk volume expansion coefficient @xmath5 . the accelerated motion",
"however that they are derived using the @xmath21 values determined for given @xmath62 and @xmath63 and hence do not account for the additional uncertainty introduced by allowed variations in these parameters ( which could affect the power spectrum normalization amplitude by as much as @xmath105 ) . from fig . 23(e ) , and given the uncertainties , we see that the fitting formulae of eqs . ( 8) and ( 9 ) provide an adequate summary for all the open - bubble inflation model spectra . the extreme @xmath106-@xmath80 @xmath58 normalization factor ( eq . [ 2 ] and table 8) for the flat - space scale - invariant spectrum open model ( w83 ) may be summarized by , for the lower 2-@xmath80 limit , @xmath107 , \\eqno(10)\\ ] ] and for the upper 2-@xmath80 limit , @xmath108 . \\eqno(11)\\ ] ] these fits are good to better than @xmath88 for @xmath109 ; again , they are derived from @xmath21 values determined at given @xmath62 and @xmath63 . given the uncertainties involved in the normalization procedure ( born of both statistical and other arguments ) it is not yet possible to quote a unique dmr normalization amplitude ( g96 ) . as a central \" value for the @xmath58 normalization factor , we currently advocate the mean of eqs . ( 8) and ( 9 ) or eqs . ( 10 ) and ( 11 ) as required . we emphasize , however , that it is incorrect to draw conclusions about model viability based solely on this central \" value . in conjunction with numerically determined transfer functions , the fits of eqs . ( 8)(11 ) allow for a determination of @xmath26 $ ] , accurate to a few percent . here the mean square linear",
"accordingly , the mean fluctuation value is not following a solution to the background equations of motion . evolution equations for the variance @xmath80 and skew @xmath42 are obtained after enforcing @xmath106 , yielding @xmath107 in both equations , the first term on the right - hand sides describes how @xmath81 and @xmath42 scale as the density function expands or contracts in response to the velocity field . these terms force @xmath80 and @xmath42 to scale in proportion to the velocity field . specifically , if we temporarily drop the second terms in each equation above , one finds that @xmath108 and @xmath109 . this precisely matches our expectation for the scaling of these quantities . hence , these terms account for the jacobians associated with infinitesimal transformations induced by the flow @xmath93 . for applications to inflationary non - gaussianity , the second terms in and are more relevant . these terms describe how each moment is sourced by higher moments and the interaction of the density function with the velocity field . in the example above , if we are in a situation where @xmath105 , the tails of the density function are moving faster than the core . this means that one tail is shrinking and the other is extending , skewing the probability density . the opposite occurs when @xmath110 . these effects are measured by the second term in . hence , by expanding our pdf to the third moment , and our velocity field to quadratic order , we are able to construct a set of evolution equations which include the leading - order source terms for each moment . there is little conceptually new as we move from one field to two . the new features are mostly technical in nature . our"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 384, 'do_lower_case': False, 'architecture': 'BertModel'})
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
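The module listing above describes a three-stage pipeline: the transformer produces one vector per token, the Pooling layer averages them (pooling_mode_mean_tokens is True), and Normalize() rescales the result to unit length. The following is a minimal pure-Python sketch of the last two stages, not the library's implementation. The function names and the toy 4-dimensional vectors are made up for illustration; the real model works with 384 dimensions.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Rescale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy input (hypothetical values): three token vectors, the last one is padding.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 4.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)    # mean of the first two vectors only
embedding = l2_normalize(pooled)    # same direction, length 1
print(pooled)
print(embedding)
```

Because every embedding ends up with unit length, the cosine similarity between two embeddings reduces to their dot product.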
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("krishmajumdar/arxiv-finetuned-v2")
# Run inference
sentences = [
'<S> the effect of a random phase diffuser on fluctuations of laser light ( scintillations ) is studied . </S> <S> not only spatial but also temporal phase variations introduced by the phase diffuser are analyzed . </S> <S> the explicit dependence of the scintillation index on finite - time phase variations is obtained for long propagation paths . </S> <S> it is shown that for large amplitudes of phase fluctuations , a finite - time effect decreases the ability of phase diffuser to suppress the scintillations . </S>',
'operators @xmath67 ( their dependence on time is as in vacuum ) . the term for @xmath68 can be obtained from eq . [ twelve ] by putting @xmath69 . substituting both distribution functions into eq . [ eight ] , we obtain @xmath70 @xmath71 @xmath72:\\big>,\\ ] ] where @xmath73 and @xmath74 are solutions of eqs . [ twelve ] with the initial conditions @xmath63 and @xmath75 , respectively . the operators on the right side of eq . [ thirteen ] are related through matching conditions with the amplitudes of the exiting laser radiation ( see ref . @xcite ) by the relation @xmath76 where @xmath77 is the operator of the laser field which is assumed to be a single - mode field and the subscript ( @xmath78 ) means perpendicular to the @xmath28-axis component . the function @xmath79 describes the profile of the laser mode , which is assumed to be gaussian - type function [ @xmath80 . @xmath1 desribes the initial radius of the beam . to account for the effect of the phase diffuser , a factor @xmath81 or @xmath82 should be inserted into the integrand of eq . [ fourteen ] . the quantity @xmath83 is the random phase introduced by the phase diffuser . a similar consideration is applicable to each of four photon operators entering both terms in square brackets of eq . [ thirteen ] . it can be easily seen that the factor @xmath84},\\ ] ] describing the effect of phase screen on the beam , enters implicitly the integrand of eq . [ thirteen ] ( the indices @xmath78 are omitted here for the sake of brevity ) . there are integrations over variables @xmath85 as shown in eq . [ fourteen ] . furthermore , the brackets @xmath16 ,',
'that the candidate is detected with s / n @xmath136 in the unaffected image and also s / n @xmath137 in the image affected by the bad pixel . hence , we are confident that the source is real and that the photometry from the final drizzled image is robust . the sixth and final candidate is confidently detected at s / n@xmath138 in @xmath46 ( @xmath120 ) , and also in the @xmath38 with s / n = 3.7 . its photometric redshift is sharply peaked at @xmath139 , with a secondary solution at @xmath140 . this candidate is also very compact , with measured half - light radius @xmath141 , and the highest stellarity of the sample ( class_star = 0.91 ) . combining compactness with high stellarity from a high s / n source , a stellar nature ( cool dwarf ) for this source is relatively likely , as we discuss in section [ contamination ] . to translate the results on the search of possible candidates at @xmath3 from the archival borg[z8 ] data into a number density / luminosity function determination , we need to assess both the impact of contamination in our sample , and the effective volume probed by the data . there are multiple classes of lower-@xmath24 sources that may have similar @xmath103 colors to @xmath19 lyman - break galaxies ( lbgs ) , such as galactic stars , intermediate - redshift passive galaxies , and strong line emitters . cool , red stars in the milky way may be possible contaminants of our sample , although typical colors lack a strong @xmath103 drop . at low signal - to - noise ratio , the separation of point - like galactic stars from resolved galaxies using the ` sextractor ` class_star',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000, 0.5745, -0.0369],
# [ 0.5745, 1.0000, -0.0618],
# [-0.0369, -0.0618, 1.0000]])
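The similarity matrix printed above can be read as a small semantic search: for each input, the off-diagonal entries rank the other inputs by closeness. The sketch below copies the printed values into a plain Python list and sorts the neighbours of one row. The helper name nearest_neighbors is hypothetical and is not part of sentence-transformers.

```python
# Values copied from the printed similarity matrix above.
similarities = [
    [1.0000, 0.5745, -0.0369],
    [0.5745, 1.0000, -0.0618],
    [-0.0369, -0.0618, 1.0000],
]

def nearest_neighbors(sim_matrix, query_index):
    """Return (index, score) pairs for all other inputs, most similar first."""
    scores = [
        (j, score)
        for j, score in enumerate(sim_matrix[query_index])
        if j != query_index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

print(nearest_neighbors(similarities, 0))
# [(1, 0.5745), (2, -0.0369)]
```

Here input 0 (the abstract on phase diffusers and scintillations) ranks input 1 (an article excerpt that also discusses the phase diffuser) well ahead of input 2 (an excerpt on high-redshift galaxy candidates).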
Columns: abstract and article

| | abstract | article |
|---|---|---|
| type | string | string |
| details | | |
| abstract | article |
|---|---|
| | additive models @xcite provide an important family of models for semiparametric regression or classification . some reasons for the success of additive models are their increased flexibility when compared to linear or generalized linear models and their increased interpretability when compared to fully nonparametric models . it is well - known that good estimators in additive models are in general less prone to the curse of high dimensionality than good estimators in fully nonparametric models . many examples of such estimators belong to the large class of regularized kernel based methods over a reproducing kernel hilbert space @xmath0 , see e.g. @xcite . in the last years many interesting results on learning rates of regularized kernel based models for additive models have been published when the focus is on sparsity and when the classical least squares loss function is used , see e.g. @xcite , @xcite , @xcite , @xcite , @xcite , @xcite and the references therein . of course , the lea... |
| | e.g. @xcite for the general case and @xcite for additive models . therefore , we will here consider the case of regularized kernel based methods based on a general convex and lipschitz continuous loss function , on a general kernel , and on the classical regularizing term @xmath1 for some @xmath2 which is a smoothness penalty but not a sparsity penalty , see e.g. @xcite . such regularized kernel based methods are now often called support vector machines ( svms ) , although the notation was historically used for such methods based on the special hinge loss function and for special kernels only , we refer to @xcite . in this paper we address the open question , whether an svm with an additive kernel can provide a substantially better learning rate in high dimensions than an svm with a general kernel , say a classical gaussian rbf kernel , if the assumption of an additive model is satisfied . our leading example covers learning rates for quantile regression based on the lipschitz continuo... |
| | approach might be to fit both models and compare their risks evaluated for test data . for the same reason we will also not cover sparsity . consistency of support vector machines generated by additive kernels for additive models was considered in @xcite . in this paper we establish learning rates for these algorithms . let us recall the framework with a complete separable metric space @xmath3 as the input space and a closed subset @xmath4 of @xmath5 as the output space . a borel probability measure @xmath6 on @xmath7 is used to model the learning problem and an independent and identically distributed sample @xmath8 is drawn according to @xmath6 for learning . a loss function @xmath9 is used to measure the quality of a prediction function @xmath10 by the local error @xmath11 . _ throughout the paper we assume that @xmath12 is measurable , @xmath13 , convex with respect to the third variable , and uniformly lipschitz continuous satisfying @xmath14 with a finite constant @xmath15 . _ sup... |
MultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim",
"gather_across_devices": false
}
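To make the loss parameters concrete: MultipleNegativesRankingLoss treats each (anchor, positive) pair in a batch as the correct match and uses the other positives in the same batch as negatives, applying cross-entropy to the cosine similarities multiplied by scale. Below is a minimal pure-Python sketch of that idea, assuming the abstract column serves as the anchor and the article column as the positive. It is not the library's PyTorch implementation, and the function names and toy 2-dimensional vectors are hypothetical.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over the batch: positives[i] is the target for
    anchors[i]; every other positive in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

# Toy embeddings (hypothetical values).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each anchor is closest to its own positive
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each anchor is closest to the wrong positive

print(mnr_loss(anchors, aligned))
print(mnr_loss(anchors, swapped))
```

The loss is close to zero when each anchor is most similar to its own positive and grows large when the pairing is wrong. A larger batch supplies more in-batch negatives per anchor, which is one reason batch size matters for this loss.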
Non-default hyperparameters:
per_device_train_batch_size: 32
gradient_accumulation_steps: 2
warmup_ratio: 0.05
save_only_model: True
fp16: True

All hyperparameters:
overwrite_output_dir: False
do_predict: False
eval_strategy: no
prediction_loss_only: True
per_device_train_batch_size: 32
per_device_eval_batch_size: 8
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 2
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 3
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.05
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: True
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
project: huggingface
trackio_space_id: trackio
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: no
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

| Epoch | Step | Training Loss |
|---|---|---|
| 0.0104 | 100 | 0.8589 |
| 0.0208 | 200 | 0.5171 |
| 0.0312 | 300 | 0.4745 |
| 0.0416 | 400 | 0.4498 |
| 0.0520 | 500 | 0.4105 |
| 0.0624 | 600 | 0.394 |
| 0.0729 | 700 | 0.3896 |
| 0.0833 | 800 | 0.3788 |
| 0.0937 | 900 | 0.3561 |
| 0.1041 | 1000 | 0.3662 |
| 0.1145 | 1100 | 0.3419 |
| 0.1249 | 1200 | 0.3256 |
| 0.1353 | 1300 | 0.3337 |
| 0.1457 | 1400 | 0.335 |
| 0.1561 | 1500 | 0.3255 |
| 0.1665 | 1600 | 0.3099 |
| 0.1769 | 1700 | 0.3092 |
| 0.1873 | 1800 | 0.2985 |
| 0.1978 | 1900 | 0.2931 |
| 0.2082 | 2000 | 0.2977 |
| 0.2186 | 2100 | 0.2918 |
| 0.2290 | 2200 | 0.2856 |
| 0.2394 | 2300 | 0.2835 |
| 0.2498 | 2400 | 0.2689 |
| 0.2602 | 2500 | 0.2743 |
| 0.2706 | 2600 | 0.2504 |
| 0.2810 | 2700 | 0.2423 |
| 0.2914 | 2800 | 0.2717 |
| 0.3018 | 2900 | 0.2653 |
| 0.3122 | 3000 | 0.2543 |
| 0.3226 | 3100 | 0.256 |
| 0.3331 | 3200 | 0.2555 |
| 0.3435 | 3300 | 0.2485 |
| 0.3539 | 3400 | 0.243 |
| 0.3643 | 3500 | 0.2339 |
| 0.3747 | 3600 | 0.2447 |
| 0.3851 | 3700 | 0.2311 |
| 0.3955 | 3800 | 0.2245 |
| 0.4059 | 3900 | 0.2276 |
| 0.4163 | 4000 | 0.2243 |
| 0.4267 | 4100 | 0.2225 |
| 0.4371 | 4200 | 0.2391 |
| 0.4475 | 4300 | 0.2162 |
| 0.4580 | 4400 | 0.2194 |
| 0.4684 | 4500 | 0.2291 |
| 0.4788 | 4600 | 0.2307 |
| 0.4892 | 4700 | 0.2141 |
| 0.4996 | 4800 | 0.2124 |
| 0.5100 | 4900 | 0.2306 |
| 0.5204 | 5000 | 0.2075 |
| 0.5308 | 5100 | 0.2055 |
| 0.5412 | 5200 | 0.2294 |
| 0.5516 | 5300 | 0.2165 |
| 0.5620 | 5400 | 0.2165 |
| 0.5724 | 5500 | 0.1957 |
| 0.5828 | 5600 | 0.1971 |
| 0.5933 | 5700 | 0.1935 |
| 0.6037 | 5800 | 0.2077 |
| 0.6141 | 5900 | 0.1931 |
| 0.6245 | 6000 | 0.1987 |
| 0.6349 | 6100 | 0.1983 |
| 0.6453 | 6200 | 0.1889 |
| 0.6557 | 6300 | 0.1894 |
| 0.6661 | 6400 | 0.195 |
| 0.6765 | 6500 | 0.1936 |
| 0.6869 | 6600 | 0.1811 |
| 0.6973 | 6700 | 0.1835 |
| 0.7077 | 6800 | 0.2028 |
| 0.7182 | 6900 | 0.1904 |
| 0.7286 | 7000 | 0.1853 |
| 0.7390 | 7100 | 0.1646 |
| 0.7494 | 7200 | 0.1904 |
| 0.7598 | 7300 | 0.181 |
| 0.7702 | 7400 | 0.176 |
| 0.7806 | 7500 | 0.1746 |
| 0.7910 | 7600 | 0.1846 |
| 0.8014 | 7700 | 0.1706 |
| 0.8118 | 7800 | 0.1692 |
| 0.8222 | 7900 | 0.1696 |
| 0.8326 | 8000 | 0.171 |
| 0.0104 | 100 | 0.2682 |
| 0.0208 | 200 | 0.1698 |
| 0.0312 | 300 | 0.1492 |
| 0.0416 | 400 | 0.1597 |
| 0.0520 | 500 | 0.1421 |
| 0.0624 | 600 | 0.1412 |
| 0.0729 | 700 | 0.1367 |
| 0.0833 | 800 | 0.1407 |
| 0.0937 | 900 | 0.1276 |
| 0.1041 | 1000 | 0.1352 |
| 0.1145 | 1100 | 0.1307 |
| 0.1249 | 1200 | 0.1188 |
| 0.1353 | 1300 | 0.1211 |
| 0.1457 | 1400 | 0.1203 |
| 0.1561 | 1500 | 0.1131 |
| 0.1665 | 1600 | 0.1077 |
| 0.1769 | 1700 | 0.1061 |
| 0.1873 | 1800 | 0.1064 |
| 0.1978 | 1900 | 0.1016 |
| 0.2082 | 2000 | 0.1066 |
| 0.2186 | 2100 | 0.1077 |
| 0.2290 | 2200 | 0.1009 |
| 0.2394 | 2300 | 0.1048 |
| 0.2498 | 2400 | 0.0925 |
| 0.2602 | 2500 | 0.1054 |
| 0.2706 | 2600 | 0.0873 |
| 0.2810 | 2700 | 0.082 |
| 0.2914 | 2800 | 0.0976 |
| 0.3018 | 2900 | 0.097 |
| 0.3122 | 3000 | 0.0876 |
| 0.3226 | 3100 | 0.0959 |
| 0.3331 | 3200 | 0.0931 |
| 0.3435 | 3300 | 0.0903 |
| 0.3539 | 3400 | 0.0854 |
| 0.3643 | 3500 | 0.0841 |
| 0.3747 | 3600 | 0.0914 |
| 0.3851 | 3700 | 0.0809 |
| 0.3955 | 3800 | 0.0798 |
| 0.4059 | 3900 | 0.0847 |
| 0.4163 | 4000 | 0.0784 |
| 0.4267 | 4100 | 0.0837 |
| 0.4371 | 4200 | 0.092 |
| 0.4475 | 4300 | 0.0794 |
| 0.4580 | 4400 | 0.0811 |
| 0.4684 | 4500 | 0.0844 |
| 0.4788 | 4600 | 0.092 |
| 0.4892 | 4700 | 0.0743 |
| 0.4996 | 4800 | 0.0839 |
| 0.5100 | 4900 | 0.0939 |
| 0.5204 | 5000 | 0.0789 |
| 0.5308 | 5100 | 0.0769 |
| 0.5412 | 5200 | 0.0936 |
| 0.5516 | 5300 | 0.085 |
| 0.5620 | 5400 | 0.0857 |
| 0.5724 | 5500 | 0.0731 |
| 0.5828 | 5600 | 0.0766 |
| 0.5933 | 5700 | 0.078 |
| 0.6037 | 5800 | 0.0812 |
| 0.6141 | 5900 | 0.0731 |
| 0.6245 | 6000 | 0.0783 |
| 0.6349 | 6100 | 0.075 |
| 0.6453 | 6200 | 0.0734 |
| 0.6557 | 6300 | 0.0725 |
| 0.6661 | 6400 | 0.0796 |
| 0.6765 | 6500 | 0.0748 |
| 0.6869 | 6600 | 0.0722 |
| 0.6973 | 6700 | 0.0705 |
| 0.7077 | 6800 | 0.0831 |
| 0.7182 | 6900 | 0.0787 |
| 0.7286 | 7000 | 0.0779 |
| 0.7390 | 7100 | 0.0641 |
| 0.7494 | 7200 | 0.0795 |
| 0.7598 | 7300 | 0.0712 |
| 0.7702 | 7400 | 0.0698 |
| 0.7806 | 7500 | 0.068 |
| 0.7910 | 7600 | 0.0729 |
| 0.8014 | 7700 | 0.0693 |
| 0.8118 | 7800 | 0.0719 |
| 0.8222 | 7900 | 0.0735 |
| 0.8326 | 8000 | 0.073 |
| 0.8430 | 8100 | 0.1425 |
| 0.8535 | 8200 | 0.1422 |
| 0.8639 | 8300 | 0.1336 |
| 0.8743 | 8400 | 0.1448 |
| 0.8847 | 8500 | 0.1421 |
| 0.8951 | 8600 | 0.143 |
| 0.9055 | 8700 | 0.1299 |
| 0.9159 | 8800 | 0.1337 |
| 0.9263 | 8900 | 0.138 |
| 0.9367 | 9000 | 0.1417 |
| 0.9471 | 9100 | 0.1266 |
| 0.9575 | 9200 | 0.1187 |
| 0.9679 | 9300 | 0.1454 |
| 0.9784 | 9400 | 0.1322 |
| 0.9888 | 9500 | 0.137 |
| 0.9992 | 9600 | 0.1452 |
| 1.0096 | 9700 | 0.0936 |
| 1.0200 | 9800 | 0.0986 |
| 1.0304 | 9900 | 0.1021 |
| 1.0408 | 10000 | 0.1004 |
| 1.0512 | 10100 | 0.0954 |
| 1.0616 | 10200 | 0.1004 |
| 1.0720 | 10300 | 0.0974 |
| 1.0824 | 10400 | 0.0939 |
| 1.0928 | 10500 | 0.1039 |
| 1.1032 | 10600 | 0.111 |
| 1.1137 | 10700 | 0.0993 |
| 1.1241 | 10800 | 0.0975 |
| 1.1345 | 10900 | 0.0939 |
| 1.1449 | 11000 | 0.1042 |
| 1.1553 | 11100 | 0.0984 |
| 1.1657 | 11200 | 0.1008 |
| 1.1761 | 11300 | 0.0977 |
| 1.1865 | 11400 | 0.0881 |
| 1.1969 | 11500 | 0.0971 |
| 1.2073 | 11600 | 0.0909 |
| 1.2177 | 11700 | 0.0938 |
| 1.2281 | 11800 | 0.0933 |
| 1.2386 | 11900 | 0.1035 |
| 1.2490 | 12000 | 0.0931 |
| 1.2594 | 12100 | 0.1053 |
| 1.2698 | 12200 | 0.1043 |
| 1.2802 | 12300 | 0.0935 |
| 1.2906 | 12400 | 0.0928 |
| 1.3010 | 12500 | 0.0969 |
| 1.3114 | 12600 | 0.0901 |
| 1.3218 | 12700 | 0.0992 |
| 1.3322 | 12800 | 0.0978 |
| 1.3426 | 12900 | 0.0901 |
| 1.3530 | 13000 | 0.0835 |
| 1.3634 | 13100 | 0.0914 |
| 1.3739 | 13200 | 0.0922 |
| 1.3843 | 13300 | 0.0923 |
| 1.3947 | 13400 | 0.0917 |
| 1.4051 | 13500 | 0.089 |
| 1.4155 | 13600 | 0.0903 |
| 1.4259 | 13700 | 0.0913 |
| 1.4363 | 13800 | 0.093 |
| 1.4467 | 13900 | 0.0909 |
| 1.4571 | 14000 | 0.0906 |
| 1.4675 | 14100 | 0.0903 |
| 1.4779 | 14200 | 0.0946 |
| 1.4883 | 14300 | 0.0933 |
| 1.4988 | 14400 | 0.0898 |
| 1.5092 | 14500 | 0.088 |
| 1.5196 | 14600 | 0.0961 |
| 1.5300 | 14700 | 0.0887 |
| 1.5404 | 14800 | 0.0858 |
| 1.5508 | 14900 | 0.0878 |
| 1.5612 | 15000 | 0.092 |
| 1.5716 | 15100 | 0.0857 |
| 1.5820 | 15200 | 0.0878 |
| 1.5924 | 15300 | 0.0856 |
| 1.6028 | 15400 | 0.0887 |
| 1.6132 | 15500 | 0.0837 |
| 1.6236 | 15600 | 0.0832 |
| 1.6341 | 15700 | 0.083 |
| 1.6445 | 15800 | 0.0906 |
| 1.6549 | 15900 | 0.0844 |
| 1.6653 | 16000 | 0.085 |
| 1.6757 | 16100 | 0.0837 |
| 1.6861 | 16200 | 0.0826 |
| 1.6965 | 16300 | 0.0867 |
| 1.7069 | 16400 | 0.0902 |
| 1.7173 | 16500 | 0.0864 |
| 1.7277 | 16600 | 0.0882 |
| 1.7381 | 16700 | 0.0894 |
| 1.7485 | 16800 | 0.0902 |
| 1.7590 | 16900 | 0.0813 |
| 1.7694 | 17000 | 0.0821 |
| 1.7798 | 17100 | 0.0863 |
| 1.7902 | 17200 | 0.0828 |
| 1.8006 | 17300 | 0.0902 |
| 1.8110 | 17400 | 0.0831 |
| 1.8214 | 17500 | 0.0765 |
| 1.8318 | 17600 | 0.0806 |
| 1.8422 | 17700 | 0.0793 |
| 1.8526 | 17800 | 0.0842 |
| 1.8630 | 17900 | 0.0828 |
| 1.8734 | 18000 | 0.085 |
| 1.8838 | 18100 | 0.0803 |
| 1.8943 | 18200 | 0.0772 |
| 1.9047 | 18300 | 0.0865 |
| 1.9151 | 18400 | 0.0847 |
| 1.9255 | 18500 | 0.0835 |
| 1.9359 | 18600 | 0.0818 |
| 1.9463 | 18700 | 0.0757 |
| 1.9567 | 18800 | 0.0772 |
| 1.9671 | 18900 | 0.0854 |
| 1.9775 | 19000 | 0.0813 |
| 1.9879 | 19100 | 0.0844 |
| 1.9983 | 19200 | 0.0793 |
| 2.0087 | 19300 | 0.0668 |
| 2.0192 | 19400 | 0.0647 |
| 2.0296 | 19500 | 0.0702 |
| 2.0400 | 19600 | 0.0703 |
| 2.0504 | 19700 | 0.0641 |
| 2.0608 | 19800 | 0.0768 |
| 2.0712 | 19900 | 0.0632 |
| 2.0816 | 20000 | 0.0633 |
| 2.0920 | 20100 | 0.0608 |
| 2.1024 | 20200 | 0.0684 |
| 2.1128 | 20300 | 0.0618 |
| 2.1232 | 20400 | 0.063 |
| 2.1336 | 20500 | 0.0625 |
| 2.1440 | 20600 | 0.0631 |
| 2.1545 | 20700 | 0.0681 |
| 2.1649 | 20800 | 0.0584 |
| 2.1753 | 20900 | 0.0655 |
| 2.1857 | 21000 | 0.0651 |
| 2.1961 | 21100 | 0.0699 |
| 2.2065 | 21200 | 0.0704 |
| 2.2169 | 21300 | 0.0686 |
| 2.2273 | 21400 | 0.0655 |
| 2.2377 | 21500 | 0.063 |
| 2.2481 | 21600 | 0.0657 |
| 2.2585 | 21700 | 0.0694 |
| 2.2689 | 21800 | 0.066 |
| 2.2794 | 21900 | 0.0677 |
| 2.2898 | 22000 | 0.0617 |
| 2.3002 | 22100 | 0.0612 |
| 2.3106 | 22200 | 0.06 |
| 2.3210 | 22300 | 0.0572 |
| 2.3314 | 22400 | 0.0642 |
| 2.3418 | 22500 | 0.0601 |
| 2.3522 | 22600 | 0.0581 |
| 2.3626 | 22700 | 0.0702 |
| 2.3730 | 22800 | 0.0614 |
| 2.3834 | 22900 | 0.0631 |
| 2.3938 | 23000 | 0.0586 |
| 2.4042 | 23100 | 0.0638 |
| 2.4147 | 23200 | 0.0584 |
| 2.4251 | 23300 | 0.068 |
| 2.4355 | 23400 | 0.0681 |
| 2.4459 | 23500 | 0.0616 |
| 2.4563 | 23600 | 0.0604 |
| 2.4667 | 23700 | 0.0618 |
| 2.4771 | 23800 | 0.0603 |
| 2.4875 | 23900 | 0.0643 |
| 2.4979 | 24000 | 0.0639 |
| 2.5083 | 24100 | 0.0656 |
| 2.5187 | 24200 | 0.0578 |
| 2.5291 | 24300 | 0.0613 |
| 2.5396 | 24400 | 0.061 |
| 2.5500 | 24500 | 0.0578 |
| 2.5604 | 24600 | 0.059 |
| 2.5708 | 24700 | 0.0586 |
| 2.5812 | 24800 | 0.0532 |
| 2.5916 | 24900 | 0.0547 |
| 2.6020 | 25000 | 0.0596 |
| 2.6124 | 25100 | 0.0614 |
| 2.6228 | 25200 | 0.0547 |
| 2.6332 | 25300 | 0.056 |
| 2.6436 | 25400 | 0.0578 |
| 2.6540 | 25500 | 0.0611 |
| 2.6644 | 25600 | 0.0605 |
| 2.6749 | 25700 | 0.062 |
| 2.6853 | 25800 | 0.0601 |
| 2.6957 | 25900 | 0.0618 |
| 2.7061 | 26000 | 0.055 |
| 2.7165 | 26100 | 0.0614 |
| 2.7269 | 26200 | 0.0553 |
| 2.7373 | 26300 | 0.0587 |
| 2.7477 | 26400 | 0.0629 |
| 2.7581 | 26500 | 0.0559 |
| 2.7685 | 26600 | 0.0559 |
| 2.7789 | 26700 | 0.0533 |
| 2.7893 | 26800 | 0.0591 |
| 2.7998 | 26900 | 0.0526 |
| 2.8102 | 27000 | 0.0548 |
| 2.8206 | 27100 | 0.0562 |
| 2.8310 | 27200 | 0.0577 |
| 2.8414 | 27300 | 0.0611 |
| 2.8518 | 27400 | 0.0565 |
| 2.8622 | 27500 | 0.0627 |
| 2.8726 | 27600 | 0.0604 |
| 2.8830 | 27700 | 0.0578 |
| 2.8934 | 27800 | 0.0564 |
| 2.9038 | 27900 | 0.0591 |
| 2.9142 | 28000 | 0.0566 |
| 2.9246 | 28100 | 0.0541 |
| 2.9351 | 28200 | 0.0544 |
| 2.9455 | 28300 | 0.0598 |
| 2.9559 | 28400 | 0.0592 |
| 2.9663 | 28500 | 0.0559 |
| 2.9767 | 28600 | 0.0578 |
| 2.9871 | 28700 | 0.055 |
| 2.9975 | 28800 | 0.0509 |
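The log and the hyperparameters together give a rough picture of the training schedule. Step 28800 falls at epoch 2.9975, which suggests about 9,608 optimizer steps per epoch and about 28,824 steps over 3 epochs. With per_device_train_batch_size 32 and gradient_accumulation_steps 2, each step covers 64 examples per device, so roughly 615,000 training pairs per epoch if a single device was used. The sketch below reproduces a linear schedule with warmup (lr_scheduler_type: linear, warmup_ratio: 0.05, learning_rate: 5e-05) under those assumptions. TOTAL_STEPS is an estimate derived from the log, not a value stated in the card, and the function name is hypothetical.

```python
import math

def linear_schedule_lr(step, total_steps, base_lr=5e-5, warmup_ratio=0.05):
    """Learning rate at a given optimizer step: linear warmup from 0 to
    base_lr, then linear decay back to 0 at total_steps."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

TOTAL_STEPS = 28_824        # assumption: about 9,608 steps per epoch times 3 epochs
EFFECTIVE_BATCH = 32 * 2    # per-device batch size times accumulation steps (single device assumed)

for step in (0, 500, 1_442, 14_400, 28_824):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

Under these assumptions the warmup lasts about 1,442 steps, so the learning rate was still ramping up during the earliest rows of a run's log.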
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model
sentence-transformers/all-MiniLM-L6-v2