Access to this model is gated: the repository is publicly accessible, but you must accept its conditions, which include sharing your contact information, before you can access its files and content.

SentenceTransformer

This is a sentence-transformers model trained on a parquet dataset of question and answer pairs drawn from case law. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modality: Text
  • Training Dataset:
    • parquet

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'ModernBertModel'})
  (1): Pooling({'embedding_dimension': 768, 'pooling_mode': 'cls', 'include_prompt': True})
)
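The two modules above can be read as follows: ModernBERT produces one 768-dimensional vector per token, and the Pooling module with pooling_mode 'cls' keeps only the first ([CLS]) token's vector as the sentence embedding. The architecture lists no Normalize module, so the embeddings are presumably not unit-length; cosine similarity normalizes them when scores are computed. The numpy sketch below illustrates this with random arrays standing in for the transformer output; `cls_pool` and `cosine_similarity` are hypothetical helpers, not the library's implementation.

```python
import numpy as np


def cls_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """CLS pooling: keep the hidden state of the first token of each sequence.

    token_embeddings has shape (batch, seq_len, hidden_dim).
    """
    return token_embeddings[:, 0, :]


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a (n, d) and b (m, d)."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T


# Toy stand-in for the ModernBERT output: 2 sequences, 5 tokens, 768 dims.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 5, 768))

sentence_embeddings = cls_pool(token_embeddings)
print(sentence_embeddings.shape)  # (2, 768)

sims = cosine_similarity(sentence_embeddings, sentence_embeddings)
print(sims.round(3))
```

Because cosine similarity divides out the vector lengths, the missing normalization layer does not change the scores reported by `model.similarity`.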

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("TeraflopAI/teraflopai-denseon-caselaw")
# Run inference
queries = [
    "Under what circumstances can a plaintiff recover damages for lost profits when a defendant's breach of contract involves the failure to provide telephone service?",
]
documents = [
    'UNITED STATES COURT OF APPEALS\n                            UNITED STATES COURT OF APPEALS\n                    FOR THE FIRST CIRCUIT\n                                FOR THE FIRST CIRCUIT\n                                         \nNo. 94-1711\n\n                  SAS OF PUERTO RICO, INC.,\n\n                    Plaintiff, Appellant,\n\n                              v.\n\n                PUERTO RICO TELEPHONE COMPANY,\n\n                     Defendant, Appellee. \n\n                                         \n\n         APPEAL FROM THE UNITED STATES DISTRICT COURT\n\n               FOR THE DISTRICT OF PUERTO RICO\n\n        [Hon. Jose Antonio Fuste, U.S. District Judge]\n                                                                 \n\n                                         \n\n                            Before\n\n                    Torruella, Chief Judge,\n                                                      \n\n                    Boudin, Circuit Judge,\n                                                     \n\n              and Boyle,* Senior District Judge.\n                                                           \n\n                                         \n\nLaurence  Z.  Shiekman  with  whom  M.  Duncan  Grant,  Frank   M.\n                                                                              \nRapoport,  Michael A. Ceramella and Pepper, Hamilton & Scheetz were on\n                                                                      \nbrief for appellant.\nPhilip J. Mause with whom Joaquin A. Marquez and Drinker Biddle  &\n                                                                              \nReath were on brief for appellee.\n             \n\n                                         \n\n                      February 21, 1995\n                                         \n\n                \n\n*Of the District of Rhode Island, sitting by designation.',
    'Nor did the BIA abuse its discretion by denying Chen\'s motion to reopen, which alleged that he suffered from the ineffective assistance of counsel. To prevail on such a claim, the alien must first comply with certain procedures set forth in Matter of Lozada, 19 I. & N. Dec. 637 (BIA 1988). Here, the BIA properly noted that besides filing a supporting affidavit, Chen made no effort to comply with the requirements enumerated in Lozada. Chen not only failed to notify his former counsel of the allegations of ineffective assistance and to allow him an opportunity to respond, he also failed to file a complaint with a disciplinary authority or provide an explanation for not doing so. See Twum v. INS, 411 F.3d 54, 59 (2d Cir.2005) (citing Lozada, 19 I. & N. Dec. at 639). \n\nBy failing to substantially comply with Lozada, Chen "forfeit[ed][his] ineffective assistance of counsel claim." Jian Yun Zheng v. U.S. Dep\'t of Justice, 409 F.3d 43, 47 (2d Cir.2005). While it is true that "slavish adherence" to Lozada\'s requirements is not necessary in certain circumstances, and while the BIA acknowledged in its June 2006 decision that the brief written by Chen\'s former counsel was deficient, this is not a case in which the facts supporting a "claim of ineffective assistance are clear on the face of the record," which may excuse the failure to comply with Lozada. Yi Long Yang, 478 F.3d at 142-43. The facts here are distinct from the circumstances presented in Yi Long Yang, in that Chen\'s former counsel was not disbarred, nor was there evidence that the agency explicitly assumed his competence. See id. at 142.',
    'We believe the same prerequisite should operate in this\ncase. The requirement that parties seeking Rule 60(b) relief\nshow some prospect of succeeding on the merits flows from\nthe basic principle that courts should revive previously-\ndismissed claims only if they have some reason to believe that\ndoing so will not ultimately waste judicial resources. See\nMurray,\n52 F.3d at 355\n. This principle holds true here:\nreviving Thomas\'s appeal will constitute an "empty exercise\nor futile gesture,"\nid.,\n unless Thomas has some possibility of\nprevailing. \n\n       Indeed, we see two especially good reasons to condition\nthe grant of Thomas\'s motion for reconsideration on his\ndemonstrating a chance of succeeding on the merits. First,\nThomas claims that his appeal should be reinstated because\nthe PLRA\'s three-strikes provision is unconstitutional as\napplied to him. For this court to reach out and decide this\ndifficult and important question simply to reinstate a pointless\nappeal would violate the norm of constitutional avoidance to\nwhich we generally adhere. See Kalka v. Hawk,\n215 F.3d 90, 97\n (D.C. Cir. 2000) ("Federal courts should not decide\nconstitutional questions unless it is necessary to do so.").\nSecond, the PLRA provides that a court "shall dismiss" an\nIFP litigant\'s case if the "appeal . . . is frivolous or malicious\n. . . [or] fails to state a claim on which relief may be granted."\n28 U.S.C. § 1915\n(e)(2). Thus, even were we to grant Thomas\nIFP status and reinstate his appeal, we would then have to\n\x0c                                7\npromptly dismiss the case if his claims lack merit. What could\nbe a more "futile gesture" than reinstating an appeal only to\nthen immediately dismiss it?',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 768) (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.6819, -0.1021, -0.0287]])
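
For retrieval, the similarity matrix is usually turned into a ranked list per query. The sketch below uses a hypothetical helper, `top_k`, on the scores printed above. `model.similarity` returns a torch tensor, so in real use convert it with `.cpu().numpy()` first, or call `torch.topk` directly.

```python
import numpy as np


def top_k(similarities: np.ndarray, k: int) -> list[list[tuple[int, float]]]:
    """Return, for each query row, the k (document index, score) pairs with the highest scores."""
    results = []
    for row in similarities:
        order = np.argsort(-row)[:k]  # document indices sorted by descending score
        results.append([(int(i), float(row[i])) for i in order])
    return results


# Scores copied from the example output above (1 query x 3 documents).
similarities = np.array([[0.6819, -0.1021, -0.0287]])
print(top_k(similarities, k=2))
# [[(0, 0.6819), (2, -0.0287)]]
```

Here the First Circuit opinion against the Puerto Rico Telephone Company (document 0) is ranked first for the telephone-service damages query, well ahead of the two unrelated passages.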

Evaluation

Metrics

Information Retrieval (evaluator: DenseOn_lr6e-05_warmup0.1_bs8k-caselaw)

Metric Value
cosine_accuracy@1 0.8934
cosine_accuracy@3 0.957
cosine_accuracy@5 0.9722
cosine_accuracy@10 0.9854
cosine_precision@1 0.8934
cosine_precision@3 0.319
cosine_precision@5 0.1944
cosine_precision@10 0.0985
cosine_recall@1 0.8934
cosine_recall@3 0.957
cosine_recall@5 0.9722
cosine_recall@10 0.9854
cosine_ndcg@10 0.9419
cosine_mrr@10 0.9277
cosine_map@100 0.9284
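The numbers above appear consistent with each query having exactly one relevant document: recall@k equals accuracy@k, and precision@k is accuracy@k divided by k (0.957 / 3 ≈ 0.319, 0.9722 / 5 ≈ 0.1944, 0.9854 / 10 ≈ 0.0985). Under that assumption, the per-query metrics reduce to the simple formulas in this sketch; `retrieval_metrics` and the document IDs are hypothetical, and the reported values would be averages of these over all evaluation queries.

```python
import math


def retrieval_metrics(ranked_ids: list[str], relevant_id: str, k: int) -> dict[str, float]:
    """Metrics for one query that has exactly one relevant document."""
    top = ranked_ids[:k]
    hit = relevant_id in top
    rank = top.index(relevant_id) + 1 if hit else None  # 1-based rank within the top k
    return {
        f"accuracy@{k}": 1.0 if hit else 0.0,
        f"precision@{k}": (1.0 if hit else 0.0) / k,
        f"recall@{k}": 1.0 if hit else 0.0,
        f"mrr@{k}": 1.0 / rank if hit else 0.0,
        # With a single relevant document the ideal DCG is 1, so NDCG = 1 / log2(rank + 1).
        f"ndcg@{k}": 1.0 / math.log2(rank + 1) if hit else 0.0,
    }


# Hypothetical ranking in which the relevant document comes second.
print(retrieval_metrics(["d7", "d3", "d9"], relevant_id="d3", k=3))
```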

Training Details

Training Dataset

parquet

  • Dataset: parquet

  • Size: 36,118,859 training samples

  • Columns: question and answer

  • Approximate statistics based on the first 1000 samples:

    Column    Type    Min tokens  Mean tokens  Max tokens
    question  string  16          29.64        52
    answer    string  41          306.99       512
  • Samples:

    question: What is the legal standard and procedure for granting a defendant's request for leave to withdraw as counsel when the attorney certifies that no nonfrivolous issues exist for appeal?
    answer: Appeal by the defendant from two judgments of the Supreme Court, Queens County (Rosengarten, J.), both rendered May 5, 2003, convicting him of burglary in the first degree, robbery in the first degree, and burglary in the second degree under indictment No. 3417/01, and burglary in the first degree, robbery in the first degree, and burglary in the second degree under indictment No. 1182/02, upon his pleas of guilty, and imposing sentences.

    Ordered that the judgments are affirmed.

    We have reviewed the record and agree with the defendant's assigned counsel that there are no nonfrivolous issues which could be raised on appeal. Counsel's application for leave to withdraw as counsel is granted (see Anders v California, 386 US 738 [1967]; People v Paige, 54 AD2d 631 [1976]; cf. People v *606Gonzalez, 47 NY2d 606 [1979]). Adams, J.P., Cozier, Ritter and Skelos, JJ., concur.

    question: Are state-law tort claims alleging defective labeling of generic drugs preempted by federal law?
    answer: ORDER

    JOSEPH N. LAPLANTE, District Judge.

    This case presents a question currently pending before three different federal courts of appeal: whether state-law tort claims alleging the defective labeling of generic drugs are preempted by federal law. See Morris v. Wyeth, Inc., No. 09-5509 (6th Cir. Apr. 27, 2009); Demahy v. Wyeth, Inc., No. 08-31204 (5th Cir. Dec. 16, 2008); Mensing v. Wyeth, Inc., No. 08-3850 (8th Cir. Dec. 10, 2008). The defendants, Mutual Pharmaceutical Company, Inc. and United Research Laboratories, Inc., move for judgment on the pleadings, see Fed.R.Civ.P. 12(c), on claims by the plaintiffs, Karen L. and Gregory S. Bartlett, alleging that Karen suffered serious injuries from Sulindac, a generic drug manufactured by the defendants. The defendants argue that all of the plaintiffs' state-law causes of action are pre-empted by Title I of the Drug Price Competition and Patent Term Restoration Act of 1984, 1 part of the Hatch-Waxman Amendments to the Federal Food, Drug,...

    question: Under what circumstances can a trial court's finding of competency to stand trial be challenged when multiple expert evaluations conclude the defendant is competent but exhibit bizarre conduct?
    answer: Prior to trial, Card was examined by two court-appointed psychologists for the purpose of determining whether he was competent to stand trial. Following examinations, both psychologists concluded that Card was competent to stand trial pursuant to the criteria set forth in Rule 3.211, Florida Rule of Criminal Procedure. After the initial reports of the two court-appointed *1175 experts were filed, the defense filed a motion for the appointment of a forensic psychiatrist to examine Card. The court acquiesced to this request. Although the forensic psychiatrist did not file his report with the court until a few months after the court issued its order finding Card competent to stand trial, the forensic psychiatrist also concluded that Card was competent.

    Further, although the various reports filed by the experts indicate bizarre conduct and behavioral problems, the trial court was never presented with evidence providing reasonable grounds to believe that Card was not competent to stand tria...

  • Loss: MatryoshkaLoss with these parameters:

    {
        "loss": "CachedMultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
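Conceptually, this configuration applies the in-batch negatives ranking loss to the first 768, 512, 256 and 128 dimensions of each embedding and sums the results with equal weights (n_dims_per_step: -1 means every dimension is used at every step). The numpy sketch below is a simplified illustration of that objective, not the sentence-transformers implementation; the scale of 20 is assumed to match the library default. The "Cached" variant is assumed to compute the same loss value, using gradient caching so that a per-device batch of 2048 fits in memory, which the sketch omits.

```python
import numpy as np


def mnrl_loss(queries: np.ndarray, answers: np.ndarray, scale: float = 20.0) -> float:
    """In-batch negatives loss: answer i is the positive for query i,
    and every other answer in the batch acts as a negative."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    a = answers / np.linalg.norm(answers, axis=1, keepdims=True)
    logits = scale * (q @ a.T)  # (batch, batch) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))  # cross-entropy with labels 0..batch-1


def matryoshka_loss(queries, answers, dims=(768, 512, 256, 128), weights=(1, 1, 1, 1)) -> float:
    """Apply the base loss to the first d dimensions for each d and take the weighted sum."""
    return sum(w * mnrl_loss(queries[:, :d], answers[:, :d]) for d, w in zip(dims, weights))


rng = np.random.default_rng(0)
q = rng.normal(size=(4, 768))
a = q + 0.1 * rng.normal(size=(4, 768))  # each positive is close to its query
print(matryoshka_loss(q, a))        # small: positives already rank first
print(matryoshka_loss(q, a[::-1]))  # large: every pair is mismatched
```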
    

Evaluation Dataset

parquet

  • Dataset: parquet
  • Size: 10,000 evaluation samples
  • Columns: question and answer
  • Approximate statistics based on the first 1000 samples:

    Column    Type    Min tokens  Mean tokens  Max tokens
    question  string  15          29.87        55
    answer    string  74          311.66       512
  • Samples:

    question: What specific factors do Texas courts consider when determining if terminating a parent's rights serves the child's best interest?
    answer: In determining whether termination is in the child's best interest, we apply the following factors laid out in Holley v. Adams, 544 S.W.2d 367, 371–72 (Tex. 1976). Those factors include, but are not limited to:

      1. The child's desires;
      2. The child's physical and emotional needs, now and in the future;
      3. The emotional and physical danger to the child, now and in the future;
      4. The parental ability of the individuals seeking custody;
      5. The programs available to assist these individuals in promoting the child's best interest;
      6. The plans for the child by the individual or agency seeking custody;
      7. The stability of the home or proposed placement;
      8. The parent's act or omissions that may indicate the existing parent-child relationship is not the proper one; and
      9. Any excuse for the parent's acts or omissions.

    question: Under what circumstances do separate criminal acts fail to constitute a single continuous transaction for the purpose of admitting evidence of one act to prove another?
    answer: ¶37 The case before us is far more analogous to Hildreth than to the others. Gallegos stabbed Victim in a park and was later apprehended. Then, while at the police station, Gallegos acted violently, resulting in additional charges. Gallegos's violent behavior at the police station did not "facilitate[ ] flight" from the earlier attack, nor could the later crimes be characterized as "a single [violent] spree," as we would characterize a string of robberies, for example. See Benson , 2014 UT App 92 , ¶¶ 13-14, 325 P.3d 855 . Neither do Gallegos's crimes demonstrate "a distinct behavioral arc of increasingly aggressive and opportunistic transgressions." Burke , 2011 UT App 168 , ¶ 24, 256 P.3d 1102 . Instead, this case is more like Hildreth , where the defendant committed a sequence of offenses, but those offenses were not otherwise related to each other. See 2010 UT App 209 , ¶ 32, 238 P.3d 444 . Here, the stabbing at the park and the violent behavior at the police station are so indepen...

    question: What level of culpability, such as actual knowledge or reckless disregard, must a plaintiff prove to establish an Eighth Amendment violation for deliberate indifference?
    answer: Wilson v. Seiter, ___ U.S. at -, ___, 111 S.Ct. at 2324-25, 2327.

    The Seventh Circuit recently observed that "[i]n order to show `deliberate indifference,' a plaintiff is required to prove that the prison official's action was deliberate or reckless in the criminal sense." Santiago v. Lane, 894 F.2d 218 (7th Cir.1990) (emphasis added) (footnote omitted). The United States Supreme Court has cited the Seventh Circuit's criminal recklessness standard with approval. Whitley v. Albers, 475 U.S. 312, 321, 106 S.Ct. 1078, 1085, 89 L.Ed.2d 251 (1986), citing Duckworth v. Franzen, 780 F.2d 645, 653 (7th Cir.1985), cert. denied, 479 U.S. 816, 107 S.Ct. 71, 93 L.Ed.2d 28 (1986). In Franzen, the Seventh Circuit noted that punishment under the Eighth Amendment "implies at a minimum actual knowledge of impending harm easily preventable, so that a conscious, culpable refusal to prevent the harm can be inferred from the defendant's failure to prevent it." 780 F.2d at 653. See also Wilks v. You...
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "CachedMultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
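Because both losses were trained with matryoshka_dims [768, 512, 256, 128], a prefix of each embedding is meant to work as a smaller embedding. Note that the card reports metrics only for the full 768 dimensions, so retrieval quality at the smaller sizes is not measured here. Sentence Transformers exposes a `truncate_dim` argument (on the `SentenceTransformer` constructor and on `encode`) for this; the numpy sketch below shows the equivalent operation on a random stand-in array, using a hypothetical helper `truncate_and_normalize`.

```python
import numpy as np


def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` Matryoshka dimensions and rescale each row to unit length,
    so that a plain dot product equals cosine similarity."""
    truncated = embeddings[:, :dim]
    return truncated / np.linalg.norm(truncated, axis=1, keepdims=True)


rng = np.random.default_rng(0)
full = rng.normal(size=(3, 768))  # stand-in for the output of model.encode_document(...)

for dim in (768, 512, 256, 128):
    small = truncate_and_normalize(full, dim)
    print(dim, small.shape, (small @ small.T).round(3)[0])
```

Smaller prefixes reduce index size and search cost proportionally (128 dimensions is one sixth of 768), at some expected cost in retrieval quality.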
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 2048
  • num_train_epochs: 1
  • learning_rate: 6e-05
  • lr_scheduler_type: cosine
  • warmup_steps: 0.1 (a value below 1 is interpreted as a fraction of the total training steps, i.e. 10% warmup)
  • bf16: True
  • per_device_eval_batch_size: 512
  • prompts: {'question': 'query: ', 'answer': 'document: '} (prefixes prepended to the question and answer columns during training)
  • batch_sampler: no_duplicates
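The learning-rate settings above describe a linear warmup to 6e-05 over the first 10% of steps followed by a cosine decay to zero. The sketch below reproduces that shape in plain Python, modeled on the curve of Transformers' `get_cosine_schedule_with_warmup` with its default half cycle; `lr_at_step` is a hypothetical helper, and rounding the warmup length up to 441 steps is an assumption about how the fraction is converted.

```python
import math


def lr_at_step(step: int, total_steps: int, peak_lr: float = 6e-5, warmup_fraction: float = 0.1) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero over the remaining steps."""
    warmup_steps = math.ceil(total_steps * warmup_fraction)  # assumed rounding
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 4409  # final step in the training log (epoch 1.0)
for step in (0, 220, 441, 2000, 4409):
    print(step, f"{lr_at_step(step, TOTAL_STEPS):.2e}")
```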

All Hyperparameters

  • per_device_train_batch_size: 2048
  • num_train_epochs: 1
  • max_steps: -1
  • learning_rate: 6e-05
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.1
  • optim: adamw_torch_fused
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 1
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: True
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: None
  • trackio_bucket_id: None
  • trackio_static_space_id: None
  • per_device_eval_batch_size: 512
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_static_graph: None
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: {'question': 'query: ', 'answer': 'document: '}
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss DenseOn_lr6e-05_warmup0.1_bs8k-caselaw_cosine_ndcg@10
0.7008 3090 0.3007 - -
0.7031 3100 0.2991 - -
0.7054 3110 0.3052 - -
0.7076 3120 0.2941 - -
0.7099 3130 0.2964 - -
0.7122 3140 0.2958 - -
0.7144 3150 0.2966 - -
0.7167 3160 0.2959 - -
0.7190 3170 0.2954 - -
0.7213 3180 0.2982 - -
0.7235 3190 0.2980 - -
0.7258 3200 0.2955 - -
0.7281 3210 0.2958 - -
0.7303 3220 0.2988 - -
0.7326 3230 0.2892 - -
0.7349 3240 0.2973 - -
0.7371 3250 0.2960 - -
0.7394 3260 0.3030 - -
0.7417 3270 0.2998 - -
0.7439 3280 0.3021 - -
0.7462 3290 0.3046 - -
0.7485 3300 0.2922 - -
0.7507 3310 0.2941 - -
0.7530 3320 0.2953 - -
0.7553 3330 0.2992 - -
0.7575 3340 0.3015 - -
0.7598 3350 0.2951 - -
0.7621 3360 0.3021 - -
0.7643 3370 0.3022 - -
0.7666 3380 0.2923 - -
0.7689 3390 0.2946 - -
0.7711 3400 0.2986 - -
0.7734 3410 0.2960 - -
0.7757 3420 0.3006 - -
0.7780 3430 0.3020 - -
0.7802 3440 0.2894 - -
0.7825 3450 0.2986 - -
0.7848 3460 0.2912 - -
0.7870 3470 0.2957 - -
0.7893 3480 0.2954 - -
0.7916 3490 0.2937 - -
0.7938 3500 0.2989 - -
0.7961 3510 0.2956 - -
0.7984 3520 0.3020 - -
0.8006 3530 0.2957 - -
0.8029 3540 0.2873 - -
0.8052 3550 0.2900 - -
0.8074 3560 0.2885 - -
0.8097 3570 0.2904 - -
0.8120 3580 0.2857 - -
0.8142 3590 0.2977 - -
0.8165 3600 0.2891 - -
0.8188 3610 0.2958 - -
0.8210 3620 0.2985 - -
0.8233 3630 0.2915 - -
0.8256 3640 0.2910 - -
0.8279 3650 0.2931 - -
0.8301 3660 0.2983 - -
0.8324 3670 0.2921 - -
0.8347 3680 0.2804 - -
0.8369 3690 0.3018 - -
0.8392 3700 0.2920 - -
0.8415 3710 0.2897 - -
0.8437 3720 0.2896 - -
0.8460 3730 0.2884 - -
0.8483 3740 0.2919 - -
0.8505 3750 0.2896 - -
0.8528 3760 0.2971 - -
0.8551 3770 0.2948 - -
0.8573 3780 0.2869 - -
0.8596 3790 0.2976 - -
0.8619 3800 0.2924 - -
0.8641 3810 0.2907 - -
0.8664 3820 0.2973 - -
0.8687 3830 0.2985 - -
0.8709 3840 0.2909 - -
0.8732 3850 0.2951 - -
0.8755 3860 0.2851 - -
0.8778 3870 0.2867 - -
0.8800 3880 0.2950 - -
0.8823 3890 0.2919 - -
0.8846 3900 0.2978 - -
0.8868 3910 0.2902 - -
0.8891 3920 0.2953 - -
0.8914 3930 0.2938 - -
0.8936 3940 0.2922 - -
0.8959 3950 0.2884 - -
0.8982 3960 0.2881 - -
0.9002 3969 - 0.0828 0.9418
0.9004 3970 0.2967 - -
0.9027 3980 0.2941 - -
0.9050 3990 0.2829 - -
0.9072 4000 0.2907 - -
0.9095 4010 0.2932 - -
0.9118 4020 0.2961 - -
0.9140 4030 0.2925 - -
0.9163 4040 0.2916 - -
0.9186 4050 0.2893 - -
0.9208 4060 0.2908 - -
0.9231 4070 0.2919 - -
0.9254 4080 0.2923 - -
0.9276 4090 0.2827 - -
0.9299 4100 0.2862 - -
0.9322 4110 0.2925 - -
0.9345 4120 0.2913 - -
0.9367 4130 0.2866 - -
0.9390 4140 0.2914 - -
0.9413 4150 0.2825 - -
0.9435 4160 0.2991 - -
0.9458 4170 0.2881 - -
0.9481 4180 0.2853 - -
0.9503 4190 0.2872 - -
0.9526 4200 0.2900 - -
0.9549 4210 0.2937 - -
0.9571 4220 0.2852 - -
0.9594 4230 0.2889 - -
0.9617 4240 0.2873 - -
0.9639 4250 0.2918 - -
0.9662 4260 0.2880 - -
0.9685 4270 0.2881 - -
0.9707 4280 0.2915 - -
0.9730 4290 0.2873 - -
0.9753 4300 0.2897 - -
0.9775 4310 0.2828 - -
0.9798 4320 0.2877 - -
0.9821 4330 0.2869 - -
0.9844 4340 0.2883 - -
0.9866 4350 0.2953 - -
0.9889 4360 0.2911 - -
0.9912 4370 0.2861 - -
0.9934 4380 0.2954 - -
0.9957 4390 0.2939 - -
0.9980 4400 0.2890 - -
1.0 4409 - 0.0825 0.9419
-1 -1 - - 0.9419
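The step counts in this log line up with the dataset size if the global batch is 8192 examples (2048 per device on 4 devices). The device count is an assumption inferred from "bs8k" in the evaluator name; it is not stated in the hyperparameters. With dataloader_drop_last enabled, the final partial batch is dropped, and the Epoch column is then just step divided by steps per epoch:

```python
SAMPLES = 36_118_859     # training samples reported above
PER_DEVICE_BATCH = 2048  # per_device_train_batch_size
NUM_DEVICES = 4          # assumption: inferred from "bs8k" in the evaluator name
GRAD_ACCUM = 1           # gradient_accumulation_steps

global_batch = PER_DEVICE_BATCH * NUM_DEVICES * GRAD_ACCUM
steps_per_epoch = SAMPLES // global_batch  # dataloader_drop_last=True drops the partial batch
print(global_batch, steps_per_epoch)  # 8192 4409

# The Epoch column is step / steps_per_epoch.
for step in (3090, 3969, 4409):
    print(step, round(step / steps_per_epoch, 4))
```

This matches the logged values: step 3969 is epoch 0.9002 and step 4409 is epoch 1.0.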

Training Time

  • Training: 5.0 hours
  • Evaluation: 10.6 minutes
  • Total: 5.1 hours

Framework Versions

  • Python: 3.12.13
  • Sentence Transformers: 5.4.1
  • Transformers: 5.8.0
  • PyTorch: 2.11.0+cu130
  • Accelerate: 1.13.0
  • Datasets: 4.8.5
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
Safetensors model size: 0.1B params, tensor type: F32