SentenceTransformer
This is a sentence-transformers model trained on the parquet dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for retrieval.
Model Details
Model Description
- Model Type: Sentence Transformer
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Supported Modality: Text
- Training Dataset:
- parquet
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'ModernBertModel'})
(1): Pooling({'embedding_dimension': 768, 'pooling_mode': 'cls', 'include_prompt': True})
)
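The Pooling module above uses `pooling_mode: 'cls'`, meaning the sentence embedding is the hidden state of the first token. A minimal NumPy sketch of that step, with random arrays standing in for the encoder output (the function name `cls_pool` and the toy shapes are illustrative assumptions, not the library's internals):

```python
import numpy as np

def cls_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """Return the first-token vector of each sequence.

    token_embeddings has shape (batch, seq_len, hidden).
    """
    return token_embeddings[:, 0, :]

rng = np.random.default_rng(0)
# Stand-in for the Transformer module's 'token_embeddings' output.
token_embeddings = rng.normal(size=(2, 16, 768))
sentence_embeddings = cls_pool(token_embeddings)
print(sentence_embeddings.shape)  # (2, 768)
```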
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("TeraflopAI/teraflopai-denseon-caselaw")
# Run inference
queries = [
"Under what circumstances can a plaintiff recover damages for lost profits when a defendant's breach of contract involves the failure to provide telephone service?",
]
documents = [
'UNITED STATES COURT OF APPEALS\n UNITED STATES COURT OF APPEALS\n FOR THE FIRST CIRCUIT\n FOR THE FIRST CIRCUIT\n \nNo. 94-1711\n\n SAS OF PUERTO RICO, INC.,\n\n Plaintiff, Appellant,\n\n v.\n\n PUERTO RICO TELEPHONE COMPANY,\n\n Defendant, Appellee. \n\n \n\n APPEAL FROM THE UNITED STATES DISTRICT COURT\n\n FOR THE DISTRICT OF PUERTO RICO\n\n [Hon. Jose Antonio Fuste, U.S. District Judge]\n \n\n \n\n Before\n\n Torruella, Chief Judge,\n \n\n Boudin, Circuit Judge,\n \n\n and Boyle,* Senior District Judge.\n \n\n \n\nLaurence Z. Shiekman with whom M. Duncan Grant, Frank M.\n \nRapoport, Michael A. Ceramella and Pepper, Hamilton & Scheetz were on\n \nbrief for appellant.\nPhilip J. Mause with whom Joaquin A. Marquez and Drinker Biddle &\n \nReath were on brief for appellee.\n \n\n \n\n February 21, 1995\n \n\n \n\n*Of the District of Rhode Island, sitting by designation.',
'Nor did the BIA abuse its discretion by denying Chen\'s motion to reopen, which alleged that he suffered from the ineffective assistance of counsel. To prevail on such a claim, the alien must first comply with certain procedures set forth in Matter of Lozada, 19 I. & N. Dec. 637 (BIA 1988). Here, the BIA properly noted that besides filing a supporting affidavit, Chen made no effort to comply with the requirements enumerated in Lozada. Chen not only failed to notify his former counsel of the allegations of ineffective assistance and to allow him an opportunity to respond, he also failed to file a complaint with a disciplinary authority or provide an explanation for not doing so. See Twum v. INS, 411 F.3d 54, 59 (2d Cir.2005) (citing Lozada, 19 I. & N. Dec. at 639). \n\nBy failing to substantially comply with Lozada, Chen "forfeit[ed][his] ineffective assistance of counsel claim." Jian Yun Zheng v. U.S. Dep\'t of Justice, 409 F.3d 43, 47 (2d Cir.2005). While it is true that "slavish adherence" to Lozada\'s requirements is not necessary in certain circumstances, and while the BIA acknowledged in its June 2006 decision that the brief written by Chen\'s former counsel was deficient, this is not a case in which the facts supporting a "claim of ineffective assistance are clear on the face of the record," which may excuse the failure to comply with Lozada. Yi Long Yang, 478 F.3d at 142-43. The facts here are distinct from the circumstances presented in Yi Long Yang, in that Chen\'s former counsel was not disbarred, nor was there evidence that the agency explicitly assumed his competence. See id. at 142.',
'We believe the same prerequisite should operate in this\ncase. The requirement that parties seeking Rule 60(b) relief\nshow some prospect of succeeding on the merits flows from\nthe basic principle that courts should revive previously-\ndismissed claims only if they have some reason to believe that\ndoing so will not ultimately waste judicial resources. See\nMurray,\n52 F.3d at 355\n. This principle holds true here:\nreviving Thomas\'s appeal will constitute an "empty exercise\nor futile gesture,"\nid.,\n unless Thomas has some possibility of\nprevailing. \n\n Indeed, we see two especially good reasons to condition\nthe grant of Thomas\'s motion for reconsideration on his\ndemonstrating a chance of succeeding on the merits. First,\nThomas claims that his appeal should be reinstated because\nthe PLRA\'s three-strikes provision is unconstitutional as\napplied to him. For this court to reach out and decide this\ndifficult and important question simply to reinstate a pointless\nappeal would violate the norm of constitutional avoidance to\nwhich we generally adhere. See Kalka v. Hawk,\n215 F.3d 90, 97\n (D.C. Cir. 2000) ("Federal courts should not decide\nconstitutional questions unless it is necessary to do so.").\nSecond, the PLRA provides that a court "shall dismiss" an\nIFP litigant\'s case if the "appeal . . . is frivolous or malicious\n. . . [or] fails to state a claim on which relief may be granted."\n28 U.S.C. § 1915\n(e)(2). Thus, even were we to grant Thomas\nIFP status and reinstate his appeal, we would then have to\n\x0c 7\npromptly dismiss the case if his claims lack merit. What could\nbe a more "futile gesture" than reinstating an appeal only to\nthen immediately dismiss it?',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 768] [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.6819, -0.1021, -0.0287]])
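For retrieval, the similarity scores are typically turned into a ranking. The sketch below does this with NumPy, using the scores printed in the example output above. A tensor returned by `model.similarity` would first need converting, for example with `.cpu().numpy()`. The helper name `rank_documents` is illustrative.

```python
import numpy as np

# Scores copied from the printed example output (1 query x 3 documents).
similarities = np.array([[0.6819, -0.1021, -0.0287]])

def rank_documents(scores: np.ndarray) -> np.ndarray:
    """Return document indices sorted from most to least similar, per query."""
    return np.argsort(-scores, axis=1)

ranking = rank_documents(similarities)
print(ranking[0].tolist())  # [0, 2, 1]
```

The first document (the telephone-service case) ranks highest for the lost-profits query, as the scores suggest.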
Evaluation
Metrics
Information Retrieval
- Dataset: DenseOn_lr6e-05_warmup0.1_bs8k-caselaw
- Evaluated with InformationRetrievalEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.8934 |
| cosine_accuracy@3 | 0.957 |
| cosine_accuracy@5 | 0.9722 |
| cosine_accuracy@10 | 0.9854 |
| cosine_precision@1 | 0.8934 |
| cosine_precision@3 | 0.319 |
| cosine_precision@5 | 0.1944 |
| cosine_precision@10 | 0.0985 |
| cosine_recall@1 | 0.8934 |
| cosine_recall@3 | 0.957 |
| cosine_recall@5 | 0.9722 |
| cosine_recall@10 | 0.9854 |
| cosine_ndcg@10 | 0.9419 |
| cosine_mrr@10 | 0.9277 |
| cosine_map@100 | 0.9284 |
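In the table, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k (for example, 0.957 / 3 ≈ 0.319). That pattern is what one would expect if every query has exactly one relevant document; this is an inference from the numbers, not something the card states. Under that assumption, the metrics can be sketched from the rank of the single relevant document per query. The ranks below are made-up toy values and the function name is illustrative.

```python
import numpy as np

def single_relevant_metrics(ranks: np.ndarray, k: int) -> dict:
    """Compute @k metrics given the 1-based rank of the one relevant document per query."""
    hit = ranks <= k
    accuracy = float(hit.mean())
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,   # at most one of the k results is relevant
        "recall": accuracy,          # the single relevant document is either found or not
        "mrr": float(np.where(hit, 1.0 / ranks, 0.0).mean()),
    }

# Toy ranks for ten queries (assumed values, not taken from the evaluation).
ranks = np.array([1, 1, 2, 1, 7, 1, 3, 1, 1, 12])
metrics_at_3 = single_relevant_metrics(ranks, k=3)
print(metrics_at_3)
```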
Training Details
Training Dataset
parquet
- Dataset: parquet
- Size: 36,118,859 training samples
- Columns: question and answer
- Approximate statistics based on the first 1000 samples:

| | question | answer |
|---|---|---|
| type | string | string |
| details | min: 16 tokens, mean: 29.64 tokens, max: 52 tokens | min: 41 tokens, mean: 306.99 tokens, max: 512 tokens |
- Samples:

| question | answer |
|---|---|
| What is the legal standard and procedure for granting a defendant's request for leave to withdraw as counsel when the attorney certifies that no nonfrivolous issues exist for appeal? | Appeal by the defendant from two judgments of the Supreme Court, Queens County (Rosengarten, J.), both rendered May 5, 2003, convicting him of burglary in the first degree, robbery in the first degree, and burglary in the second degree under indictment No. 3417/01, and burglary in the first degree, robbery in the first degree, and burglary in the second degree under indictment No. 1182/02, upon his pleas of guilty, and imposing sentences. Ordered that the judgments are affirmed. We have reviewed the record and agree with the defendant's assigned counsel that there are no nonfrivolous issues which could be raised on appeal. Counsel's application for leave to withdraw as counsel is granted (see Anders v California, 386 US 738 [1967]; People v Paige, 54 AD2d 631 [1976]; cf. People v *606Gonzalez, 47 NY2d 606 [1979]). Adams, J.P., Cozier, Ritter and Skelos, JJ., concur. |
| Are state-law tort claims alleging defective labeling of generic drugs preempted by federal law? | ORDER JOSEPH N. LAPLANTE, District Judge. This case presents a question currently pending before three different federal courts of appeal: whether state-law tort claims alleging the defective labeling of generic drugs are preempted by federal law. See Morris v. Wyeth, Inc., No. 09-5509 (6th Cir. Apr. 27, 2009); Demahy v. Wyeth, Inc., No. 08-31204 (5th Cir. Dec. 16, 2008); Mensing v. Wyeth, Inc., No. 08-3850 (8th Cir. Dec. 10, 2008). The defendants, Mutual Pharmaceutical Company, Inc. and United Research Laboratories, Inc., move for judgment on the pleadings, see Fed.R.Civ.P. 12(c), on claims by the plaintiffs, Karen L. and Gregory S. Bartlett, alleging that Karen suffered serious injuries from Sulindac, a generic drug manufactured by the defendants. The defendants argue that all of the plaintiffs' state-law causes of action are pre-empted by Title I of the Drug Price Competition and Patent Term Restoration Act of 1984, 1 part of the Hatch-Waxman Amendments to the Federal Food, Drug,... |
| Under what circumstances can a trial court's finding of competency to stand trial be challenged when multiple expert evaluations conclude the defendant is competent but exhibit bizarre conduct? | Prior to trial, Card was examined by two court-appointed psychologists for the purpose of determining whether he was competent to stand trial. Following examinations, both psychologists concluded that Card was competent to stand trial pursuant to the criteria set forth in Rule 3.211, Florida Rule of Criminal Procedure. After the initial reports of the two court-appointed *1175 experts were filed, the defense filed a motion for the appointment of a forensic psychiatrist to examine Card. The court acquiesced to this request. Although the forensic psychiatrist did not file his report with the court until a few months after the court issued its order finding Card competent to stand trial, the forensic psychiatrist also concluded that Card was competent. Further, although the various reports filed by the experts indicate bizarre conduct and behavioral problems, the trial court was never presented with evidence providing reasonable grounds to believe that Card was not competent to stand tria... |

- Loss: MatryoshkaLoss with these parameters:
  {
      "loss": "CachedMultipleNegativesRankingLoss",
      "matryoshka_dims": [768, 512, 256, 128],
      "matryoshka_weights": [1, 1, 1, 1],
      "n_dims_per_step": -1
  }
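The loss configuration above wraps an in-batch ranking loss so that it is applied to several truncated prefixes of each embedding. The following NumPy sketch illustrates the idea under simplifying assumptions: it omits the gradient caching that distinguishes CachedMultipleNegativesRankingLoss, the similarity `scale` of 20 is an assumed value, and the function names are illustrative rather than the library's implementation.

```python
import numpy as np

def in_batch_ranking_loss(q: np.ndarray, d: np.ndarray, scale: float = 20.0) -> float:
    """Cross-entropy over cosine similarities; the positive for row i is column i."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    logits = scale * (q @ d.T)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def matryoshka_loss(q, d, dims=(768, 512, 256, 128), weights=(1, 1, 1, 1)) -> float:
    """Weighted sum of the base loss over the first n components, for each n in dims."""
    return sum(w * in_batch_ranking_loss(q[:, :n], d[:, :n]) for n, w in zip(dims, weights))

rng = np.random.default_rng(0)
queries = rng.normal(size=(8, 768))
documents = queries + 0.1 * rng.normal(size=(8, 768))  # toy positives close to their queries
loss = matryoshka_loss(queries, documents)
print(loss)
```

With `n_dims_per_step` set to -1, all listed dimensions contribute at every step, which is what the sum over `dims` mirrors here.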
Evaluation Dataset
parquet
- Dataset: parquet
- Size: 10,000 evaluation samples
- Columns: question and answer
- Approximate statistics based on the first 1000 samples:

| | question | answer |
|---|---|---|
| type | string | string |
| details | min: 15 tokens, mean: 29.87 tokens, max: 55 tokens | min: 74 tokens, mean: 311.66 tokens, max: 512 tokens |
- Samples:

| question | answer |
|---|---|
| What specific factors do Texas courts consider when determining if terminating a parent's rights serves the child's best interest? | In determining whether termination is in the child's best interest, we apply the following factors laid out in Holley v. Adams, 544 S.W.2d 367, 371–72 (Tex. 1976). Those factors include, but are not limited to: 1. The child's desires; 2. The child's physical and emotional needs, now and in the future; 3. The emotional and physical danger to the child, now and in the future; 4. The parental ability of the individuals seeking custody; 5. The programs available to assist these individuals in promoting the child's best interest; 6. The plans for the child by the individual or agency seeking custody; 7. The stability of the home or proposed placement; 8. The parent's act or omissions that may indicate the existing parent-child relationship is not the proper one; and 9. Any excuse for the parent's acts or omissions. |
| Under what circumstances do separate criminal acts fail to constitute a single continuous transaction for the purpose of admitting evidence of one act to prove another? | ¶37 The case before us is far more analogous to Hildreth than to the others. Gallegos stabbed Victim in a park and was later apprehended. Then, while at the police station, Gallegos acted violently, resulting in additional charges. Gallegos's violent behavior at the police station did not "facilitate[ ] flight" from the earlier attack, nor could the later crimes be characterized as "a single [violent] spree," as we would characterize a string of robberies, for example. See Benson , 2014 UT App 92 , ¶¶ 13-14, 325 P.3d 855 . Neither do Gallegos's crimes demonstrate "a distinct behavioral arc of increasingly aggressive and opportunistic transgressions." Burke , 2011 UT App 168 , ¶ 24, 256 P.3d 1102 . Instead, this case is more like Hildreth , where the defendant committed a sequence of offenses, but those offenses were not otherwise related to each other. See 2010 UT App 209 , ¶ 32, 238 P.3d 444 . Here, the stabbing at the park and the violent behavior at the police station are so indepen... |
| What level of culpability, such as actual knowledge or reckless disregard, must a plaintiff prove to establish an Eighth Amendment violation for deliberate indifference? | Wilson v. Seiter, ___ U.S. at -, ___, 111 S.Ct. at 2324-25, 2327. The Seventh Circuit recently observed that "[i]n order to show `deliberate indifference,' a plaintiff is required to prove that the prison official's action was deliberate or reckless in the criminal sense." Santiago v. Lane, 894 F.2d 218 (7th Cir.1990) (emphasis added) (footnote omitted). The United States Supreme Court has cited the Seventh Circuit's criminal recklessness standard with approval. Whitley v. Albers, 475 U.S. 312, 321, 106 S.Ct. 1078, 1085, 89 L.Ed.2d 251 (1986), citing Duckworth v. Franzen, 780 F.2d 645, 653 (7th Cir.1985), cert. denied, 479 U.S. 816, 107 S.Ct. 71, 93 L.Ed.2d 28 (1986). In Franzen, the Seventh Circuit noted that punishment under the Eighth Amendment "implies at a minimum actual knowledge of impending harm easily preventable, so that a conscious, culpable refusal to prevent the harm can be inferred from the defendant's failure to prevent it." 780 F.2d at 653. See also Wilks v. You... |

- Loss: MatryoshkaLoss with these parameters:
  {
      "loss": "CachedMultipleNegativesRankingLoss",
      "matryoshka_dims": [768, 512, 256, 128],
      "matryoshka_weights": [1, 1, 1, 1],
      "n_dims_per_step": -1
  }
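Because the loss is applied at 768, 512, 256, and 128 dimensions, the leading components of an embedding are trained to be usable on their own. A minimal sketch of truncating and re-normalizing embeddings follows, with random arrays standing in for model output; the helper name is illustrative, and the card does not report retrieval quality at the smaller sizes.

```python
import numpy as np

def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and rescale each row to unit length."""
    truncated = embeddings[:, :dim]
    return truncated / np.linalg.norm(truncated, axis=1, keepdims=True)

rng = np.random.default_rng(0)
full = rng.normal(size=(3, 768))          # stand-in for encoded documents
small = truncate_and_normalize(full, 256)
cosine = small @ small.T                  # dot product equals cosine for unit-length rows
print(small.shape)  # (3, 256)
```

Sentence Transformers also accepts a `truncate_dim` argument when constructing a `SentenceTransformer`, which may be more convenient than truncating by hand.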
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 2048
- num_train_epochs: 1
- learning_rate: 6e-05
- lr_scheduler_type: cosine
- warmup_steps: 0.1
- bf16: True
- per_device_eval_batch_size: 512
- prompts: {'question': 'query: ', 'answer': 'document: '}
- batch_sampler: no_duplicates
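As a sanity check, the batch size and dataset size can be related to the final step count in the training logs (4409). The sketch below assumes 4 devices, which is inferred from the "bs8k" in the evaluation dataset name and is not stated in the card. It also ignores any samples the `no_duplicates` batch sampler might skip.

```python
train_samples = 36_118_859   # size of the training dataset
per_device_batch = 2048      # per_device_train_batch_size
num_devices = 4              # assumption: inferred from "bs8k", not stated in the card

global_batch = per_device_batch * num_devices
# dataloader_drop_last is True, so the final partial batch is dropped.
steps_per_epoch = train_samples // global_batch
epoch_at_step_3090 = round(3090 / steps_per_epoch, 4)
print(global_batch, steps_per_epoch, epoch_at_step_3090)  # 8192 4409 0.7008
```

Under these assumptions the computed step count and epoch fraction agree with the logged values.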
All Hyperparameters
- per_device_train_batch_size: 2048
- num_train_epochs: 1
- max_steps: -1
- learning_rate: 6e-05
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: None
- warmup_steps: 0.1
- optim: adamw_torch_fused
- optim_args: None
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- optim_target_modules: None
- gradient_accumulation_steps: 1
- average_tokens_across_devices: True
- max_grad_norm: 1.0
- label_smoothing_factor: 0.0
- bf16: True
- fp16: False
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- use_liger_kernel: False
- liger_kernel_config: None
- use_cache: False
- neftune_noise_alpha: None
- torch_empty_cache_steps: None
- auto_find_batch_size: False
- log_on_each_node: True
- logging_nan_inf_filter: True
- include_num_input_tokens_seen: no
- log_level: passive
- log_level_replica: warning
- disable_tqdm: False
- project: huggingface
- trackio_space_id: None
- trackio_bucket_id: None
- trackio_static_space_id: None
- per_device_eval_batch_size: 512
- prediction_loss_only: True
- eval_on_start: False
- eval_do_concat_batches: True
- eval_use_gather_object: False
- eval_accumulation_steps: None
- include_for_metrics: []
- batch_eval_metrics: False
- save_only_model: False
- save_on_each_node: False
- enable_jit_checkpoint: False
- push_to_hub: False
- hub_private_repo: None
- hub_model_id: None
- hub_strategy: every_save
- hub_always_push: False
- hub_revision: None
- load_best_model_at_end: False
- ignore_data_skip: False
- restore_callback_states_from_checkpoint: False
- full_determinism: False
- seed: 42
- data_seed: None
- use_cpu: False
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- dataloader_drop_last: True
- dataloader_num_workers: 0
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- dataloader_prefetch_factor: None
- remove_unused_columns: True
- label_names: None
- train_sampling_strategy: random
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- ddp_static_graph: None
- ddp_backend: None
- ddp_timeout: 1800
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- deepspeed: None
- debug: []
- skip_memory_metrics: True
- do_predict: False
- resume_from_checkpoint: None
- warmup_ratio: None
- local_rank: -1
- prompts: {'question': 'query: ', 'answer': 'document: '}
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}
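The settings `lr_scheduler_type: cosine`, `learning_rate: 6e-05`, and `warmup_steps: 0.1` describe a warmup followed by a cosine decay. The sketch below shows the usual shape of such a schedule. It assumes that 0.1 is read as a fraction of total steps and rounds the warmup length down, which may differ slightly from what the trainer does; the function name is illustrative.

```python
import math

def cosine_lr(step: int, total_steps: int, peak_lr: float, warmup_fraction: float) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_fraction)  # assumption: fraction of total steps
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total_steps = 4409  # final step in the training logs
# Learning rate at the start, end of warmup, roughly mid-decay, and the final step.
schedule = [cosine_lr(s, total_steps, 6e-05, 0.1) for s in (0, 440, 2424, 4409)]
print(schedule)
```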
Training Logs
| Epoch | Step | Training Loss | Validation Loss | DenseOn_lr6e-05_warmup0.1_bs8k-caselaw_cosine_ndcg@10 |
|---|---|---|---|---|
| 0.7008 | 3090 | 0.3007 | - | - |
| 0.7031 | 3100 | 0.2991 | - | - |
| 0.7054 | 3110 | 0.3052 | - | - |
| 0.7076 | 3120 | 0.2941 | - | - |
| 0.7099 | 3130 | 0.2964 | - | - |
| 0.7122 | 3140 | 0.2958 | - | - |
| 0.7144 | 3150 | 0.2966 | - | - |
| 0.7167 | 3160 | 0.2959 | - | - |
| 0.7190 | 3170 | 0.2954 | - | - |
| 0.7213 | 3180 | 0.2982 | - | - |
| 0.7235 | 3190 | 0.2980 | - | - |
| 0.7258 | 3200 | 0.2955 | - | - |
| 0.7281 | 3210 | 0.2958 | - | - |
| 0.7303 | 3220 | 0.2988 | - | - |
| 0.7326 | 3230 | 0.2892 | - | - |
| 0.7349 | 3240 | 0.2973 | - | - |
| 0.7371 | 3250 | 0.2960 | - | - |
| 0.7394 | 3260 | 0.3030 | - | - |
| 0.7417 | 3270 | 0.2998 | - | - |
| 0.7439 | 3280 | 0.3021 | - | - |
| 0.7462 | 3290 | 0.3046 | - | - |
| 0.7485 | 3300 | 0.2922 | - | - |
| 0.7507 | 3310 | 0.2941 | - | - |
| 0.7530 | 3320 | 0.2953 | - | - |
| 0.7553 | 3330 | 0.2992 | - | - |
| 0.7575 | 3340 | 0.3015 | - | - |
| 0.7598 | 3350 | 0.2951 | - | - |
| 0.7621 | 3360 | 0.3021 | - | - |
| 0.7643 | 3370 | 0.3022 | - | - |
| 0.7666 | 3380 | 0.2923 | - | - |
| 0.7689 | 3390 | 0.2946 | - | - |
| 0.7711 | 3400 | 0.2986 | - | - |
| 0.7734 | 3410 | 0.2960 | - | - |
| 0.7757 | 3420 | 0.3006 | - | - |
| 0.7780 | 3430 | 0.3020 | - | - |
| 0.7802 | 3440 | 0.2894 | - | - |
| 0.7825 | 3450 | 0.2986 | - | - |
| 0.7848 | 3460 | 0.2912 | - | - |
| 0.7870 | 3470 | 0.2957 | - | - |
| 0.7893 | 3480 | 0.2954 | - | - |
| 0.7916 | 3490 | 0.2937 | - | - |
| 0.7938 | 3500 | 0.2989 | - | - |
| 0.7961 | 3510 | 0.2956 | - | - |
| 0.7984 | 3520 | 0.3020 | - | - |
| 0.8006 | 3530 | 0.2957 | - | - |
| 0.8029 | 3540 | 0.2873 | - | - |
| 0.8052 | 3550 | 0.2900 | - | - |
| 0.8074 | 3560 | 0.2885 | - | - |
| 0.8097 | 3570 | 0.2904 | - | - |
| 0.8120 | 3580 | 0.2857 | - | - |
| 0.8142 | 3590 | 0.2977 | - | - |
| 0.8165 | 3600 | 0.2891 | - | - |
| 0.8188 | 3610 | 0.2958 | - | - |
| 0.8210 | 3620 | 0.2985 | - | - |
| 0.8233 | 3630 | 0.2915 | - | - |
| 0.8256 | 3640 | 0.2910 | - | - |
| 0.8279 | 3650 | 0.2931 | - | - |
| 0.8301 | 3660 | 0.2983 | - | - |
| 0.8324 | 3670 | 0.2921 | - | - |
| 0.8347 | 3680 | 0.2804 | - | - |
| 0.8369 | 3690 | 0.3018 | - | - |
| 0.8392 | 3700 | 0.2920 | - | - |
| 0.8415 | 3710 | 0.2897 | - | - |
| 0.8437 | 3720 | 0.2896 | - | - |
| 0.8460 | 3730 | 0.2884 | - | - |
| 0.8483 | 3740 | 0.2919 | - | - |
| 0.8505 | 3750 | 0.2896 | - | - |
| 0.8528 | 3760 | 0.2971 | - | - |
| 0.8551 | 3770 | 0.2948 | - | - |
| 0.8573 | 3780 | 0.2869 | - | - |
| 0.8596 | 3790 | 0.2976 | - | - |
| 0.8619 | 3800 | 0.2924 | - | - |
| 0.8641 | 3810 | 0.2907 | - | - |
| 0.8664 | 3820 | 0.2973 | - | - |
| 0.8687 | 3830 | 0.2985 | - | - |
| 0.8709 | 3840 | 0.2909 | - | - |
| 0.8732 | 3850 | 0.2951 | - | - |
| 0.8755 | 3860 | 0.2851 | - | - |
| 0.8778 | 3870 | 0.2867 | - | - |
| 0.8800 | 3880 | 0.2950 | - | - |
| 0.8823 | 3890 | 0.2919 | - | - |
| 0.8846 | 3900 | 0.2978 | - | - |
| 0.8868 | 3910 | 0.2902 | - | - |
| 0.8891 | 3920 | 0.2953 | - | - |
| 0.8914 | 3930 | 0.2938 | - | - |
| 0.8936 | 3940 | 0.2922 | - | - |
| 0.8959 | 3950 | 0.2884 | - | - |
| 0.8982 | 3960 | 0.2881 | - | - |
| 0.9002 | 3969 | - | 0.0828 | 0.9418 |
| 0.9004 | 3970 | 0.2967 | - | - |
| 0.9027 | 3980 | 0.2941 | - | - |
| 0.9050 | 3990 | 0.2829 | - | - |
| 0.9072 | 4000 | 0.2907 | - | - |
| 0.9095 | 4010 | 0.2932 | - | - |
| 0.9118 | 4020 | 0.2961 | - | - |
| 0.9140 | 4030 | 0.2925 | - | - |
| 0.9163 | 4040 | 0.2916 | - | - |
| 0.9186 | 4050 | 0.2893 | - | - |
| 0.9208 | 4060 | 0.2908 | - | - |
| 0.9231 | 4070 | 0.2919 | - | - |
| 0.9254 | 4080 | 0.2923 | - | - |
| 0.9276 | 4090 | 0.2827 | - | - |
| 0.9299 | 4100 | 0.2862 | - | - |
| 0.9322 | 4110 | 0.2925 | - | - |
| 0.9345 | 4120 | 0.2913 | - | - |
| 0.9367 | 4130 | 0.2866 | - | - |
| 0.9390 | 4140 | 0.2914 | - | - |
| 0.9413 | 4150 | 0.2825 | - | - |
| 0.9435 | 4160 | 0.2991 | - | - |
| 0.9458 | 4170 | 0.2881 | - | - |
| 0.9481 | 4180 | 0.2853 | - | - |
| 0.9503 | 4190 | 0.2872 | - | - |
| 0.9526 | 4200 | 0.2900 | - | - |
| 0.9549 | 4210 | 0.2937 | - | - |
| 0.9571 | 4220 | 0.2852 | - | - |
| 0.9594 | 4230 | 0.2889 | - | - |
| 0.9617 | 4240 | 0.2873 | - | - |
| 0.9639 | 4250 | 0.2918 | - | - |
| 0.9662 | 4260 | 0.2880 | - | - |
| 0.9685 | 4270 | 0.2881 | - | - |
| 0.9707 | 4280 | 0.2915 | - | - |
| 0.9730 | 4290 | 0.2873 | - | - |
| 0.9753 | 4300 | 0.2897 | - | - |
| 0.9775 | 4310 | 0.2828 | - | - |
| 0.9798 | 4320 | 0.2877 | - | - |
| 0.9821 | 4330 | 0.2869 | - | - |
| 0.9844 | 4340 | 0.2883 | - | - |
| 0.9866 | 4350 | 0.2953 | - | - |
| 0.9889 | 4360 | 0.2911 | - | - |
| 0.9912 | 4370 | 0.2861 | - | - |
| 0.9934 | 4380 | 0.2954 | - | - |
| 0.9957 | 4390 | 0.2939 | - | - |
| 0.9980 | 4400 | 0.2890 | - | - |
| 1.0 | 4409 | - | 0.0825 | 0.9419 |
| -1 | -1 | - | - | 0.9419 |
Training Time
- Training: 5.0 hours
- Evaluation: 10.6 minutes
- Total: 5.1 hours
Framework Versions
- Python: 3.12.13
- Sentence Transformers: 5.4.1
- Transformers: 5.8.0
- PyTorch: 2.11.0+cu130
- Accelerate: 1.13.0
- Datasets: 4.8.5
- Tokenizers: 0.22.2
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
CachedMultipleNegativesRankingLoss
@misc{gao2021scaling,
title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
year={2021},
eprint={2101.06983},
archivePrefix={arXiv},
primaryClass={cs.LG}
}