calculator_model_test

This model is a fine-tuned version of an unspecified base model, trained on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0826

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 512
  • eval_batch_size: 512
  • seed: 42
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 100
  • mixed_precision_training: Native AMP
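With 100 epochs at 9 optimizer steps each (see the step counts in the results table), the linear scheduler decays the learning rate from 0.001 to zero over 900 steps. A minimal sketch of that decay, assuming no warmup (the card does not state a warmup setting):

```python
def linear_lr(step, total_steps=900, base_lr=1e-3):
    """Linearly decay the learning rate to zero over total_steps (no warmup assumed)."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps

# Learning rate at the start, midpoint, and end of training
print(linear_lr(0))    # 0.001
print(linear_lr(450))  # 0.0005
print(linear_lr(900))  # 0.0
```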

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 3.2338        | 1.0   | 9    | 2.3613          |
| 1.9485        | 2.0   | 18   | 1.5220          |
| 1.3333        | 3.0   | 27   | 1.1768          |
| 1.0688        | 4.0   | 36   | 0.9485          |
| 0.9111        | 5.0   | 45   | 0.8320          |
| 0.8187        | 6.0   | 54   | 0.7387          |
| 0.7141        | 7.0   | 63   | 0.6621          |
| 0.6359        | 8.0   | 72   | 0.5770          |
| 0.5749        | 9.0   | 81   | 0.5312          |
| 0.5475        | 10.0  | 90   | 0.4928          |
| 0.5034        | 11.0  | 99   | 0.4602          |
| 0.4597        | 12.0  | 108  | 0.4101          |
| 0.4166        | 13.0  | 117  | 0.3805          |
| 0.3871        | 14.0  | 126  | 0.3521          |
| 0.3576        | 15.0  | 135  | 0.3196          |
| 0.3265        | 16.0  | 144  | 0.2917          |
| 0.2991        | 17.0  | 153  | 0.2617          |
| 0.2761        | 18.0  | 162  | 0.2420          |
| 0.2511        | 19.0  | 171  | 0.2203          |
| 0.2274        | 20.0  | 180  | 0.2161          |
| 0.2190        | 21.0  | 189  | 0.2181          |
| 0.2122        | 22.0  | 198  | 0.2023          |
| 0.1999        | 23.0  | 207  | 0.1845          |
| 0.1812        | 24.0  | 216  | 0.1789          |
| 0.1778        | 25.0  | 225  | 0.1648          |
| 0.1650        | 26.0  | 234  | 0.1537          |
| 0.1475        | 27.0  | 243  | 0.1457          |
| 0.1415        | 28.0  | 252  | 0.1407          |
| 0.1303        | 29.0  | 261  | 0.1361          |
| 0.1225        | 30.0  | 270  | 0.1319          |
| 0.1191        | 31.0  | 279  | 0.1264          |
| 0.1154        | 32.0  | 288  | 0.1231          |
| 0.1117        | 33.0  | 297  | 0.1197          |
| 0.1063        | 34.0  | 306  | 0.1172          |
| 0.0966        | 35.0  | 315  | 0.1190          |
| 0.0949        | 36.0  | 324  | 0.1121          |
| 0.0889        | 37.0  | 333  | 0.1081          |
| 0.0829        | 38.0  | 342  | 0.1096          |
| 0.0833        | 39.0  | 351  | 0.1102          |
| 0.0778        | 40.0  | 360  | 0.1014          |
| 0.0710        | 41.0  | 369  | 0.1024          |
| 0.0690        | 42.0  | 378  | 0.1019          |
| 0.0676        | 43.0  | 387  | 0.1013          |
| 0.0633        | 44.0  | 396  | 0.0980          |
| 0.0615        | 45.0  | 405  | 0.1016          |
| 0.0583        | 46.0  | 414  | 0.0944          |
| 0.0532        | 47.0  | 423  | 0.0941          |
| 0.0539        | 48.0  | 432  | 0.0946          |
| 0.0513        | 49.0  | 441  | 0.0911          |
| 0.0474        | 50.0  | 450  | 0.0912          |
| 0.0459        | 51.0  | 459  | 0.0907          |
| 0.0442        | 52.0  | 468  | 0.0899          |
| 0.0410        | 53.0  | 477  | 0.0935          |
| 0.0368        | 54.0  | 486  | 0.0898          |
| 0.0356        | 55.0  | 495  | 0.0887          |
| 0.0344        | 56.0  | 504  | 0.0896          |
| 0.0318        | 57.0  | 513  | 0.0894          |
| 0.0307        | 58.0  | 522  | 0.0884          |
| 0.0272        | 59.0  | 531  | 0.0889          |
| 0.0261        | 60.0  | 540  | 0.0857          |
| 0.0246        | 61.0  | 549  | 0.0834          |
| 0.0237        | 62.0  | 558  | 0.0875          |
| 0.0223        | 63.0  | 567  | 0.0865          |
| 0.0229        | 64.0  | 576  | 0.0864          |
| 0.0213        | 65.0  | 585  | 0.0884          |
| 0.0213        | 66.0  | 594  | 0.0848          |
| 0.0208        | 67.0  | 603  | 0.0848          |
| 0.0192        | 68.0  | 612  | 0.0845          |
| 0.0185        | 69.0  | 621  | 0.0868          |
| 0.0180        | 70.0  | 630  | 0.0844          |
| 0.0165        | 71.0  | 639  | 0.0843          |
| 0.0160        | 72.0  | 648  | 0.0843          |
| 0.0151        | 73.0  | 657  | 0.0862          |
| 0.0134        | 74.0  | 666  | 0.0832          |
| 0.0141        | 75.0  | 675  | 0.0840          |
| 0.0138        | 76.0  | 684  | 0.0857          |
| 0.0134        | 77.0  | 693  | 0.0840          |
| 0.0131        | 78.0  | 702  | 0.0853          |
| 0.0133        | 79.0  | 711  | 0.0858          |
| 0.0123        | 80.0  | 720  | 0.0844          |
| 0.0118        | 81.0  | 729  | 0.0842          |
| 0.0117        | 82.0  | 738  | 0.0845          |
| 0.0103        | 83.0  | 747  | 0.0845          |
| 0.0112        | 84.0  | 756  | 0.0834          |
| 0.0104        | 85.0  | 765  | 0.0831          |
| 0.0101        | 86.0  | 774  | 0.0833          |
| 0.0100        | 87.0  | 783  | 0.0823          |
| 0.0095        | 88.0  | 792  | 0.0837          |
| 0.0098        | 89.0  | 801  | 0.0823          |
| 0.0091        | 90.0  | 810  | 0.0843          |
| 0.0088        | 91.0  | 819  | 0.0834          |
| 0.0090        | 92.0  | 828  | 0.0840          |
| 0.0089        | 93.0  | 837  | 0.0837          |
| 0.0082        | 94.0  | 846  | 0.0839          |
| 0.0085        | 95.0  | 855  | 0.0836          |
| 0.0079        | 96.0  | 864  | 0.0835          |
| 0.0079        | 97.0  | 873  | 0.0832          |
| 0.0080        | 98.0  | 882  | 0.0829          |
| 0.0083        | 99.0  | 891  | 0.0826          |
| 0.0076        | 100.0 | 900  | 0.0826          |
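Note that the lowest validation loss in the log (0.0823, at epochs 87 and 89) is slightly below the final value of 0.0826; the card does not say whether best-checkpoint selection (e.g. `load_best_model_at_end` in transformers) was enabled. A small sketch of picking the best checkpoint from a log like the one above, abbreviated to a few rows from the tail:

```python
# (epoch, validation_loss) pairs taken from the tail of the training log above
log = [
    (85, 0.0831), (86, 0.0833), (87, 0.0823), (88, 0.0837),
    (89, 0.0823), (90, 0.0843), (95, 0.0836), (100, 0.0826),
]

# Pick the checkpoint with the lowest validation loss; min() keeps the
# first minimum it sees, i.e. the earliest epoch in case of a tie.
best_epoch, best_loss = min(log, key=lambda row: row[1])
print(best_epoch, best_loss)  # 87 0.0823
```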

Framework versions

  • Transformers 5.0.0
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2
Model files

  • Format: Safetensors
  • Model size: 7.82M params
  • Tensor type: F32