calculator_model_test_2

This model is a fine-tuned version of an unspecified base model on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0060
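
For context, this figure is the `eval_loss` that transformers' `Trainer.evaluate()` reports; a minimal sketch, assuming a `trainer` object that has already been trained (the card does not include the training script):

```python
# Hypothetical sketch: `trainer` stands in for the transformers.Trainer
# used in this run, which the card does not show.
metrics = trainer.evaluate()
print(metrics["eval_loss"])  # the card reports 0.0060
```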

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent TrainingArguments follows the list):

  • learning_rate: 0.001
  • train_batch_size: 512
  • eval_batch_size: 512
  • seed: 42
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 100
  • mixed_precision_training: Native AMP
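
As a point of reference, here is a minimal sketch of how these settings map onto `transformers.TrainingArguments`. The card does not specify the model, tokenizer, or datasets, so only the arguments themselves are shown:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="calculator_model_test_2",
    learning_rate=1e-3,
    per_device_train_batch_size=512,
    per_device_eval_batch_size=512,
    seed=42,
    optim="adamw_torch_fused",  # AdamW, fused torch implementation
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=100,
    fp16=True,                  # native AMP mixed-precision training
    eval_strategy="epoch",      # evaluation ran once per epoch (see table below)
)
# These arguments would be passed to a transformers.Trainer together with
# the (unspecified) model and train/eval datasets.
```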

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 3.0266 | 1.0 | 6 | 2.2481 |
| 2.0347 | 2.0 | 12 | 1.7403 |
| 1.5575 | 3.0 | 18 | 1.3098 |
| 1.2110 | 4.0 | 24 | 1.0916 |
| 1.0460 | 5.0 | 30 | 0.9786 |
| 0.9417 | 6.0 | 36 | 0.8630 |
| 0.8312 | 7.0 | 42 | 0.7454 |
| 0.7230 | 8.0 | 48 | 0.6662 |
| 0.6606 | 9.0 | 54 | 0.6547 |
| 0.6435 | 10.0 | 60 | 0.6063 |
| 0.5876 | 11.0 | 66 | 0.5376 |
| 0.5490 | 12.0 | 72 | 0.5449 |
| 0.5356 | 13.0 | 78 | 0.4940 |
| 0.5389 | 14.0 | 84 | 0.5342 |
| 0.5251 | 15.0 | 90 | 0.4613 |
| 0.4776 | 16.0 | 96 | 0.4339 |
| 0.4445 | 17.0 | 102 | 0.4073 |
| 0.4183 | 18.0 | 108 | 0.4256 |
| 0.4132 | 19.0 | 114 | 0.3678 |
| 0.3785 | 20.0 | 120 | 0.3386 |
| 0.3393 | 21.0 | 126 | 0.3191 |
| 0.3229 | 22.0 | 132 | 0.2889 |
| 0.2948 | 23.0 | 138 | 0.2484 |
| 0.2685 | 24.0 | 144 | 0.2395 |
| 0.2560 | 25.0 | 150 | 0.2348 |
| 0.2317 | 26.0 | 156 | 0.2351 |
| 0.2329 | 27.0 | 162 | 0.2155 |
| 0.2190 | 28.0 | 168 | 0.1862 |
| 0.1964 | 29.0 | 174 | 0.1616 |
| 0.1796 | 30.0 | 180 | 0.1456 |
| 0.1560 | 31.0 | 186 | 0.1129 |
| 0.1371 | 32.0 | 192 | 0.0992 |
| 0.1370 | 33.0 | 198 | 0.0862 |
| 0.1176 | 34.0 | 204 | 0.0849 |
| 0.1088 | 35.0 | 210 | 0.0951 |
| 0.1017 | 36.0 | 216 | 0.0656 |
| 0.0873 | 37.0 | 222 | 0.0515 |
| 0.0777 | 38.0 | 228 | 0.0761 |
| 0.0915 | 39.0 | 234 | 0.0521 |
| 0.0772 | 40.0 | 240 | 0.0500 |
| 0.0705 | 41.0 | 246 | 0.0437 |
| 0.0662 | 42.0 | 252 | 0.0470 |
| 0.0629 | 43.0 | 258 | 0.0441 |
| 0.0611 | 44.0 | 264 | 0.0364 |
| 0.0587 | 45.0 | 270 | 0.0356 |
| 0.0523 | 46.0 | 276 | 0.0315 |
| 0.0464 | 47.0 | 282 | 0.0274 |
| 0.0467 | 48.0 | 288 | 0.0289 |
| 0.0444 | 49.0 | 294 | 0.0323 |
| 0.0462 | 50.0 | 300 | 0.0259 |
| 0.0381 | 51.0 | 306 | 0.0234 |
| 0.0390 | 52.0 | 312 | 0.0251 |
| 0.0373 | 53.0 | 318 | 0.0272 |
| 0.0381 | 54.0 | 324 | 0.0223 |
| 0.0358 | 55.0 | 330 | 0.0287 |
| 0.0412 | 56.0 | 336 | 0.0252 |
| 0.0391 | 57.0 | 342 | 0.0242 |
| 0.0391 | 58.0 | 348 | 0.0207 |
| 0.0354 | 59.0 | 354 | 0.0223 |
| 0.0308 | 60.0 | 360 | 0.0190 |
| 0.0290 | 61.0 | 366 | 0.0158 |
| 0.0254 | 62.0 | 372 | 0.0139 |
| 0.0231 | 63.0 | 378 | 0.0153 |
| 0.0226 | 64.0 | 384 | 0.0135 |
| 0.0240 | 65.0 | 390 | 0.0133 |
| 0.0224 | 66.0 | 396 | 0.0147 |
| 0.0196 | 67.0 | 402 | 0.0111 |
| 0.0180 | 68.0 | 408 | 0.0119 |
| 0.0169 | 69.0 | 414 | 0.0119 |
| 0.0181 | 70.0 | 420 | 0.0109 |
| 0.0161 | 71.0 | 426 | 0.0102 |
| 0.0151 | 72.0 | 432 | 0.0100 |
| 0.0139 | 73.0 | 438 | 0.0105 |
| 0.0169 | 74.0 | 444 | 0.0088 |
| 0.0124 | 75.0 | 450 | 0.0082 |
| 0.0122 | 76.0 | 456 | 0.0083 |
| 0.0125 | 77.0 | 462 | 0.0080 |
| 0.0112 | 78.0 | 468 | 0.0084 |
| 0.0109 | 79.0 | 474 | 0.0079 |
| 0.0103 | 80.0 | 480 | 0.0076 |
| 0.0103 | 81.0 | 486 | 0.0072 |
| 0.0101 | 82.0 | 492 | 0.0069 |
| 0.0091 | 83.0 | 498 | 0.0068 |
| 0.0114 | 84.0 | 504 | 0.0068 |
| 0.0108 | 85.0 | 510 | 0.0070 |
| 0.0101 | 86.0 | 516 | 0.0066 |
| 0.0132 | 87.0 | 522 | 0.0067 |
| 0.0092 | 88.0 | 528 | 0.0070 |
| 0.0103 | 89.0 | 534 | 0.0066 |
| 0.0101 | 90.0 | 540 | 0.0064 |
| 0.0088 | 91.0 | 546 | 0.0062 |
| 0.0086 | 92.0 | 552 | 0.0062 |
| 0.0085 | 93.0 | 558 | 0.0062 |
| 0.0083 | 94.0 | 564 | 0.0062 |
| 0.0086 | 95.0 | 570 | 0.0061 |
| 0.0089 | 96.0 | 576 | 0.0061 |
| 0.0091 | 97.0 | 582 | 0.0060 |
| 0.0096 | 98.0 | 588 | 0.0060 |
| 0.0103 | 99.0 | 594 | 0.0060 |
| 0.0101 | 100.0 | 600 | 0.0060 |
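
If the Trainer state from this run were available, the curve above could be re-plotted directly from its log history; a minimal matplotlib sketch, assuming `trainer` is the trained Trainer instance:

```python
import matplotlib.pyplot as plt

# trainer.state.log_history holds one dict per logging event; evaluation
# entries carry "eval_loss" and "epoch" keys.
eval_logs = [e for e in trainer.state.log_history if "eval_loss" in e]
epochs = [e["epoch"] for e in eval_logs]
losses = [e["eval_loss"] for e in eval_logs]

plt.plot(epochs, losses)
plt.xlabel("Epoch")
plt.ylabel("Validation loss")
plt.yscale("log")  # loss falls from ~2.25 to 0.006, so a log scale reads better
plt.title("calculator_model_test_2 validation loss")
plt.savefig("validation_loss.png")
```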

Framework versions

  • Transformers 5.0.0
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2

Model size

  • 7.8M params (Safetensors, tensor type F32)