calculator_model_test

This model is a fine-tuned version of an unspecified base model, trained on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0826

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 512
  • eval_batch_size: 512
  • seed: 42
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 100
  • mixed_precision_training: Native AMP
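With 100 epochs at 9 optimizer steps each (see the step counts in the results table), the linear scheduler decays the learning rate from 0.001 to zero over 900 steps. A minimal sketch of that decay, assuming no warmup (the card does not state a warmup setting):

```python
def linear_lr(step, total_steps=900, base_lr=1e-3):
    """Linearly decay the learning rate to zero over total_steps (no warmup assumed)."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps

# Learning rate at the start, midpoint, and end of training
print(linear_lr(0))    # 0.001
print(linear_lr(450))  # 0.0005
print(linear_lr(900))  # 0.0
```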

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 3.2338        | 1.0   | 9    | 2.3613          |
| 1.9485        | 2.0   | 18   | 1.5220          |
| 1.3333        | 3.0   | 27   | 1.1768          |
| 1.0688        | 4.0   | 36   | 0.9485          |
| 0.9111        | 5.0   | 45   | 0.8320          |
| 0.8187        | 6.0   | 54   | 0.7387          |
| 0.7141        | 7.0   | 63   | 0.6621          |
| 0.6359        | 8.0   | 72   | 0.5770          |
| 0.5749        | 9.0   | 81   | 0.5312          |
| 0.5475        | 10.0  | 90   | 0.4928          |
| 0.5034        | 11.0  | 99   | 0.4602          |
| 0.4597        | 12.0  | 108  | 0.4101          |
| 0.4166        | 13.0  | 117  | 0.3805          |
| 0.3871        | 14.0  | 126  | 0.3521          |
| 0.3576        | 15.0  | 135  | 0.3196          |
| 0.3265        | 16.0  | 144  | 0.2917          |
| 0.2991        | 17.0  | 153  | 0.2617          |
| 0.2761        | 18.0  | 162  | 0.2420          |
| 0.2511        | 19.0  | 171  | 0.2203          |
| 0.2274        | 20.0  | 180  | 0.2161          |
| 0.2190        | 21.0  | 189  | 0.2181          |
| 0.2122        | 22.0  | 198  | 0.2023          |
| 0.1999        | 23.0  | 207  | 0.1845          |
| 0.1812        | 24.0  | 216  | 0.1789          |
| 0.1778        | 25.0  | 225  | 0.1648          |
| 0.1650        | 26.0  | 234  | 0.1537          |
| 0.1475        | 27.0  | 243  | 0.1457          |
| 0.1415        | 28.0  | 252  | 0.1407          |
| 0.1303        | 29.0  | 261  | 0.1361          |
| 0.1225        | 30.0  | 270  | 0.1319          |
| 0.1191        | 31.0  | 279  | 0.1264          |
| 0.1154        | 32.0  | 288  | 0.1231          |
| 0.1117        | 33.0  | 297  | 0.1197          |
| 0.1063        | 34.0  | 306  | 0.1172          |
| 0.0966        | 35.0  | 315  | 0.1190          |
| 0.0949        | 36.0  | 324  | 0.1121          |
| 0.0889        | 37.0  | 333  | 0.1081          |
| 0.0829        | 38.0  | 342  | 0.1096          |
| 0.0833        | 39.0  | 351  | 0.1102          |
| 0.0778        | 40.0  | 360  | 0.1014          |
| 0.0710        | 41.0  | 369  | 0.1024          |
| 0.0690        | 42.0  | 378  | 0.1019          |
| 0.0676        | 43.0  | 387  | 0.1013          |
| 0.0633        | 44.0  | 396  | 0.0980          |
| 0.0615        | 45.0  | 405  | 0.1016          |
| 0.0583        | 46.0  | 414  | 0.0944          |
| 0.0532        | 47.0  | 423  | 0.0941          |
| 0.0539        | 48.0  | 432  | 0.0946          |
| 0.0513        | 49.0  | 441  | 0.0911          |
| 0.0474        | 50.0  | 450  | 0.0912          |
| 0.0459        | 51.0  | 459  | 0.0907          |
| 0.0442        | 52.0  | 468  | 0.0899          |
| 0.0410        | 53.0  | 477  | 0.0935          |
| 0.0368        | 54.0  | 486  | 0.0898          |
| 0.0356        | 55.0  | 495  | 0.0887          |
| 0.0344        | 56.0  | 504  | 0.0896          |
| 0.0318        | 57.0  | 513  | 0.0894          |
| 0.0307        | 58.0  | 522  | 0.0884          |
| 0.0272        | 59.0  | 531  | 0.0889          |
| 0.0261        | 60.0  | 540  | 0.0857          |
| 0.0246        | 61.0  | 549  | 0.0834          |
| 0.0237        | 62.0  | 558  | 0.0875          |
| 0.0223        | 63.0  | 567  | 0.0865          |
| 0.0229        | 64.0  | 576  | 0.0864          |
| 0.0213        | 65.0  | 585  | 0.0884          |
| 0.0213        | 66.0  | 594  | 0.0848          |
| 0.0208        | 67.0  | 603  | 0.0848          |
| 0.0192        | 68.0  | 612  | 0.0845          |
| 0.0185        | 69.0  | 621  | 0.0868          |
| 0.0180        | 70.0  | 630  | 0.0844          |
| 0.0165        | 71.0  | 639  | 0.0843          |
| 0.0160        | 72.0  | 648  | 0.0843          |
| 0.0151        | 73.0  | 657  | 0.0862          |
| 0.0134        | 74.0  | 666  | 0.0832          |
| 0.0141        | 75.0  | 675  | 0.0840          |
| 0.0138        | 76.0  | 684  | 0.0857          |
| 0.0134        | 77.0  | 693  | 0.0840          |
| 0.0131        | 78.0  | 702  | 0.0853          |
| 0.0133        | 79.0  | 711  | 0.0858          |
| 0.0123        | 80.0  | 720  | 0.0844          |
| 0.0118        | 81.0  | 729  | 0.0842          |
| 0.0117        | 82.0  | 738  | 0.0845          |
| 0.0103        | 83.0  | 747  | 0.0845          |
| 0.0112        | 84.0  | 756  | 0.0834          |
| 0.0104        | 85.0  | 765  | 0.0831          |
| 0.0101        | 86.0  | 774  | 0.0833          |
| 0.0100        | 87.0  | 783  | 0.0823          |
| 0.0095        | 88.0  | 792  | 0.0837          |
| 0.0098        | 89.0  | 801  | 0.0823          |
| 0.0091        | 90.0  | 810  | 0.0843          |
| 0.0088        | 91.0  | 819  | 0.0834          |
| 0.0090        | 92.0  | 828  | 0.0840          |
| 0.0089        | 93.0  | 837  | 0.0837          |
| 0.0082        | 94.0  | 846  | 0.0839          |
| 0.0085        | 95.0  | 855  | 0.0836          |
| 0.0079        | 96.0  | 864  | 0.0835          |
| 0.0079        | 97.0  | 873  | 0.0832          |
| 0.0080        | 98.0  | 882  | 0.0829          |
| 0.0083        | 99.0  | 891  | 0.0826          |
| 0.0076        | 100.0 | 900  | 0.0826          |
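Note that the lowest validation loss in the log (0.0823, at epochs 87 and 89) is slightly below the final value of 0.0826; the card does not say whether best-checkpoint selection (e.g. `load_best_model_at_end` in transformers) was enabled. A small sketch of picking the best checkpoint from a log like the one above, abbreviated to a few rows from the tail:

```python
# (epoch, validation_loss) pairs taken from the tail of the training log above
log = [
    (85, 0.0831), (86, 0.0833), (87, 0.0823), (88, 0.0837),
    (89, 0.0823), (90, 0.0843), (95, 0.0836), (100, 0.0826),
]

# Pick the checkpoint with the lowest validation loss; min() keeps the
# first minimum it sees, i.e. the earliest epoch in case of a tie.
best_epoch, best_loss = min(log, key=lambda row: row[1])
print(best_epoch, best_loss)  # 87 0.0823
```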

Framework versions

  • Transformers 5.0.0
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2
Model files

  • Format: Safetensors
  • Model size: 7.82M params
  • Tensor type: F32