calculator_model_test_2

This model is a fine-tuned version of an unspecified base model on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0060
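
For context, this figure is the `eval_loss` that transformers' `Trainer.evaluate()` reports; a minimal sketch, assuming a `trainer` object that has already been trained (the card does not include the training script):

```python
# Hypothetical sketch: `trainer` stands in for the transformers.Trainer
# used in this run, which the card does not show.
metrics = trainer.evaluate()
print(metrics["eval_loss"])  # the card reports 0.0060
```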

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent TrainingArguments follows the list):

  • learning_rate: 0.001
  • train_batch_size: 512
  • eval_batch_size: 512
  • seed: 42
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 100
  • mixed_precision_training: Native AMP
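
As a point of reference, here is a minimal sketch of how these settings map onto `transformers.TrainingArguments`. The card does not specify the model, tokenizer, or datasets, so only the arguments themselves are shown:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="calculator_model_test_2",
    learning_rate=1e-3,
    per_device_train_batch_size=512,
    per_device_eval_batch_size=512,
    seed=42,
    optim="adamw_torch_fused",  # AdamW, fused torch implementation
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=100,
    fp16=True,                  # native AMP mixed-precision training
    eval_strategy="epoch",      # evaluation ran once per epoch (see table below)
)
# These arguments would be passed to a transformers.Trainer together with
# the (unspecified) model and train/eval datasets.
```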

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 3.0266 | 1.0 | 6 | 2.2481 |
| 2.0347 | 2.0 | 12 | 1.7403 |
| 1.5575 | 3.0 | 18 | 1.3098 |
| 1.2110 | 4.0 | 24 | 1.0916 |
| 1.0460 | 5.0 | 30 | 0.9786 |
| 0.9417 | 6.0 | 36 | 0.8630 |
| 0.8312 | 7.0 | 42 | 0.7454 |
| 0.7230 | 8.0 | 48 | 0.6662 |
| 0.6606 | 9.0 | 54 | 0.6547 |
| 0.6435 | 10.0 | 60 | 0.6063 |
| 0.5876 | 11.0 | 66 | 0.5376 |
| 0.5490 | 12.0 | 72 | 0.5449 |
| 0.5356 | 13.0 | 78 | 0.4940 |
| 0.5389 | 14.0 | 84 | 0.5342 |
| 0.5251 | 15.0 | 90 | 0.4613 |
| 0.4776 | 16.0 | 96 | 0.4339 |
| 0.4445 | 17.0 | 102 | 0.4073 |
| 0.4183 | 18.0 | 108 | 0.4256 |
| 0.4132 | 19.0 | 114 | 0.3678 |
| 0.3785 | 20.0 | 120 | 0.3386 |
| 0.3393 | 21.0 | 126 | 0.3191 |
| 0.3229 | 22.0 | 132 | 0.2889 |
| 0.2948 | 23.0 | 138 | 0.2484 |
| 0.2685 | 24.0 | 144 | 0.2395 |
| 0.2560 | 25.0 | 150 | 0.2348 |
| 0.2317 | 26.0 | 156 | 0.2351 |
| 0.2329 | 27.0 | 162 | 0.2155 |
| 0.2190 | 28.0 | 168 | 0.1862 |
| 0.1964 | 29.0 | 174 | 0.1616 |
| 0.1796 | 30.0 | 180 | 0.1456 |
| 0.1560 | 31.0 | 186 | 0.1129 |
| 0.1371 | 32.0 | 192 | 0.0992 |
| 0.1370 | 33.0 | 198 | 0.0862 |
| 0.1176 | 34.0 | 204 | 0.0849 |
| 0.1088 | 35.0 | 210 | 0.0951 |
| 0.1017 | 36.0 | 216 | 0.0656 |
| 0.0873 | 37.0 | 222 | 0.0515 |
| 0.0777 | 38.0 | 228 | 0.0761 |
| 0.0915 | 39.0 | 234 | 0.0521 |
| 0.0772 | 40.0 | 240 | 0.0500 |
| 0.0705 | 41.0 | 246 | 0.0437 |
| 0.0662 | 42.0 | 252 | 0.0470 |
| 0.0629 | 43.0 | 258 | 0.0441 |
| 0.0611 | 44.0 | 264 | 0.0364 |
| 0.0587 | 45.0 | 270 | 0.0356 |
| 0.0523 | 46.0 | 276 | 0.0315 |
| 0.0464 | 47.0 | 282 | 0.0274 |
| 0.0467 | 48.0 | 288 | 0.0289 |
| 0.0444 | 49.0 | 294 | 0.0323 |
| 0.0462 | 50.0 | 300 | 0.0259 |
| 0.0381 | 51.0 | 306 | 0.0234 |
| 0.0390 | 52.0 | 312 | 0.0251 |
| 0.0373 | 53.0 | 318 | 0.0272 |
| 0.0381 | 54.0 | 324 | 0.0223 |
| 0.0358 | 55.0 | 330 | 0.0287 |
| 0.0412 | 56.0 | 336 | 0.0252 |
| 0.0391 | 57.0 | 342 | 0.0242 |
| 0.0391 | 58.0 | 348 | 0.0207 |
| 0.0354 | 59.0 | 354 | 0.0223 |
| 0.0308 | 60.0 | 360 | 0.0190 |
| 0.0290 | 61.0 | 366 | 0.0158 |
| 0.0254 | 62.0 | 372 | 0.0139 |
| 0.0231 | 63.0 | 378 | 0.0153 |
| 0.0226 | 64.0 | 384 | 0.0135 |
| 0.0240 | 65.0 | 390 | 0.0133 |
| 0.0224 | 66.0 | 396 | 0.0147 |
| 0.0196 | 67.0 | 402 | 0.0111 |
| 0.0180 | 68.0 | 408 | 0.0119 |
| 0.0169 | 69.0 | 414 | 0.0119 |
| 0.0181 | 70.0 | 420 | 0.0109 |
| 0.0161 | 71.0 | 426 | 0.0102 |
| 0.0151 | 72.0 | 432 | 0.0100 |
| 0.0139 | 73.0 | 438 | 0.0105 |
| 0.0169 | 74.0 | 444 | 0.0088 |
| 0.0124 | 75.0 | 450 | 0.0082 |
| 0.0122 | 76.0 | 456 | 0.0083 |
| 0.0125 | 77.0 | 462 | 0.0080 |
| 0.0112 | 78.0 | 468 | 0.0084 |
| 0.0109 | 79.0 | 474 | 0.0079 |
| 0.0103 | 80.0 | 480 | 0.0076 |
| 0.0103 | 81.0 | 486 | 0.0072 |
| 0.0101 | 82.0 | 492 | 0.0069 |
| 0.0091 | 83.0 | 498 | 0.0068 |
| 0.0114 | 84.0 | 504 | 0.0068 |
| 0.0108 | 85.0 | 510 | 0.0070 |
| 0.0101 | 86.0 | 516 | 0.0066 |
| 0.0132 | 87.0 | 522 | 0.0067 |
| 0.0092 | 88.0 | 528 | 0.0070 |
| 0.0103 | 89.0 | 534 | 0.0066 |
| 0.0101 | 90.0 | 540 | 0.0064 |
| 0.0088 | 91.0 | 546 | 0.0062 |
| 0.0086 | 92.0 | 552 | 0.0062 |
| 0.0085 | 93.0 | 558 | 0.0062 |
| 0.0083 | 94.0 | 564 | 0.0062 |
| 0.0086 | 95.0 | 570 | 0.0061 |
| 0.0089 | 96.0 | 576 | 0.0061 |
| 0.0091 | 97.0 | 582 | 0.0060 |
| 0.0096 | 98.0 | 588 | 0.0060 |
| 0.0103 | 99.0 | 594 | 0.0060 |
| 0.0101 | 100.0 | 600 | 0.0060 |
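
If the Trainer state from this run were available, the curve above could be re-plotted directly from its log history; a minimal matplotlib sketch, assuming `trainer` is the trained Trainer instance:

```python
import matplotlib.pyplot as plt

# trainer.state.log_history holds one dict per logging event; evaluation
# entries carry "eval_loss" and "epoch" keys.
eval_logs = [e for e in trainer.state.log_history if "eval_loss" in e]
epochs = [e["epoch"] for e in eval_logs]
losses = [e["eval_loss"] for e in eval_logs]

plt.plot(epochs, losses)
plt.xlabel("Epoch")
plt.ylabel("Validation loss")
plt.yscale("log")  # loss falls from ~2.25 to 0.006, so a log scale reads better
plt.title("calculator_model_test_2 validation loss")
plt.savefig("validation_loss.png")
```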

Framework versions

  • Transformers 5.0.0
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2

Model size

  • 7.8M params (Safetensors, tensor type F32)