gpt-125m-cr

This model is a fine-tuned version of EleutherAI/gpt-neo-125m on an unknown dataset. At the last logged evaluation (step 10500, epoch 0.9821), it reaches a validation loss of 2.4002 on the evaluation set.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 0.05
num_epochs: 1
mixed_precision_training: Native AMP
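
The relationship between the batch-size settings and the shape of the cosine schedule can be sketched in a few lines of plain Python. This is an illustration only, not the training code. It assumes a single device (inferred from 8 * 4 = 32), a total of about 10,691 optimizer steps (estimated from the training log, where step 10500 corresponds to epoch 0.9821), and that the warmup value 0.05 is a fraction of total steps. The names `effective_batch_size` and `cosine_lr` are hypothetical helpers, not library functions.

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 5e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4

# Assumptions, not stated in the card:
NUM_DEVICES = 1         # inferred from 8 * 4 = 32
TOTAL_STEPS = 10_691    # estimated from the log (10500 steps ~ 0.9821 epoch)
WARMUP_FRACTION = 0.05  # reading the warmup value as a share of total steps


def effective_batch_size(per_device: int, accum_steps: int, devices: int) -> int:
    """Number of examples contributing to one optimizer update."""
    return per_device * accum_steps * devices


def cosine_lr(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to peak_lr, then a half-cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # clamp in case step exceeds total_steps
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


warmup_steps = int(WARMUP_FRACTION * TOTAL_STEPS)

print("effective batch size:",
      effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
for step in (0, warmup_steps, 5000, TOTAL_STEPS):
    lr = cosine_lr(step, LEARNING_RATE, warmup_steps, TOTAL_STEPS)
    print(f"step {step:>5}: lr = {lr:.2e}")
```

Under these assumptions the learning rate peaks at 0.0005 when warmup ends and decays to zero by the final step, which is consistent with the validation loss flattening out late in the epoch.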

Training Loss	Epoch	Step	Validation Loss
3.5916	0.0468	500	3.6148
3.3964	0.0935	1000	3.3468
3.2520	0.1403	1500	3.1837
3.1113	0.1871	2000	3.0535
2.9765	0.2338	2500	2.9742
2.9921	0.2806	3000	2.9028
2.8728	0.3274	3500	2.8260
2.7868	0.3741	4000	2.7799
2.7398	0.4209	4500	2.7301
2.7050	0.4677	5000	2.6818
2.6500	0.5145	5500	2.6310
2.6400	0.5612	6000	2.5810
2.5926	0.6080	6500	2.5462
2.5747	0.6548	7000	2.5108
2.5278	0.7015	7500	2.4811
2.4708	0.7483	8000	2.4537
2.4534	0.7951	8500	2.4327
2.4443	0.8418	9000	2.4178
2.3745	0.8886	9500	2.4070
2.4723	0.9354	10000	2.4018
2.4400	0.9821	10500	2.4002
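
The logged losses can be turned into more interpretable numbers with a short script. The sketch below assumes the validation loss is a mean per-token cross-entropy in nats, the usual convention for causal language-model training, so that perplexity is its exponential. It also estimates the number of optimizer steps per epoch from the last row of the table. The helper name `perplexity` is hypothetical.

```python
import math

# (step, validation loss) pairs copied from the first, a middle, and the last row.
VAL_LOSS = [(500, 3.6148), (5000, 2.6818), (10500, 2.4002)]

# Last logged row: step 10500 corresponds to epoch 0.9821.
LAST_STEP, LAST_EPOCH = 10500, 0.9821


def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(loss_nats)


# Rough estimate only: the epoch column is rounded to four decimals.
estimated_steps_per_epoch = round(LAST_STEP / LAST_EPOCH)

for step, loss in VAL_LOSS:
    print(f"step {step:>5}: loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")
print("estimated steps per epoch:", estimated_steps_per_epoch)
```

If the loss were reported in a different unit or averaged differently, the perplexity figures would not apply as written.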

Safetensors

Model size: 0.1B params
Tensor type: F32
