124M GPT with Symbolic Reasoning Distillation

A 124M-parameter GPT-2-style model trained from scratch on FineWeb-Edu, with knowledge distillation from SmolLM-135M-Instruct as the teacher.

| Component       | Value                            |
|-----------------|----------------------------------|
| Parameters      | ~124M                            |
| Layers          | 12                               |
| Attention heads | 12                               |
| Embedding dim   | 768                              |
| Context length  | 512 tokens                       |
| Loss            | 0.5 CE + 0.5 KL (sketched below) |
| Precision       | F32 (safetensors)                |
| Hardware        | 1x A100                          |
| Training time   | ~75 min                          |
| Training tokens | 327,680,000                      |
| Best loss       | 326.0111                         |
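
A minimal sketch of the training objective, assuming standard logit distillation in PyTorch. The card only specifies the equal 0.5/0.5 weighting of cross-entropy and KL; the temperature, reduction, and function names here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      alpha=0.5, temperature=1.0):
    """0.5 * CE + 0.5 * KL, as listed in the table above.

    temperature=1.0 is an assumption; the card does not state one.
    """
    vocab = student_logits.size(-1)
    s = student_logits.view(-1, vocab)   # (batch * seq, vocab)
    t = teacher_logits.view(-1, vocab)

    # Hard-label cross-entropy against the FineWeb-Edu next-token targets.
    ce = F.cross_entropy(s, targets.view(-1))

    # KL(teacher || student) over softened distributions, averaged per token.
    kl = F.kl_div(
        F.log_softmax(s / temperature, dim=-1),
        F.log_softmax(t / temperature, dim=-1),
        log_target=True,
        reduction="batchmean",
    ) * temperature**2

    return alpha * ce + (1.0 - alpha) * kl

# Toy shapes: batch 2, sequence 8, vocab 100.
s = torch.randn(2, 8, 100)
t = torch.randn(2, 8, 100)
y = torch.randint(0, 100, (2, 8))
print(distillation_loss(s, t, y))
```

With `batchmean` over the flattened (batch * seq, vocab) view, the KL term is a per-token average, so it sits on the same scale as the cross-entropy term and the 0.5/0.5 blend is meaningful.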
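A minimal sketch of loading and sampling the model, assuming the `farpluto/zubenelgenubi-124m` checkpoint loads through the standard `transformers` API; the prompt is illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from this card.
repo_id = "farpluto/zubenelgenubi-124m"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Greedy generation within the 512-token context window.
inputs = tokenizer("The distributive property states that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```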