We are excited to release MiniBananaMind-v2-9M!
MiniBananaMind-v2-9M is a tiny causal language model trained fully from scratch on FineWeb-Edu.
It has only ~8.9M parameters but was trained on ~3.54B tokens, counted after retokenizing the data with a custom 8k byte-level BPE tokenizer.
We trained it on an RTX 5070 Ti in just 4h 34m!
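For a rough sense of scale, here is a back-of-the-envelope sketch using only the approximate figures quoted above. It assumes the 4h 34m wall-clock time covers the full ~3.54B tokens, so the throughput is an estimate, not a measured benchmark. The variable names are ours, for illustration only.

```python
# Approximate figures quoted in this post.
params = 8.9e6                         # ~8.9M parameters
tokens = 3.54e9                        # ~3.54B training tokens
train_seconds = 4 * 3600 + 34 * 60     # 4h 34m of training

# How many training tokens the model saw per parameter.
tokens_per_param = tokens / params

# Average throughput, ASSUMING the whole run fits in the quoted time.
tokens_per_second = tokens / train_seconds

print(f"tokens per parameter: {tokens_per_param:.0f}")
print(f"average throughput: {tokens_per_second:,.0f} tokens/s")
```

That works out to roughly 400 tokens per parameter and a bit over 200k tokens per second on average.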
Go check it out at BananaMind/MiniBananaMind-v2-9M