Micro Language Model: Attention-Free • MLP-Only • Byte-Level • Conversational
MicroMixer-2-1M-discord-dialogues is a ~1M parameter MLP-Mixer language model trained on Discord conversation data. V4 introduces DropPath regularization, label smoothing, and padding-aware loss for better training stability. It generates conversational text in a User/Assistant format without using any attention mechanisms.
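The label smoothing and padding-aware loss mentioned above can be illustrated in plain Python. This is a minimal sketch of the arithmetic, not the project's training code: the function name `smoothed_masked_loss`, the `pad_id` argument, and the choice to spread the smoothing mass uniformly over all classes are assumptions.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def smoothed_masked_loss(logits_seq, targets, pad_id, smoothing=0.1):
    """Mean label-smoothed cross-entropy over non-padding positions.

    logits_seq: list of per-position logit lists (one entry per class)
    targets:    list of target class ids, with pad_id marking padding
    pad_id:     hypothetical padding marker (assumption, not from the repo)
    """
    total, count = 0.0, 0
    for logits, target in zip(logits_seq, targets):
        if target == pad_id:
            continue  # padding positions contribute nothing to the loss
        logp = log_softmax(logits)
        nll = -logp[target]               # standard cross-entropy term
        uniform = -sum(logp) / len(logp)  # cross-entropy against a uniform target
        total += (1.0 - smoothing) * nll + smoothing * uniform
        count += 1
    return total / count if count else 0.0

# Two positions over a 3-class toy vocabulary; the second one is padding.
toy_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(smoothed_masked_loss(toy_logits, [0, -1], pad_id=-1))
```

Averaging over the count of non-padding positions, rather than the full sequence length, keeps short sequences in a padded batch from diluting the loss.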
graph TD
A[Byte Input] --> B[Token Embedding]
B --> C[RoPE Position Encoding]
C --> D[MicroMixerLayer ×5]
D --> E[LayerNorm]
E --> F[LM Head]
F --> G[Byte Output]
style A fill:#007BFF,color:#fff
style G fill:#00D620,color:#fff
style D fill:#AE00FF,color:#fff
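Because the model reads and writes raw bytes, tokenization reduces to UTF-8 encoding. The sketch below shows the idea with hypothetical `encode_bytes` and `decode_bytes` helpers; the repository's own `ByteTokenizer` may differ, for example in how it handles special tokens.

```python
def encode_bytes(text: str) -> list[int]:
    """Map text to a list of byte values in the range 0..255."""
    return list(text.encode("utf-8"))

def decode_bytes(ids: list[int]) -> str:
    """Map byte values back to text, replacing invalid UTF-8 sequences."""
    return bytes(ids).decode("utf-8", errors="replace")

ids = encode_bytes("User: Hi")
print(ids)                # one integer per byte
print(decode_bytes(ids))  # round-trips to the original text
```

Decoding with `errors="replace"` is a deliberate choice here: a byte-level model can emit partial multi-byte sequences, and strict decoding would raise on them.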
| Parameter | Value |
|---|---|
| Total Parameters | 1,016,204 |
| Hidden Dimension | 168 |
| Hyper Hidden Dimension | 84 |
| Channel MLP Dimension | 448 |
| Number of Layers | 5 |
| Max Sequence Length | 4096 |
| Vocabulary Size | 256 (Byte-level) |
| DropPath Rate | 0.1 |
| Label Smoothing | 0.1 |
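The DropPath rate in the table refers to stochastic depth: during training, a residual branch is skipped with the given probability, and surviving branches are rescaled so the expected output is unchanged. Below is a minimal plain-Python sketch assuming that common formulation. `drop_path` and `residual_block` are illustrative names, not the project's API, and real implementations apply the mask per sample across a batch of tensors.

```python
import random

def drop_path(branch, rate=0.1, training=True, rng=random):
    """Stochastic depth for one sample's residual-branch output."""
    if not training or rate == 0.0:
        return list(branch)                # inference: branch always kept, no scaling
    keep = 1.0 - rate
    if rng.random() < keep:
        return [v / keep for v in branch]  # kept: rescale to preserve the expectation
    return [0.0 for _ in branch]           # dropped: only the skip connection remains

def residual_block(x, branch_fn, rate=0.1, training=True, rng=random):
    """Compute x + DropPath(branch_fn(x)), the usual residual form."""
    out = drop_path(branch_fn(x), rate, training, rng)
    return [a + b for a, b in zip(x, out)]

# At inference time the block is deterministic.
print(residual_block([1.0, 2.0], lambda x: [2.0 * v for v in x], training=False))
```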
┌─────────────────────────────────────────────┐
│               MicroMixerLayer               │
│  ┌───────────────────────────────────────┐  │
│  │ LayerNorm → HyperMixing → Residual    │  │ ← Token Mixing
│  ├───────────────────────────────────────┤  │
│  │ LayerNorm → MlpBlock → Residual       │  │ ← Channel Mixing
│  └───────────────────────────────────────┘  │
└─────────────────────────────────────────────┘
Linear → GELU → Linear
User: Hello, how are you?
Assistant: Its not a fun of but I can't really find me be on a heread long collecting...
User: What is your favorite color?
Assistant: Not even worth to get voice the time to up them the stuff inel that was i fever...
User: Tell me a joke.
Assistant: I can always sure when your start to and in the mainse of I had say again...
| Epoch | Train Loss | Train PPL | Val Loss | Val PPL |
|---|---|---|---|---|
| 1 | 2.89 | 18.04 | 2.68 | 14.65 |
| 2 | 2.57 | 13.05 | 2.63 | 13.88 |
| 3 | 2.54 | 12.68 | 2.62 | 13.73 |
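Perplexity in the table is the exponential of the cross-entropy loss measured in nats. The quick check below assumes the reported losses are rounded to two decimals, which would account for small differences from the listed perplexities; `perplexity` is an illustrative helper name.

```python
import math

def perplexity(loss_nats: float) -> float:
    """Perplexity is exp(cross-entropy) when the loss is measured in nats."""
    return math.exp(loss_nats)

# Validation losses from the table, by epoch.
for epoch, val_loss in [(1, 2.68), (2, 2.63), (3, 2.62)]:
    print(epoch, round(perplexity(val_loss), 2))
```

Since the vocabulary is byte-level, these are per-byte perplexities and are not directly comparable to per-token figures from models with subword tokenizers.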
Dataset: mookiezi/Discord-Dialogues
# Clone the repository first so that the `src` package is importable:
# git clone https://github.com/llaa33219/MicroMixer-2.git
# cd MicroMixer-2
import torch
from huggingface_hub import hf_hub_download
from src.model import MicroMixer, MicroMixerConfig
from src.tokenizer import ByteTokenizer

# Architecture hyperparameters, matching the parameter table above.
config = MicroMixerConfig(
    max_seq_len=4096,
    hidden_dim=168,
    hyper_hidden_dim=84,
    channel_mlp_dim=448,
    num_layers=5,
)
model = MicroMixer(config)

# Download the weights from the Hugging Face Hub and load them on CPU.
weights_path = hf_hub_download("llaa33219/MicroMixer-2-1M-discord-dialogues", "model.pt")
model.load_state_dict(torch.load(weights_path, map_location="cpu"))
model.eval()

# Encode the prompt as raw bytes and sample a reply.
tokenizer = ByteTokenizer()
input_ids = torch.tensor([tokenizer.encode("User: Hello\nAssistant:")])
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128, temperature=0.7, top_k=40)
print(tokenizer.decode(output[0].tolist()))
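For multi-turn prompts, the User/Assistant format can be assembled with a small helper. This is a sketch: `build_prompt` is a hypothetical function, and the turn separator used in training is not confirmed (a single newline between turns is assumed here).

```python
def build_prompt(turns, next_speaker="Assistant"):
    """Join (speaker, text) turns into 'Speaker: text' lines and
    end with an open cue for the next speaker to complete."""
    lines = [f"{speaker}: {text}" for speaker, text in turns]
    lines.append(f"{next_speaker}:")
    return "\n".join(lines)

prompt = build_prompt([
    ("User", "Hello"),
    ("Assistant", "Hi there"),
    ("User", "How are you?"),
])
print(prompt)
```

The resulting string can be passed to the tokenizer's `encode` call in place of the single-turn prompt.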
| Limitation | Description |
|---|---|
| Small Model Size | Only ~1M parameters |
| Grammar Issues | Generated text has grammatical errors |
| Repetitive Patterns | Tends to repeat learned phrases |
| Limited Knowledge | Trained only on Discord conversations |
Part of the MicroMixer-2 research project