MinT: Managed Infrastructure for Training and Serving Millions of LLMs Paper • 2605.13779 • Published 8 days ago • 216
SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training Paper • 2605.08738 • Published 12 days ago • 13
Learning, Fast and Slow: Towards LLMs That Adapt Continually Paper • 2605.12484 • Published 9 days ago • 17
Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs Paper • 2605.12460 • Published 9 days ago • 17
EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents Paper • 2605.13941 • Published 8 days ago • 23
TextLDM: Language Modeling with Continuous Latent Diffusion Paper • 2605.07748 • Published 13 days ago • 26
Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models Paper • 2605.07721 • Published 13 days ago • 29
SEIF: Self-Evolving Reinforcement Learning for Instruction Following Paper • 2605.07465 • Published 13 days ago • 29
TIDE: Token-Informed Depth Execution for Per-Token Early Exit in LLM Inference Paper • 2603.21365 • Published Mar 22 • 2
kshitijthakkar/deepseek-v4-mini-300M-from-flash Text Generation • 0.3B • Updated 15 days ago • 207 • 5