khtsly

AI & ML interests

None yet

Recent Activity

Organizations

None yet

upvoted a paper 3 days ago

MiniMax Sparse Attention

Paper • 2606.13392 • Published 4 days ago • 126

upvoted a paper 4 days ago

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

Paper • 2606.12397 • Published 5 days ago • 84

upvoted a paper 5 days ago

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

Paper • 2606.11052 • Published 6 days ago • 15

upvoted a paper 6 days ago

FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention

Paper • 2606.09079 • Published 7 days ago • 61

New activity in sapientinc/HRM-Text-1B 9 days ago

Hrm can't calculate 2+2

#8 opened 11 days ago by Xhub1880

commented on a paper 11 days ago

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 516

upvoted 2 papers 11 days ago

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 516

Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding

Paper • 2605.29707 • Published 18 days ago • 145

upvoted a paper 12 days ago

dMoE: dLLMs with Learnable Block Experts

Paper • 2605.30876 • Published 17 days ago • 36

upvoted a paper 13 days ago

NITP: Next Implicit Token Prediction for LLM Pre-training

Paper • 2605.24956 • Published 22 days ago • 35

upvoted a paper 21 days ago

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

Paper • 2605.23901 • Published 24 days ago • 13

upvoted 2 papers 23 days ago

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

Paper • 2605.11609 • Published May 12 • 195

HRM-Text: Efficient Pretraining Beyond Scaling

Paper • 2605.20613 • Published 26 days ago • 315

upvoted a paper 24 days ago

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Paper • 2605.22791 • Published 25 days ago • 31

upvoted a paper 25 days ago

Generative Recursive Reasoning

Paper • 2605.19376 • Published 26 days ago • 30

liked a model about 2 months ago

khtsly/luau-coder-preview-28B-A3B-noft

Text Generation • 28B • Updated Apr 26 • 82 • 2

published a model about 2 months ago

khtsly/luau-coder-preview-28B-A3B-noft

Text Generation • 28B • Updated Apr 26 • 82 • 2

updated a model about 2 months ago

khtsly/luau-coder-preview-28B-A3B-noft

Text Generation • 28B • Updated Apr 26 • 82 • 2

updated a dataset about 2 months ago

khtsly/roblox_docs_corpus_text

Viewer • Updated Apr 23 • 1.55k • 18 • 1

New activity in Jackrong/Qwopus-GLM-18B-Merged-GGUF about 2 months ago

merging problem

#5 opened about 2 months ago by khtsly
