Collections including paper arxiv:2312.00752

- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (arXiv:1406.1078)
- Distributed Representations of Sentences and Documents (arXiv:1405.4053)
- Sequence to Sequence Learning with Neural Networks (arXiv:1409.3215)
- Neural Machine Translation by Jointly Learning to Align and Translate (arXiv:1409.0473)
- NitroGen: An Open Foundation Model for Generalist Gaming Agents (arXiv:2601.02427)
- mHC: Manifold-Constrained Hyper-Connections (arXiv:2512.24880)
- DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models (arXiv:2512.24165)
- Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting (arXiv:2601.02151)
- Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model (arXiv:2401.09417)
- VMamba: Visual State Space Model (arXiv:2401.10166)
- DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis (arXiv:2405.14224)
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces (arXiv:2312.00752)
- Attention Is All You Need (arXiv:1706.03762)
- LLaMA: Open and Efficient Foundation Language Models (arXiv:2302.13971)
- Efficient Tool Use with Chain-of-Abstraction Reasoning (arXiv:2401.17464)
- MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts (arXiv:2407.21770)
- OpenClaw-RL: Train Any Agent Simply by Talking (arXiv:2603.10165)
- Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights (arXiv:2603.12228)
- Efficient Memory Management for Large Language Model Serving with PagedAttention (arXiv:2309.06180)
- 1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs (arXiv:2410.16144)
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (arXiv:2502.11089)
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (arXiv:2402.03300)
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (arXiv:2501.12948)
- DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning (arXiv:2504.07128)
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces (arXiv:2312.00752)
- Elucidating the Design Space of Diffusion-Based Generative Models (arXiv:2206.00364)
- GLU Variants Improve Transformer (arXiv:2002.05202)
- StarCoder 2 and The Stack v2: The Next Generation (arXiv:2402.19173)
- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length (arXiv:2404.08801)
- RecurrentGemma: Moving Past Transformers for Efficient Open Language Models (arXiv:2404.07839)
- Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence (arXiv:2404.05892)
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces (arXiv:2312.00752)