3 10

Dongwon Jo

dongwonjo

AI & ML interests

Efficient AI, Model Compression, Quantization, Pruning, Generative Model, Large Language Model, Diffusion

Recent Activity

upvoted a paper about 24 hours ago

Squeezing Large-Scale Diffusion Models for Mobile

upvoted a paper about 24 hours ago

SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

upvoted a paper about 24 hours ago

LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning

View all activity

Organizations

upvoted 4 papers about 24 hours ago

Squeezing Large-Scale Diffusion Models for Mobile

Paper • 2307.01193 • Published Jul 3, 2023 • 2

SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

Paper • 2402.09025 • Published Feb 14, 2024 • 10

LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning

Paper • 2510.14211 • Published Oct 16, 2025 • 9

Retrospective Sparse Attention for Efficient Long-Context Generation

Paper • 2508.09001 • Published Aug 12, 2025 • 3

authored 3 papers 6 days ago

Retrospective Sparse Attention for Efficient Long-Context Generation

Paper • 2508.09001 • Published Aug 12, 2025 • 3

Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection

Paper • 2602.03216 • Published 7 days ago • 12

Squeezing Large-Scale Diffusion Models for Mobile

Paper • 2307.01193 • Published Jul 3, 2023 • 2

upvoted 2 papers 6 days ago

LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents

Paper • 2602.01053 • Published 9 days ago • 8

Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection

Paper • 2602.03216 • Published 7 days ago • 12

submitted a paper to Daily Papers 6 days ago

Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection

Paper • 2602.03216 • Published 7 days ago • 12

upvoted a paper 5 months ago

QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models

Paper • 2509.17428 • Published Sep 22, 2025 • 9

authored a paper 9 months ago

Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning

Paper • 2505.13866 • Published May 20, 2025 • 17

upvoted a paper 9 months ago

Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning

Paper • 2505.13866 • Published May 20, 2025 • 17

upvoted a paper about 1 year ago

Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models

Paper • 2406.12311 • Published Jun 18, 2024 • 8

authored a paper about 1 year ago

FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration

Paper • 2502.01068 • Published Feb 3, 2025 • 18

upvoted a paper about 1 year ago

FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration

Paper • 2502.01068 • Published Feb 3, 2025 • 18

commented a paper about 1 year ago

FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration

Paper • 2502.01068 • Published Feb 3, 2025 • 18 •

updated 3 models over 1 year ago

Dongwon Jo

AI & ML interests

Recent Activity

Organizations

dongwonjo's activity