pangpangxuan

pangxuan

AI & ML interests

None yet

Organizations

None yet

upvoted 2 papers 1 day ago

OpenClaw-RL: Train Any Agent Simply by Talking

Paper • 2603.10165 • Published 3 days ago • 78

Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

Paper • 2603.06569 • Published 7 days ago • 101

upvoted a paper 3 days ago

How Far Can Unsupervised RLVR Scale LLM Training?

Paper • 2603.08660 • Published 4 days ago • 44

upvoted a paper 11 days ago

dLLM: Simple Diffusion Language Modeling

Paper • 2602.22661 • Published 15 days ago • 127

upvoted a paper 14 days ago

Query-focused and Memory-aware Reranker for Long Context Processing

Paper • 2602.12192 • Published 29 days ago • 56

upvoted 2 papers 17 days ago

Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models

Paper • 2602.12036 • Published 29 days ago • 91

Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs

Paper • 2602.10388 • Published about 1 month ago • 241

upvoted 2 papers 28 days ago

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

Paper • 2602.12125 • Published 29 days ago • 58

The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies

Paper • 2602.09877 • Published about 1 month ago • 197

upvoted 5 papers about 1 month ago

WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning

Paper • 2602.04634 • Published Feb 4 • 96

CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty

Paper • 2601.22027 • Published Jan 29 • 83

MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods

Paper • 2601.21821 • Published Jan 29 • 60

TTCS: Test-Time Curriculum Synthesis for Self-Evolving

Paper • 2601.22628 • Published Jan 30 • 35

Kimi K2.5: Visual Agentic Intelligence

Paper • 2602.02276 • Published Feb 2 • 255

upvoted 2 papers about 2 months ago

X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests

Paper • 2601.06953 • Published Jan 11 • 45

RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes

Paper • 2601.05249 • Published Jan 8 • 47

upvoted 4 papers 2 months ago

Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization

Paper • 2601.05432 • Published Jan 8 • 169

The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning

Paper • 2601.06002 • Published Jan 9 • 56

Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards

Paper • 2601.06021 • Published Jan 9 • 47

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Paper • 2601.05242 • Published Jan 8 • 229