In a Training Loop 🔄

Qianqian Xie

mistletoe111

AI & ML interests

None yet

Recent Activity

upvoted a paper 7 days ago

CoVEBench: Can Video Editing Models Handle Complex Instructions?

Paper • 2606.08415 • Published 13 days ago • 48

published a dataset 7 days ago

mistletoe111/webcoding

Updated 9 days ago • 1.38k

updated a dataset 9 days ago

mistletoe111/webcoding

Updated 9 days ago • 1.38k

authored 2 papers 14 days ago

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

Paper • 2606.02060 • Published 19 days ago • 54

TVIR: Building Deep Research Agents Towards Text–Visual Interleaved Report Generation

Paper • 2606.02320 • Published 19 days ago • 14

upvoted 3 papers 16 days ago

TVIR: Building Deep Research Agents Towards Text–Visual Interleaved Report Generation

Paper • 2606.02320 • Published 19 days ago • 14

MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

Paper • 2606.01993 • Published 18 days ago • 15

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

Paper • 2606.02060 • Published 19 days ago • 54

updated a dataset 16 days ago

mistletoe111/webcoding1

Updated 16 days ago • 749

published a dataset 16 days ago

mistletoe111/webcoding1

Updated 16 days ago • 749

upvoted a paper 29 days ago

Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

Paper • 2605.16928 • Published May 16 • 96

upvoted 2 papers about 1 month ago

OProver: A Unified Framework for Agentic Formal Theorem Proving

Paper • 2605.17283 • Published May 17 • 31

Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution

Paper • 2605.15301 • Published May 14 • 22

upvoted 2 papers about 2 months ago

DR$^{3}$-Eval: Towards Realistic and Reproducible Deep Research Evaluation

Paper • 2604.14683 • Published Apr 16 • 36

WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models

Paper • 2604.18224 • Published Apr 20 • 22

updated a dataset 2 months ago

NJU-LINK/DR3-Eval

Viewer • Updated Apr 20 • 100 • 2.78k • 2

authored 3 papers 2 months ago

MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues

Paper • 2510.17722 • Published Oct 20, 2025 • 20

IF-VidCap: Can Video Caption Models Follow Instructions?

Paper • 2510.18726 • Published Oct 21, 2025 • 27

DR$^{3}$-Eval: Towards Realistic and Reproducible Deep Research Evaluation

Paper • 2604.14683 • Published Apr 16 • 36

upvoted a paper 2 months ago

YuE: Scaling Open Foundation Models for Long-Form Music Generation

Paper • 2503.08638 • Published Mar 11, 2025 • 73
