Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
- Website
- Community
- Solutions
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2507.04009

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

Paper • 2405.07526 • Published May 13, 2024 • 21
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Paper • 2405.15613 • Published May 24, 2024 • 17
A Touch, Vision, and Language Dataset for Multimodal Alignment

Paper • 2402.13232 • Published Feb 20, 2024 • 17
How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Paper • 2406.11813 • Published Jun 17, 2024 • 31

Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Paper • 2507.04009 • Published Jul 5, 2025 • 55
FineVision: Open Data Is All You Need

Paper • 2510.17269 • Published Oct 20, 2025 • 81
Kronos: A Foundation Model for the Language of Financial Markets

Paper • 2508.02739 • Published Aug 2, 2025 • 35
VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images

Paper • 2604.09531 • Published Apr 10 • 8

MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds

Paper • 2508.14879 • Published Aug 20, 2025 • 69
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Paper • 2507.04009 • Published Jul 5, 2025 • 55

A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

Paper • 2508.18106 • Published Aug 25, 2025 • 350
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems

Paper • 2411.02959 • Published Nov 5, 2024 • 71
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Paper • 2507.04009 • Published Jul 5, 2025 • 55
MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm

Paper • 2506.05218 • Published Jun 5, 2025 • 3

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published Apr 17, 2025 • 51
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14, 2025 • 341
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4, 2025 • 276
DINOv3

Paper • 2508.10104 • Published Aug 13, 2025 • 308

Project Ideas 2026

SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 137
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

Paper • 2512.08765 • Published Dec 9, 2025 • 134
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

Paper • 2512.04677 • Published Dec 4, 2025 • 178
LongCat-Image Technical Report

Paper • 2512.07584 • Published Dec 8, 2025 • 25

Step-Audio-R1 Technical Report

Paper • 2511.15848 • Published Nov 19, 2025 • 59
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

Paper • 2410.17799 • Published Oct 23, 2024 • 13
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Paper • 2507.04009 • Published Jul 5, 2025 • 55

Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Paper • 2507.04009 • Published Jul 5, 2025 • 55

Paper2Agent: Reimagining Research Papers As Interactive and Reliable AI Agents

Paper • 2509.06917 • Published Sep 8, 2025 • 44
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Paper • 2507.04009 • Published Jul 5, 2025 • 55
WebDancer: Towards Autonomous Information Seeking Agency

Paper • 2505.22648 • Published May 28, 2025 • 33

DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

Paper • 2504.07128 • Published Apr 2, 2025 • 87
BM25S: Orders of magnitude faster lexical search via eager sparse scoring

Paper • 2407.03618 • Published Jul 4, 2024 • 14
Deep Think with Confidence

Paper • 2508.15260 • Published Aug 21, 2025 • 91
R-Zero: Self-Evolving Reasoning LLM from Zero Data

Paper • 2508.05004 • Published Aug 7, 2025 • 132

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

Paper • 2405.07526 • Published May 13, 2024 • 21
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Paper • 2405.15613 • Published May 24, 2024 • 17
A Touch, Vision, and Language Dataset for Multimodal Alignment

Paper • 2402.13232 • Published Feb 20, 2024 • 17
How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Paper • 2406.11813 • Published Jun 17, 2024 • 31

Project Ideas 2026

SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 137
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

Paper • 2512.08765 • Published Dec 9, 2025 • 134
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

Paper • 2512.04677 • Published Dec 4, 2025 • 178
LongCat-Image Technical Report

Paper • 2512.07584 • Published Dec 8, 2025 • 25

Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Paper • 2507.04009 • Published Jul 5, 2025 • 55
FineVision: Open Data Is All You Need

Paper • 2510.17269 • Published Oct 20, 2025 • 81
Kronos: A Foundation Model for the Language of Financial Markets

Paper • 2508.02739 • Published Aug 2, 2025 • 35
VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images

Paper • 2604.09531 • Published Apr 10 • 8

Step-Audio-R1 Technical Report

Paper • 2511.15848 • Published Nov 19, 2025 • 59
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

Paper • 2410.17799 • Published Oct 23, 2024 • 13
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Paper • 2507.04009 • Published Jul 5, 2025 • 55

MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds

Paper • 2508.14879 • Published Aug 20, 2025 • 69
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Paper • 2507.04009 • Published Jul 5, 2025 • 55

Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Paper • 2507.04009 • Published Jul 5, 2025 • 55

A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

Paper • 2508.18106 • Published Aug 25, 2025 • 350
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems

Paper • 2411.02959 • Published Nov 5, 2024 • 71
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Paper • 2507.04009 • Published Jul 5, 2025 • 55
MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm

Paper • 2506.05218 • Published Jun 5, 2025 • 3

Paper2Agent: Reimagining Research Papers As Interactive and Reliable AI Agents

Paper • 2509.06917 • Published Sep 8, 2025 • 44
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Paper • 2507.04009 • Published Jul 5, 2025 • 55
WebDancer: Towards Autonomous Information Seeking Agency

Paper • 2505.22648 • Published May 28, 2025 • 33

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published Apr 17, 2025 • 51
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14, 2025 • 341
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4, 2025 • 276
DINOv3

Paper • 2508.10104 • Published Aug 13, 2025 • 308

DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

Paper • 2504.07128 • Published Apr 2, 2025 • 87
BM25S: Orders of magnitude faster lexical search via eager sparse scoring

Paper • 2407.03618 • Published Jul 4, 2024 • 14
Deep Think with Confidence

Paper • 2508.15260 • Published Aug 21, 2025 • 91
R-Zero: Self-Evolving Reasoning LLM from Zero Data

Paper • 2508.05004 • Published Aug 7, 2025 • 132

Previous
1
2
3
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs