---
pretty_name: Diffusers PR Dataset
configs:
- config_name: issues
  data_files:
  - split: train
    path: issues.parquet
  default: true
- config_name: prs
  data_files:
  - split: train
    path: pull_requests.parquet
- config_name: issue_comments
  data_files:
  - split: train
    path: issue_comments.parquet
- config_name: pr_comments
  data_files:
  - split: train
    path: pr_comments.parquet
- config_name: pr_reviews
  data_files:
  - split: train
    path: reviews.parquet
- config_name: pr_files
  data_files:
  - split: train
    path: pr_files.parquet
- config_name: pr_diffs
  data_files:
  - split: train
    path: pr_diffs.parquet
- config_name: review_comments
  data_files:
  - split: train
    path: review_comments.parquet
- config_name: links
  data_files:
  - split: train
    path: links.parquet
- config_name: events
  data_files:
  - split: train
    path: events.parquet
- config_name: new_contributors
  data_files:
  - split: train
    path: new_contributors.parquet
---

# Diffusers PR Dataset

Normalized snapshots of issues, pull requests, comments, reviews, and linkage data from `huggingface/diffusers`.

Files:
- `issues.parquet`
- `pull_requests.parquet`
- `comments.parquet` (not exposed as a config in the card header)
- `issue_comments.parquet` (derived view of issue discussion comments)
- `pr_comments.parquet` (derived view of pull request discussion comments)
- `reviews.parquet`
- `pr_files.parquet`
- `pr_diffs.parquet`
- `review_comments.parquet`
- `links.parquet`
- `events.parquet`
- `new_contributors.parquet`
- `new-contributors-report.json`
- `new-contributors-report.md`
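
A minimal sketch of loading one of the configs with the `datasets` library. The config-to-file mapping is copied from this card's YAML header. The repository ID is not stated in this card, so `repo_id` is a placeholder you must supply, and `CONFIG_FILES` and `load_config` are illustrative names, not part of the dataset.

```python
# Config name -> parquet file, as declared in this card's YAML header.
CONFIG_FILES = {
    "issues": "issues.parquet",  # default config
    "prs": "pull_requests.parquet",
    "issue_comments": "issue_comments.parquet",
    "pr_comments": "pr_comments.parquet",
    "pr_reviews": "reviews.parquet",
    "pr_files": "pr_files.parquet",
    "pr_diffs": "pr_diffs.parquet",
    "review_comments": "review_comments.parquet",
    "links": "links.parquet",
    "events": "events.parquet",
    "new_contributors": "new_contributors.parquet",
}


def load_config(repo_id: str, config_name: str = "issues"):
    """Load the single `train` split of one config.

    `repo_id` is an assumption: pass the Hub ID this dataset is hosted under.
    """
    if config_name not in CONFIG_FILES:
        raise ValueError(
            f"unknown config {config_name!r}; expected one of {sorted(CONFIG_FILES)}"
        )
    # Imported lazily so the mapping above is usable without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, config_name, split="train")
```

For example, `load_config("<namespace>/<dataset-name>", "prs")` would return the pull request table. Note that several config names differ from their file names (`prs` reads `pull_requests.parquet`, `pr_reviews` reads `reviews.parquet`).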

Intended uses:
- duplicate PR and issue analysis
- triage and ranking experiments
- eval set creation

Notes:
- latest snapshot: `20260528T043525Z`
- raw data only; no labels or moderation decisions
- PR metadata, file-level patch hunks, and full unified diffs are included
- full file contents for changed files are not included
