🏛️ Microsoft CodePlex Archive Dataset - nyuuzyou/ms-codeplex-archive
Following the strong response to the Google Code Archive nyuuzyou/google-code-archive (thanks!), this release preserves another major historical repository: the Microsoft CodePlex Archive.
CodePlex served as Microsoft’s primary open-source hosting platform from 2006 to 2017. This dataset captures the distinct .NET and Windows-centric development ecosystem that flourished before the industry standardized on GitHub.
Key Stats:
- 5,043,730 files from 38,087 repositories
- 3.6 GB compressed Parquet
- 91 programming languages (heavily featuring C#, ASP.NET, and C++)
- Cleaned of binaries, build artifacts, and vendor directories (node_modules, packages)
- Includes platform-specific license metadata (Ms-PL, Ms-RL)
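For anyone who wants to poke at the data, here is a minimal sketch of streaming the dataset and narrowing it by language and license. The repository ID comes from this post; the column names `language` and `license` and the `train` split are assumptions made for illustration, so check the dataset card for the actual schema before relying on them.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional

DATASET_ID = "nyuuzyou/ms-codeplex-archive"  # repository ID from the post

# Hypothetical column names: verify against the dataset card.
LANGUAGE_FIELD = "language"
LICENSE_FIELD = "license"


def stream_records(split: str = "train") -> Iterator[Dict]:
    """Lazily stream rows from the Hub (assumes a 'train' split exists)."""
    from datasets import load_dataset  # requires: pip install datasets

    yield from load_dataset(DATASET_ID, split=split, streaming=True)


def filter_records(
    records: Iterable[Dict],
    language: Optional[str] = None,
    licenses: Optional[Iterable[str]] = None,
) -> Iterator[Dict]:
    """Yield rows matching a language and/or a set of licenses (case-insensitive)."""
    wanted = {name.lower() for name in licenses} if licenses else None
    for row in records:
        if language is not None:
            if str(row.get(LANGUAGE_FIELD, "")).lower() != language.lower():
                continue
        if wanted is not None:
            if str(row.get(LICENSE_FIELD, "")).lower() not in wanted:
                continue
        yield row


def count_by(records: Iterable[Dict], field: str) -> Counter:
    """Tally rows by the value of one column; missing values count as 'unknown'."""
    return Counter(str(row.get(field, "unknown")) for row in records)
```

To sample without downloading all 3.6 GB, wrap the stream in `itertools.islice`, for example `count_by(islice(filter_records(stream_records(), licenses=["Ms-PL", "Ms-RL"]), 1000), LANGUAGE_FIELD)`.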