Ujjwal-Tyagi's Collections

Coding Datasets

updated 10 days ago

These are the best coding corpora for making an LLM stronger, with the aim of surpassing proprietary models. They can be used in both pre-training and post-training.

  • Ujjwal-Tyagi/gitee

    Viewer • Updated 10 days ago • 819M • 95

  • Ujjwal-Tyagi/gitverse

    Viewer • Updated 10 days ago • 2.8M • 20

  • Ujjwal-Tyagi/jihulab

    Viewer • Updated 10 days ago • 1.85M • 20

  • Ujjwal-Tyagi/moshub

    Updated 10 days ago • 20

  • Ujjwal-Tyagi/gitflic

    Viewer • Updated 10 days ago • 5.98M • 26

  • Ujjwal-Tyagi/notabug

    Viewer • Updated 10 days ago • 12.6M • 34

  • Ujjwal-Tyagi/gitgud

    Viewer • Updated 10 days ago • 16.3M • 34

  • Ujjwal-Tyagi/gitcode

    Viewer • Updated 10 days ago • 48.1M • 45

  • Ujjwal-Tyagi/google-code-archive

    Viewer • Updated 10 days ago • 65.8M • 55

  • Ujjwal-Tyagi/Cpp

    Updated 10 days ago • 11

  • Ujjwal-Tyagi/C

    Updated 10 days ago • 11

  • Ujjwal-Tyagi/Python

    Updated 10 days ago • 12

  • Ujjwal-Tyagi/Java-Code-Large

    Viewer • Updated 10 days ago • 10.9M • 492

  • Ujjwal-Tyagi/JavaScript-Code-Large

    Viewer • Updated 10 days ago • 2.64M • 574

  • Ujjwal-Tyagi/PHP-Code-Large

    Viewer • Updated 10 days ago • 8.07M • 129
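
Because several of these corpora are large, it may help to stream a few records before committing to a full download. The following is a minimal sketch, not the collection author's method. It assumes the Hugging Face `datasets` library is installed and that each repository exposes a split named `train`, which the listing does not confirm. The helper name `peek` is hypothetical. Entries shown without a "Viewer" badge may not load this way at all.

```python
from itertools import islice

# Repository IDs copied from the collection listing above.
COLLECTION_REPOS = [
    "Ujjwal-Tyagi/gitee",
    "Ujjwal-Tyagi/gitverse",
    "Ujjwal-Tyagi/jihulab",
    "Ujjwal-Tyagi/moshub",
    "Ujjwal-Tyagi/gitflic",
    "Ujjwal-Tyagi/notabug",
    "Ujjwal-Tyagi/gitgud",
    "Ujjwal-Tyagi/gitcode",
    "Ujjwal-Tyagi/google-code-archive",
    "Ujjwal-Tyagi/Cpp",
    "Ujjwal-Tyagi/C",
    "Ujjwal-Tyagi/Python",
    "Ujjwal-Tyagi/Java-Code-Large",
    "Ujjwal-Tyagi/JavaScript-Code-Large",
    "Ujjwal-Tyagi/PHP-Code-Large",
]


def peek(repo_id, n=3, split="train"):
    """Return the first n records of a dataset, streamed rather than downloaded.

    The default split name "train" is an assumption; check the dataset page.
    """
    # Imported lazily so this module loads even where `datasets` is absent.
    from datasets import load_dataset

    # streaming=True yields records on demand instead of fetching every file.
    stream = load_dataset(repo_id, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `peek("Ujjwal-Tyagi/gitee")` would return a list of up to three records as dictionaries. Printing the keys of one record is a quick way to discover the column names, which are not stated on this page.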