---
datasets:
- JoaoJunior/python_java_dataset_APR
tags:
- APR
- AI
---
# Introduction
JoaoJunior/T5_APR_java_python_v4 is a fine-tuned version of Salesforce's pre-trained CodeT5 model. It is designed to understand and generate code, with a specific focus on bug fixing in Python and Java.

# Description
The CodeT5 model was introduced in the paper "CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation". It leverages the semantics conveyed by developer-assigned identifiers in code, which supports both code understanding and code generation tasks.

JoaoJunior/T5_APR_java_python_v4 was trained on the python_java_dataset_APR dataset, which contains pairs of buggy and fixed code in Python and Java. The dataset is built on the coconut_java2006 and coconut_python2010 datasets from the CoCoNuT project.

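To make the idea of a buggy/fixed pair concrete, here is a minimal sketch that shows one such pair as a unified diff. The record and its field names (`buggy`, `fixed`) are hypothetical and only illustrate the concept; the actual schema of python_java_dataset_APR may differ.

```python
import difflib

# Hypothetical record: the field names "buggy" and "fixed" are illustrative
# and are not taken from the real dataset schema.
pair = {
    "buggy": "if (i <= items.length) {",
    "fixed": "if (i < items.length) {",
}


def show_patch(buggy: str, fixed: str) -> str:
    """Return a unified diff that turns the buggy snippet into the fixed one."""
    diff = difflib.unified_diff(
        buggy.splitlines(),
        fixed.splitlines(),
        fromfile="buggy",
        tofile="fixed",
        lineterm="",
    )
    return "\n".join(diff)


print(show_patch(pair["buggy"], pair["fixed"]))
```

The diff marks the buggy line with `-` and the fixed line with `+`, which is the kind of transformation the model is trained to produce.
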
# Objective
The primary objective of this model is to identify and fix bugs in Python and Java code. Fine-tuning CodeT5 on python_java_dataset_APR is intended to teach the model the patterns and structures of both languages so that it can detect and correct errors.

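As a rough illustration of how the model could be asked for a fix, the sketch below loads the checkpoint with the Hugging Face `transformers` library and generates a candidate repair. Several things here are assumptions rather than documented behavior: that the repository ships tokenizer files loadable with `AutoTokenizer`, that the raw buggy snippet is an acceptable input format, and the generation settings (`max_new_tokens`, `num_beams`). The helper names `load_repair_model`, `suggest_fix`, and `demo` are hypothetical.

```python
MODEL_ID = "JoaoJunior/T5_APR_java_python_v4"


def load_repair_model(model_id: str = MODEL_ID):
    """Download and return (tokenizer, model) for the given checkpoint."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    # Assumption: the repository includes tokenizer files.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = T5ForConditionalGeneration.from_pretrained(model_id)
    return tokenizer, model


def suggest_fix(buggy_code, tokenizer, model, max_new_tokens=128, num_beams=5):
    """Generate one candidate fix for a buggy snippet."""
    # Assumption: the model accepts the raw buggy code as its input text.
    inputs = tokenizer(
        buggy_code, return_tensors="pt", truncation=True, max_length=512
    )
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, num_beams=num_beams
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def demo():
    """Run one example end to end (downloads the model weights)."""
    tokenizer, model = load_repair_model()
    buggy = "def add(a, b):\n    return a - b\n"
    print(suggest_fix(buggy, tokenizer, model))
```

Calling `demo()` requires `transformers` and `torch` to be installed and downloads the weights on first use. Any generated fix is only a suggestion and should be reviewed and tested before it is applied.
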
# References
- CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation, by Yue Wang, Weishi Wang, Shafiq Joty, and Steven C.H. Hoi
- python_java_dataset_APR: a dataset containing pairs of buggy and fixed code in Python and Java, created from the CoCoNuT project's coconut_java2006 and coconut_python2010 datasets
- CoCoNuT: Combining Context-Aware Neural Translation Models using Ensemble for Program Repair