
	---
	tags:
	- setfit
	- sentence-transformers
	- text-classification
	- generated_from_setfit_trainer
	widget: []
	metrics:
	- accuracy
	- f1
	- precision
	- recall
	pipeline_tag: text-classification
	library_name: setfit
	inference: true
	license: mit
	datasets:
	- NLBSE/nlbse25-code-comment-classification
	language:
	- en
	base_model:
	- sentence-transformers/all-MiniLM-L6-v2
	---

	# Python comment classifier

	This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Python code comment classification.

	The model was trained with few-shot learning in two steps:

	1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
	2. Training a classification head with features from the fine-tuned model.
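
To make step 1 more concrete, the sketch below shows one way sentence pairs for contrastive fine-tuning could be built from a handful of labeled comments: pairs that share a label are positives, all others are negatives. This is a simplified illustration, not the SetFit implementation (which samples pairs rather than enumerating all of them). The function name `make_contrastive_pairs`, the example comments, and the label names are hypothetical.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Build (text_a, text_b, target) triples from labeled examples.

    target is 1.0 when both texts share a label and 0.0 otherwise.
    Hypothetical helper for illustration only.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(zip(texts, labels), 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Hypothetical few-shot examples; the labels are placeholders.
texts = [
    "Sorts a list of numbers.",
    "Returns the largest element.",
    "Call this before opening the connection.",
    "Use as a context manager.",
]
labels = ["summary", "summary", "usage", "usage"]

pairs = make_contrastive_pairs(texts, labels)
positives = [p for p in pairs if p[2] == 1.0]
negatives = [p for p in pairs if p[2] == 0.0]
```

A contrastive objective then pulls the embeddings of positive pairs together and pushes those of negative pairs apart, which is what lets a small labeled set yield many training signals.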

	## Model Description

	- Model Type: SetFit
	- Classification head: [RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)
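
As a rough illustration of what a random-forest classification head does, the sketch below fits scikit-learn's `RandomForestClassifier` on fixed-size feature vectors. It assumes 384-dimensional inputs (the embedding size of `all-MiniLM-L6-v2`) and uses synthetic vectors in place of real sentence embeddings. The two-class setup and the hyperparameters are assumptions for the example, not the settings used to train this model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
dim = 384  # embedding size of all-MiniLM-L6-v2

# Synthetic stand-ins for sentence embeddings: two well-separated clusters.
class_0 = rng.normal(loc=-3.0, scale=1.0, size=(20, dim))
class_1 = rng.normal(loc=3.0, scale=1.0, size=(20, dim))
features = np.vstack([class_0, class_1])
targets = np.array([0] * 20 + [1] * 20)

# Assumed hyperparameters, chosen only for this illustration.
head = RandomForestClassifier(n_estimators=50, random_state=0)
head.fit(features, targets)

# Classify two new vectors placed at the cluster centers.
query = np.vstack([np.full(dim, -3.0), np.full(dim, 3.0)])
preds = head.predict(query)
```

In the real model, the rows of `features` would be embeddings produced by the fine-tuned Sentence Transformer rather than random vectors.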

	## Sources

	- Repository: [GitHub](https://github.com/fabiancpl/sbert-comment-classification/)
	- Paper: [Evaluating the Performance and Efficiency of Sentence-BERT for Code Comment Classification](https://ieeexplore.ieee.org/document/11029440)
	- Dataset: [HF Dataset](https://huggingface.co/datasets/NLBSE/nlbse25-code-comment-classification)

	## How to use it

	First, install the dependencies:

	```bash
	pip install setfit scikit-learn
	```

	Then, load the model and run inference:

	```python
	from setfit import SetFitModel

	# Download from the 🤗 Hub
	model = SetFitModel.from_pretrained("fabiancpl/nlbse25_python")
	# Run inference
	preds = model("This function sorts a list of numbers.")
	```

	## Cite as

	```bibtex
	@inproceedings{11029440,
	author={Peña, Fabian C. and Herbold, Steffen},
	booktitle={2025 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE)},
	title={Evaluating the Performance and Efficiency of Sentence-BERT for Code Comment Classification},
	year={2025},
	pages={21-24},
	doi={10.1109/NLBSE66842.2025.00010}}
	```