# mult_tf
This model is a fine-tuned version of [microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext) on a dataset of PubMed article titles (described below). It achieves the following results on the evaluation set:
- Loss: 0.5180
- Accuracy: 0.8364
- F1: 0.8358
- Precision: 0.8355
- Recall: 0.8364
- Roc Auc: 0.9896
## Model description
mult_tf is a fine-tuned [PubMedBERT](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext) model for 17-class medical specialty classification of biomedical text.
It distinguishes between 11 Internal Medicine sub-specialties and 6 other medical disciplines,
trained on 300,000 PubMed article titles using journal provenance as a distant supervision
signal. No manual annotation was used.
It is a companion to the binary classifier `tgamstaetter/im-bin-tf-abstr`.
## Intended uses & limitations

- Fine-grained specialty classification of biomedical abstracts or titles
- Research on multiclass distant supervision in biomedical NLP
Not intended for: clinical decision support, diagnostic use, or any safety-critical application.
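For classification, the model can be loaded with the standard `transformers` sequence-classification API. A minimal inference sketch (the example title is hypothetical; label names follow the class table below):

```python
# Minimal inference sketch for mult_tf (requires torch and transformers).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "tgamstaetter/mult_tf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Hypothetical article title used only for illustration.
title = "Percutaneous coronary intervention outcomes in elderly patients"
inputs = tokenizer(title, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
pred = model.config.id2label[int(probs.argmax(dim=-1))]
print(pred, float(probs.max()))
```

The model outputs one of the 17 class labels listed in the table below, together with a softmax confidence score.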
## Training and evaluation data
300,000 PubMed article titles from 77 medical journals. Labels from journal editorial scope
(distant supervision).
17 classes:

| Class label | Specialty |
|---|---|
| angio | Angiology |
| cardio | Cardiology |
| endo | Endocrinology |
| gastro | Gastroenterology |
| geri | Geriatrics |
| hemato | Hematology |
| infect | Infectiology |
| intens | Intensive Care Medicine |
| nephro | Nephrology |
| pulmo | Pulmonology |
| rheu | Rheumatology |
| anest | Anesthesiology |
| gyn | Gynecology |
| neuro | Neurology |
| oto | Otorhinolaryngology |
| psych | Psychiatry |
| surgery | Surgery |
Dataset: *Internal medicine and other specialties* (Kaggle)
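The journal-provenance labeling scheme can be sketched as follows: each title inherits the specialty label of the journal it appeared in, with no per-title annotation. A minimal illustration (journal names and titles here are hypothetical, not from the actual dataset):

```python
# Distant supervision by journal provenance: the label comes from the
# journal's editorial scope, not from inspecting the title itself.
journal_to_label = {  # hypothetical journal names for illustration
    "Example Journal of Cardiology": "cardio",
    "Example Nephrology Reports": "nephro",
}

records = [  # (title, source journal) pairs, also hypothetical
    ("Stent thrombosis after PCI", "Example Journal of Cardiology"),
    ("Dialysis adequacy in chronic kidney disease", "Example Nephrology Reports"),
]

labeled = [(title, journal_to_label[journal]) for title, journal in records]
print(labeled)
```

This is what makes the approach annotation-free, and also what introduces the boundary label noise discussed under Limitations.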
## Training procedure

Fine-tuned from `microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext`.
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 640
- eval_batch_size: 1280
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4
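The hyperparameters above map directly onto `transformers.TrainingArguments`. A configuration sketch (the output directory and the per-device interpretation of the batch sizes are assumptions; the card does not state them):

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="results",                # hypothetical output directory
    learning_rate=1e-5,
    per_device_train_batch_size=640,     # assumed per-device
    per_device_eval_batch_size=1280,     # assumed per-device
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=4,
)
```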
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall | Roc Auc |
|---|---|---|---|---|---|---|---|---|
| No log | 1.0 | 357 | 0.5694 | 0.8249 | 0.8243 | 0.8245 | 0.8249 | 0.9875 |
| 0.5397 | 2.0 | 714 | 0.5324 | 0.8324 | 0.8312 | 0.8313 | 0.8324 | 0.9890 |
| 0.523 | 3.0 | 1071 | 0.5193 | 0.8354 | 0.8348 | 0.8346 | 0.8354 | 0.9895 |
| 0.523 | 4.0 | 1428 | 0.5180 | 0.8364 | 0.8358 | 0.8355 | 0.8364 | 0.9896 |
## Evaluation results
Evaluated on a held-out test set of 100,000 titles:
| Metric | Value |
|---|---|
| Accuracy | 0.835 |
| Macro F1 | 0.834 |
| Macro Precision | 0.836 |
| Macro Recall | 0.835 |
| ROC-AUC (macro OvR) | 0.903 |
Lowest per-class F1 scores: intensive care medicine (0.670), geriatrics (0.683),
angiology (0.704) — reflecting known clinical content overlap with adjacent specialties.
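The macro-averaged scores above are unweighted means of per-class metrics. A self-contained illustration of macro F1 on toy predictions (labels chosen from the class table; the data is invented for the example):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all observed classes."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Toy example: one angiology title misclassified as cardiology.
y_true = ["cardio", "angio", "cardio", "geri"]
y_pred = ["cardio", "cardio", "cardio", "geri"]
print(round(macro_f1(y_true, y_pred), 3))  # → 0.6
```

Note that under macro averaging, small or frequently confused classes (such as `intens` or `angio`) pull the aggregate score down as much as large classes do.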
## Limitations
- Trained on article titles only.
- Label noise at disciplinary boundaries (e.g., intensive care / anesthesiology, angiology / cardiology) is inherent to the distant supervision approach.
- Evaluated on English-language text only.
## Framework versions

- Transformers 4.31.0
- Pytorch 2.0.1+cu118
- Datasets 2.13.1
- Tokenizers 0.13.3
## Citation

```bibtex
@misc{gamstaetter2023modelmc,
  author       = {Gamstaetter, Thomas},
  title        = {mult\_tf: Fine-tuned {PubMedBERT} for multiclass medical specialty classification},
  year         = {2023},
  howpublished = {Hugging Face},
  url          = {https://huggingface.co/tgamstaetter/mult_tf}
}
```

Associated preregistration: OSF, DOI 10.17605/OSF.IO/XFDBV