# mult_tf
This model is a fine-tuned version of [microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext) on a dataset of PubMed article titles (described below). It achieves the following results on the evaluation set:
- Loss: 0.5180
- Accuracy: 0.8364
- F1: 0.8358
- Precision: 0.8355
- Recall: 0.8364
- Roc Auc: 0.9896
## Model description
mult_tf is a fine-tuned [PubMedBERT](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext) model for 17-class medical specialty classification of biomedical text.
It distinguishes between 11 Internal Medicine sub-specialties and 6 other medical disciplines,
trained on 300,000 PubMed article titles using journal provenance as a distant supervision
signal. No manual annotation was used.
It is a companion to the binary classifier `tgamstaetter/im-bin-tf-abstr`.
## Intended uses & limitations

- Fine-grained specialty classification of biomedical abstracts or titles
- Research on multiclass distant supervision in biomedical NLP
Not intended for: clinical decision support, diagnostic use, or any safety-critical application.
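For classification, the model can be loaded with the standard `transformers` sequence-classification API. A minimal inference sketch (the example title is hypothetical; label names follow the class table below):

```python
# Minimal inference sketch for mult_tf (requires torch and transformers).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "tgamstaetter/mult_tf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Hypothetical article title used only for illustration.
title = "Percutaneous coronary intervention outcomes in elderly patients"
inputs = tokenizer(title, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
pred = model.config.id2label[int(probs.argmax(dim=-1))]
print(pred, float(probs.max()))
```

The model outputs one of the 17 class labels listed in the table below, together with a softmax confidence score.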
## Training and evaluation data
300,000 PubMed article titles from 77 medical journals. Labels from journal editorial scope
(distant supervision).
17 classes:

| Class label | Specialty |
|---|---|
| angio | Angiology |
| cardio | Cardiology |
| endo | Endocrinology |
| gastro | Gastroenterology |
| geri | Geriatrics |
| hemato | Hematology |
| infect | Infectiology |
| intens | Intensive Care Medicine |
| nephro | Nephrology |
| pulmo | Pulmonology |
| rheu | Rheumatology |
| anest | Anesthesiology |
| gyn | Gynecology |
| neuro | Neurology |
| oto | Otorhinolaryngology |
| psych | Psychiatry |
| surgery | Surgery |
Dataset: *Internal medicine and other specialties* (Kaggle)
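The journal-provenance labeling scheme can be sketched as follows: each title inherits the specialty label of the journal it appeared in, with no per-title annotation. A minimal illustration (journal names and titles here are hypothetical, not from the actual dataset):

```python
# Distant supervision by journal provenance: the label comes from the
# journal's editorial scope, not from inspecting the title itself.
journal_to_label = {  # hypothetical journal names for illustration
    "Example Journal of Cardiology": "cardio",
    "Example Nephrology Reports": "nephro",
}

records = [  # (title, source journal) pairs, also hypothetical
    ("Stent thrombosis after PCI", "Example Journal of Cardiology"),
    ("Dialysis adequacy in chronic kidney disease", "Example Nephrology Reports"),
]

labeled = [(title, journal_to_label[journal]) for title, journal in records]
print(labeled)
```

This is what makes the approach annotation-free, and also what introduces the boundary label noise discussed under Limitations.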
## Training procedure

Fine-tuned from `microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext`.
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 640
- eval_batch_size: 1280
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4
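The hyperparameters above map directly onto `transformers.TrainingArguments`. A configuration sketch (the output directory and the per-device interpretation of the batch sizes are assumptions; the card does not state them):

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="results",                # hypothetical output directory
    learning_rate=1e-5,
    per_device_train_batch_size=640,     # assumed per-device
    per_device_eval_batch_size=1280,     # assumed per-device
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=4,
)
```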
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall | Roc Auc |
|---|---|---|---|---|---|---|---|---|
| No log | 1.0 | 357 | 0.5694 | 0.8249 | 0.8243 | 0.8245 | 0.8249 | 0.9875 |
| 0.5397 | 2.0 | 714 | 0.5324 | 0.8324 | 0.8312 | 0.8313 | 0.8324 | 0.9890 |
| 0.523 | 3.0 | 1071 | 0.5193 | 0.8354 | 0.8348 | 0.8346 | 0.8354 | 0.9895 |
| 0.523 | 4.0 | 1428 | 0.5180 | 0.8364 | 0.8358 | 0.8355 | 0.8364 | 0.9896 |
## Evaluation results
Evaluated on a held-out test set of 100,000 titles:
| Metric | Value |
|---|---|
| Accuracy | 0.835 |
| Macro F1 | 0.834 |
| Macro Precision | 0.836 |
| Macro Recall | 0.835 |
| ROC-AUC (macro OvR) | 0.903 |
Lowest per-class F1 scores: intensive care medicine (0.670), geriatrics (0.683),
angiology (0.704) — reflecting known clinical content overlap with adjacent specialties.
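The macro-averaged scores above are unweighted means of per-class metrics. A self-contained illustration of macro F1 on toy predictions (labels chosen from the class table; the data is invented for the example):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all observed classes."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Toy example: one angiology title misclassified as cardiology.
y_true = ["cardio", "angio", "cardio", "geri"]
y_pred = ["cardio", "cardio", "cardio", "geri"]
print(round(macro_f1(y_true, y_pred), 3))  # → 0.6
```

Note that under macro averaging, small or frequently confused classes (such as `intens` or `angio`) pull the aggregate score down as much as large classes do.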
## Limitations
- Trained on article titles only.
- Label noise at disciplinary boundaries (e.g., intensive care / anesthesiology, angiology / cardiology) is inherent to the distant supervision approach.
- Evaluated on English-language text only.
## Framework versions

- Transformers 4.31.0
- Pytorch 2.0.1+cu118
- Datasets 2.13.1
- Tokenizers 0.13.3
## Citation

```bibtex
@misc{gamstaetter2023modelmc,
  author       = {Gamstaetter, Thomas},
  title        = {mult\_tf: Fine-tuned {PubMedBERT} for multiclass medical specialty classification},
  year         = {2023},
  howpublished = {Hugging Face},
  url          = {https://huggingface.co/tgamstaetter/mult_tf}
}
```

Associated preregistration: OSF, DOI 10.17605/OSF.IO/XFDBV