mult_tf

This model is a fine-tuned version of microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext on a distantly supervised dataset of PubMed article titles (see Training and evaluation data below). It achieves the following results on the evaluation set:

  • Loss: 0.5180
  • Accuracy: 0.8364
  • F1: 0.8358
  • Precision: 0.8355
  • Recall: 0.8364
  • Roc Auc: 0.9896

Model description

mult_tf is a fine-tuned [PubMedBERT](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext) model for 17-class medical specialty classification of biomedical text.
It distinguishes between 11 Internal Medicine sub-specialties and 6 other medical disciplines, trained on 300,000 PubMed article titles using journal provenance as a distant supervision
signal. No manual annotation was used.

Companion to the binary classifier
tgamstaetter/im-bin-tf-abstr.
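
A minimal inference sketch, assuming the model is published under the Hub id tgamstaetter/mult_tf (as in the citation below) and that the checkpoint carries its own id2label mapping:

```python
# Minimal inference sketch: classify one PubMed title into one of the 17 specialties.
# Assumes the tgamstaetter/mult_tf checkpoint stores its id2label mapping.
from transformers import pipeline

def classify_title(title: str) -> str:
    """Return the predicted specialty label for a single article title."""
    clf = pipeline("text-classification", model="tgamstaetter/mult_tf")
    return clf(title)[0]["label"]

# Example call (downloads the checkpoint on first use):
# classify_title("Catheter ablation outcomes in persistent atrial fibrillation")
```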

Intended uses & limitations

  • Fine-grained specialty classification of biomedical abstracts or titles
  • Research on multiclass distant supervision in biomedical NLP

Not intended for: clinical decision support, diagnostic use, or any safety-critical application.

Training and evaluation data

300,000 PubMed article titles from 77 medical journals. Labels from journal editorial scope
(distant supervision).

17 classes:

| Class label | Specialty |
|---|---|
| angio | Angiology |
| cardio | Cardiology |
| endo | Endocrinology |
| gastro | Gastroenterology |
| geri | Geriatrics |
| hemato | Hematology |
| infect | Infectiology |
| intens | Intensive Care Medicine |
| nephro | Nephrology |
| pulmo | Pulmonology |
| rheu | Rheumatology |
| anest | Anesthesiology |
| gyn | Gynecology |
| neuro | Neurology |
| oto | Otorhinolaryngology |
| psych | Psychiatry |
| surgery | Surgery |
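
The abbreviation-to-specialty mapping above can be kept as a plain dict for decoding model outputs. A sketch (the index order here is illustrative; the checkpoint's own id2label config is authoritative):

```python
# Decode a predicted class index into its abbreviation and full specialty name.
# The ordering below is illustrative; the checkpoint's id2label is authoritative.
SPECIALTIES = {
    "angio": "Angiology", "cardio": "Cardiology", "endo": "Endocrinology",
    "gastro": "Gastroenterology", "geri": "Geriatrics", "hemato": "Hematology",
    "infect": "Infectiology", "intens": "Intensive Care Medicine",
    "nephro": "Nephrology", "pulmo": "Pulmonology", "rheu": "Rheumatology",
    "anest": "Anesthesiology", "gyn": "Gynecology", "neuro": "Neurology",
    "oto": "Otorhinolaryngology", "psych": "Psychiatry", "surgery": "Surgery",
}

ID2LABEL = dict(enumerate(SPECIALTIES))  # class index -> abbreviation

def decode(class_index: int) -> str:
    """Map a predicted class index to the full specialty name."""
    return SPECIALTIES[ID2LABEL[class_index]]
```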

Dataset: Internal medicine and other specialties — Kaggle

Training procedure

Fine-tuned from microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 640
  • eval_batch_size: 1280
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 4
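
The hyperparameters above translate directly into a transformers TrainingArguments configuration. A sketch under the assumption that the listed batch sizes are per-device values; the output directory is a placeholder:

```python
# Sketch: the hyperparameters above expressed as a TrainingArguments config.
# Assumes the listed batch sizes are per-device; output_dir is a placeholder.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="mult_tf",              # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=640,
    per_device_eval_batch_size=1280,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=4,
    # Adam betas (0.9, 0.999) and epsilon 1e-8 match the transformers defaults.
)
```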

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall | ROC AUC |
|---|---|---|---|---|---|---|---|---|
| No log | 1.0 | 357 | 0.5694 | 0.8249 | 0.8243 | 0.8245 | 0.8249 | 0.9875 |
| 0.5397 | 2.0 | 714 | 0.5324 | 0.8324 | 0.8312 | 0.8313 | 0.8324 | 0.9890 |
| 0.523 | 3.0 | 1071 | 0.5193 | 0.8354 | 0.8348 | 0.8346 | 0.8354 | 0.9895 |
| 0.523 | 4.0 | 1428 | 0.5180 | 0.8364 | 0.8358 | 0.8355 | 0.8364 | 0.9896 |

Evaluation results

Evaluated on a held-out test set of 100,000 titles:

| Metric | Value |
|---|---|
| Accuracy | 0.835 |
| Macro F1 | 0.834 |
| Macro Precision | 0.836 |
| Macro Recall | 0.835 |
| ROC-AUC (macro OvR) | 0.903 |

Lowest per-class F1 scores: intensive care medicine (0.670), geriatrics (0.683),
angiology (0.704) — reflecting known clinical content overlap with adjacent specialties.
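
Macro averaging gives each specialty equal weight regardless of class frequency, which is why low-support classes like intensive care can pull the macro F1 down. A pure-Python sketch of the macro F1 computation on a toy 3-class example:

```python
# Macro F1: compute per-class F1 one-vs-rest, then take the unweighted mean,
# so rare specialties count exactly as much as common ones.
def macro_f1(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

# Toy example: perfect on class 0, one confusion between classes 1 and 2.
print(macro_f1([0, 0, 1, 1, 2], [0, 0, 1, 2, 2]))  # 7/9 ~= 0.778
```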

Limitations

  • Trained on article titles only.
  • Label noise at disciplinary boundaries (e.g., intensive care / anesthesiology, angiology / cardiology) is inherent to the distant supervision approach.
  • Evaluated on English-language text only.

Framework versions

  • Transformers 4.31.0
  • Pytorch 2.0.1+cu118
  • Datasets 2.13.1
  • Tokenizers 0.13.3

Citation

```bibtex
@misc{gamstaetter2023modelmc,
  author       = {Gamstaetter, Thomas},
  title        = {mult\_tf: Fine-tuned {PubMedBERT} for multiclass medical specialty classification},
  year         = {2023},
  howpublished = {Hugging Face},
  url          = {https://huggingface.co/tgamstaetter/mult_tf}
}
```

Associated preregistration: OSF, DOI 10.17605/OSF.IO/XFDBV
