astroNLPy-ner
Named entity recognition for astronomical observation reports (ATels, GCN
Circulars, TNS reports). Fine-tuned from adsabs/astroBERT
on the TDAC corpus (Time-Domain Astronomy Corpus) to recognize 27
domain-specific entity types.
This model is the NER component of the astroNLPy package, which also provides LLM-based coreference resolution and relation extraction for celestial objects.
Usage
With the astroNLPy package (recommended):
```python
from astroNLPy.ner import NERModel

ner = NERModel.from_pretrained("atillaalkan/astroNLPy-ner")
tags = ner.predict_text("Swift observed GRS 1747-312 in the X-ray band.")
```
Or directly with transformers:
```python
from transformers import pipeline

nlp = pipeline("token-classification", model="atillaalkan/astroNLPy-ner",
               aggregation_strategy="simple")
print(nlp("We report a nova in M31 at R = 19.7 mag."))
```
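With `aggregation_strategy="simple"`, the Transformers pipeline returns a list of dicts, one per merged entity, with the keys `entity_group`, `score`, `word`, `start` and `end`. A minimal post-processing sketch that keeps confident predictions and groups them by type might look like this. The helper `group_entities`, the 0.5 threshold and the sample predictions are illustrative assumptions, not real output of this model.

```python
from collections import defaultdict


def group_entities(text, predictions, min_score=0.5):
    """Group aggregated pipeline predictions by entity type.

    `predictions` is assumed to follow the format returned by a
    token-classification pipeline with aggregation_strategy="simple".
    The surface string is sliced from `text` via the character offsets,
    and predictions scoring below `min_score` are dropped.
    """
    grouped = defaultdict(list)
    for pred in predictions:
        if pred["score"] >= min_score:
            grouped[pred["entity_group"]].append(text[pred["start"]:pred["end"]])
    return dict(grouped)


# Hand-written sample in the pipeline's output format (NOT real model output).
text = "We report a nova in M31 at R = 19.7 mag."
sample = [
    {"entity_group": "CelestialObject", "score": 0.97, "word": "M31", "start": 20, "end": 23},
    {"entity_group": "Wavelength", "score": 0.41, "word": "R", "start": 27, "end": 28},
]

print(group_entities(text, sample))
# {'CelestialObject': ['M31']}
```

Slicing with `start`/`end` rather than reading `word` keeps the original surface form even when subword merging alters spacing in `word`.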
Entity types
CelestialObject, CelestialRegion, CelestialObjectRegion, Telescope, Observatory, Instrument, Survey, Wavelength, Formula, ObservationalTechniques, Citation, Dataset, Database, Archive, Software, URL, Person, Organization, Collaboration, Location, Grant, Proposal, Event, Model, Identifier, Tag, TextGarbage.
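Under the IOB2 scheme used for training, each of these 27 types would typically get a `B-` (begin) and an `I-` (inside) label plus one shared `O` label, i.e. 55 labels in total. The authoritative list is the model's `config.id2label`, which this sketch does not read. The helper below decodes a word-level IOB2 tag sequence into typed spans; the function name and the example tags are hand-written illustrations, not model predictions.

```python
ENTITY_TYPES = [
    "CelestialObject", "CelestialRegion", "CelestialObjectRegion", "Telescope",
    "Observatory", "Instrument", "Survey", "Wavelength", "Formula",
    "ObservationalTechniques", "Citation", "Dataset", "Database", "Archive",
    "Software", "URL", "Person", "Organization", "Collaboration", "Location",
    "Grant", "Proposal", "Event", "Model", "Identifier", "Tag", "TextGarbage",
]

# Assumed IOB2 label set: one "O" plus B-/I- variants of every type.
LABELS = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")]


def iob2_to_spans(tags):
    """Convert IOB2 tags into (type, start, end) spans, end exclusive.

    An I- tag that does not continue an open span of the same type is
    treated as the start of a new span (a lenient repair; strict IOB2
    decoding would reject it).
    """
    spans = []
    current = None  # [type, start] of the span being built
    for i, tag in enumerate(tags):
        if tag == "O":
            if current:
                spans.append((current[0], current[1], i))
                current = None
            continue
        prefix, etype = tag.split("-", 1)
        if prefix == "B" or current is None or current[0] != etype:
            if current:
                spans.append((current[0], current[1], i))
            current = [etype, i]
        # Otherwise an I- tag of the same type extends the open span.
    if current:
        spans.append((current[0], current[1], len(tags)))
    return spans


words = ["Swift", "observed", "GRS", "1747-312", "in", "the", "X-ray", "band", "."]
tags = ["B-Observatory", "O", "B-CelestialObject", "I-CelestialObject",
        "O", "O", "B-Wavelength", "O", "O"]

print(len(LABELS))  # 55
print(iob2_to_spans(tags))
# [('Observatory', 0, 1), ('CelestialObject', 2, 4), ('Wavelength', 6, 7)]
```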
Results (v0.1.0)
Entity-level scores from seqeval (IOB2 scheme) on a single 80/10/10 holdout split (8 test documents):
| Metric | Value |
|---|---|
| Micro F1 | 0.52 |
| CelestialObject F1 | 0.96 |
| Person F1 | 0.92 |
| Wavelength F1 | 0.68 |
These figures come from a single split with only 8 test documents, so they are noisy; the micro-average is depressed by entity types absent from the small test set.
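seqeval scores at the entity level: a predicted entity counts as correct only if its type and boundaries exactly match a gold entity. A micro-average pools true positives, false positives and false negatives across all types before computing F1, so one way a type with no gold entities in the test set can pull the micro score down is by contributing only false positives. The sketch below reproduces that arithmetic on hand-made `(type, start, end)` spans; the spans are invented for illustration and are not the evaluation data behind the table, and the code is not seqeval itself.

```python
from collections import Counter


def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def entity_scores(gold, pred):
    """Per-type F1 and micro F1 over sets of exact-match entity spans."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for span in pred:
        if span in gold:
            tp[span[0]] += 1
        else:
            fp[span[0]] += 1
    for span in gold - pred:
        fn[span[0]] += 1
    types = set(tp) | set(fp) | set(fn)
    per_type = {t: prf(tp[t], fp[t], fn[t])[2] for t in types}
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))[2]
    return per_type, micro


# Invented example: every gold entity is found, but two spurious
# predictions are made for types that never occur in the gold set.
gold = {("CelestialObject", 0, 2), ("CelestialObject", 5, 6), ("Person", 8, 10)}
pred = gold | {("Grant", 12, 14), ("Software", 15, 16)}

per_type, micro = entity_scores(gold, pred)
print(round(micro, 2))  # 0.75
```

Here CelestialObject and Person both score 1.0 per type, yet micro F1 drops to 0.75 (precision 3/5, recall 1.0); removing the two spurious spans would restore it to 1.0.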
Training
- Base model: adsabs/astroBERT
- 10 epochs, batch size 8, learning rate 2e-5, IOB2 token classification
- Corpus: TDAC (74 documents, ~19k tokens)
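The card does not publish the training script. A common Hugging Face recipe for IOB2 token classification, shown here as an assumption rather than the authors' code, gives each word's tag to its first subword token and masks the remaining subwords and special tokens with -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so they do not contribute to the loss. The `word_ids` list mimics what a fast tokenizer's `word_ids()` method returns; the helper name, the label ids and the tokenization are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def align_labels(word_ids, word_label_ids):
    """Map word-level label ids onto subword tokens (hypothetical helper).

    word_ids: per-token word index, or None for special tokens such as
        [CLS]/[SEP], in the format of a fast tokenizer's word_ids().
    word_label_ids: one label id per word.
    Only the first subword of each word keeps its label; continuation
    subwords and special tokens get IGNORE_INDEX.
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)
        elif wid != previous:
            aligned.append(word_label_ids[wid])
        else:
            aligned.append(IGNORE_INDEX)
        previous = wid
    return aligned


# Hypothetical label ids and subword split for the words "GRS 1747-312".
label2id = {"O": 0, "B-CelestialObject": 1, "I-CelestialObject": 2}
word_labels = [label2id["B-CelestialObject"], label2id["I-CelestialObject"]]
word_ids = [None, 0, 0, 1, 1, 1, 1, None]  # [CLS] GR ##S 17 ##47 - 312 [SEP]

print(align_labels(word_ids, word_labels))
# [-100, 1, -100, 2, -100, -100, -100, -100]
```

If training used the Transformers `Trainer` (the card does not say), the hyperparameters above would correspond to `TrainingArguments(num_train_epochs=10, per_device_train_batch_size=8, learning_rate=2e-5)`.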
Citation
A publication is forthcoming.
License
MIT