---
library_name: span-marker
tags:
- span-marker
- token-classification
- ner
- named-entity-recognition
- generated_from_span_marker_trainer
datasets:
- imvladikon/nemo_corpus
metrics:
- precision
- recall
- f1
widget:
- text: אחר כך הצטרף ל דאלאס מאווריקס מ ה אנ.בי.איי ו חזר לשחק ב אירופה ב ספרד ב מדי
    קאחה בילבאו ו חירונה
- text: ב קיץ 1982 ניסה טל ברודי (אז עוזר ה מאמן) להחתימו, אבל בריאנט, ש סבתו יהודיה,
    חתם אז ב פורד קאנטו ו זכה עמ היא ב אותה עונה ב גביע אירופה ל אלופות.
- text: יו"ר ועדת ה נוער נתן סלובטיק אמר ש ה שחקנים של אנחנו לא משתלבים ב אירופה.
- text: ב ה סגל ש יתכנס מחר אחר ה צהריים ל מחנה אימונים ב שפיים 17 שחקנים, כולל מוזמן
    חדש שירן אדירי מ מכבי תל אביב.
- text: 'תוצאות אחרות: טורינו 2 (מורלו עצמי, מולר) לצה 0; קאליארי 0 לאציו 1 (פסטה,
    שער עצמי); פיורנטינה 2 (נאפי, פאציונה) גנואה 2 (אורלאנדו, שקוראווי).'
pipeline_tag: token-classification
model-index:
- name: SpanMarker
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      name: Unknown
      type: imvladikon/nemo_corpus
      split: test
    metrics:
    - type: f1
      value: 0.7338129496402878
      name: F1
    - type: precision
      value: 0.7577142857142857
      name: Precision
    - type: recall
      value: 0.7113733905579399
      name: Recall
---

# SpanMarker

This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained on the [imvladikon/nemo_corpus](https://huggingface.co/datasets/imvladikon/nemo_corpus) dataset that can be used for Named Entity Recognition.

## Model Details

### Model Description

- **Model Type:** SpanMarker
<!-- - **Encoder:** [Unknown](https://huggingface.co/unknown) -->
- **Maximum Sequence Length:** 512 tokens
- **Maximum Entity Length:** 100 words
- **Training Dataset:** [imvladikon/nemo_corpus](https://huggingface.co/datasets/imvladikon/nemo_corpus)
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Repository:** [SpanMarker on GitHub](https://github.com/tomaarsen/SpanMarkerNER)
- **Thesis:** [SpanMarker For Named Entity Recognition](https://raw.githubusercontent.com/tomaarsen/SpanMarkerNER/main/thesis.pdf)

### Model Labels

| Label | Examples                                        |
|:------|:------------------------------------------------|
| ANG   | "יידיש", "גרמנית", "אנגלית"                     |
| DUC   | "דינמיט", "סובארו", "מרצדס"                     |
| EVE   | "מצדה", "הצהרת בלפור", "ה שואה"                 |
| FAC   | "ברזילי", "כלא עזה", "תל - ה שומר"              |
| GPE   | "ה שטחים", "שפרעם", "רצועת עזה"                 |
| LOC   | "שייח רדואן", "גיבאליה", "חאן יונס"             |
| ORG   | "כך", "ה ארץ", "מרחב ה גליל"                    |
| PER   | "רמי רהב", "נימר חוסיין", "איברהים נימר חוסיין" |
| WOA   | "קיטש ו מוות", "קדיש", "ה ארץ"                  |

## Evaluation

### Metrics

| Label   | Precision | Recall | F1     |
|:--------|:----------|:-------|:-------|
| **all** | 0.7577    | 0.7114 | 0.7338 |
| ANG     | 0.0       | 0.0    | 0.0    |
| DUC     | 0.0       | 0.0    | 0.0    |
| FAC     | 0.0       | 0.0    | 0.0    |
| GPE     | 0.7085    | 0.8103 | 0.7560 |
| LOC     | 0.5714    | 0.1951 | 0.2909 |
| ORG     | 0.7460    | 0.6912 | 0.7176 |
| PER     | 0.8301    | 0.8052 | 0.8175 |
| WOA     | 0.0       | 0.0    | 0.0    |
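
The F1 column is the harmonic mean of precision and recall. As a quick sanity check, the sketch below recomputes the overall F1 from the unrounded precision and recall given in the model card metadata. The helper name `f1_score` is introduced here for illustration only and is not part of SpanMarker.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Unrounded overall precision and recall from the model card metadata.
overall_f1 = f1_score(0.7577142857142857, 0.7113733905579399)
print(f"{overall_f1:.4f}")  # prints 0.7338
```

The zero guard mirrors the rows for ANG, DUC, FAC, and WOA, where precision and recall are both 0.0 and F1 is reported as 0.0.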

## Uses

### Direct Use for Inference

```python
from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("iahlt/span-marker-alephbert-small-nemo-mt-he")
# Run inference
entities = model.predict("יו\"ר ועדת ה נוער נתן סלובטיק אמר ש ה שחקנים של אנחנו לא משתלבים ב אירופה.")
# Show the predicted entities
print(entities)
```
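
For downstream use it is often handy to group the predictions by label. The sketch below assumes that `model.predict` returns, for a single string, a list of dictionaries that each contain at least a `"span"` and a `"label"` key; check this against your installed SpanMarker version. `group_by_label` is a hypothetical helper, and `sample` is a hand-written stand-in that mirrors the entities listed in the spaCy example in this card, not captured model output.

```python
from collections import defaultdict


def group_by_label(entities):
    """Map each label to the list of entity spans predicted with that label."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


# Hand-written sample in the assumed output format (not real model output).
sample = [
    {"span": "ועדת הנוער", "label": "ORG"},
    {"span": "נתן סלובטיק", "label": "PER"},
    {"span": "אירופה", "label": "GPE"},
]

for label, spans in group_by_label(sample).items():
    print(label, spans)
```

With a loaded model, `group_by_label(model.predict(text))` would be used the same way, provided the assumed keys are present.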

### Using spaCy

```bash
pip install spacy_udpipe
```

```python
import spacy_udpipe

# Download the public UDPipe model for Hebrew; any other spaCy pipeline can be used instead.
spacy_udpipe.download("he")
nlp = spacy_udpipe.load("he")
# Add the SpanMarker model as the NER component of the pipeline
nlp.add_pipe("span_marker", config={"model": "iahlt/span-marker-alephbert-small-nemo-mt-he"})

text = "יו\"ר ועדת הנוער נתן סלובטיק אמר שהשחקנים של אנחנו לא משתלבים באירופה."
doc = nlp(text)
print([(entity, entity.label_) for entity in doc.ents])
# [(ועדת הנוער, 'ORG'), (נתן סלובטיק, 'PER'), (אירופה, 'GPE')]
```

## Training Details

### Training Set Metrics

| Training set          | Min | Median  | Max |
|:----------------------|:----|:--------|:----|
| Sentence length       | 1   | 25.4427 | 117 |
| Entities per sentence | 0   | 1.2472  | 20  |

### Training Hyperparameters

- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4
- mixed_precision_training: Native AMP
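
To make the relationship between these values explicit, the sketch below restates them as a plain dictionary whose keys follow the keyword names of `transformers.TrainingArguments`. Two things are assumptions rather than facts from this card: that "Native AMP" corresponds to `fp16=True`, and that training ran on a single device, which is what makes the total batch size come out to 4.

```python
# Hyperparameters from the list above, keyed by transformers.TrainingArguments names.
hyperparameters = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 2,
    "per_device_eval_batch_size": 2,
    "seed": 42,
    "gradient_accumulation_steps": 2,
    "lr_scheduler_type": "linear",
    "warmup_ratio": 0.1,
    "num_train_epochs": 4,
    "fp16": True,  # assumed mapping for "Native AMP"
}

num_devices = 1  # assumption: single-device training

# Effective batch size per optimizer step.
total_train_batch_size = (
    hyperparameters["per_device_train_batch_size"]
    * hyperparameters["gradient_accumulation_steps"]
    * num_devices
)
print(total_train_batch_size)  # prints 4
```

The Adam betas and epsilon listed above match the `TrainingArguments` defaults, so they are omitted from the dictionary. If reproducing the run, the dictionary could be unpacked into `TrainingArguments(output_dir=..., **hyperparameters)`, where the output directory is yours to choose.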

### Evaluation Results

| Metric                 |      Value |
|:-----------------------|-----------:|
| eval_loss              | 0.00487611 |
| eval_overall_precision |   0.822917 |
| eval_overall_recall    |   0.791583 |
| eval_overall_f1        |   0.806946 |
| eval_overall_accuracy  |   0.969029 |

### Test Results

| Metric                 |      Value |
|:-----------------------|-----------:|
| test_loss              | 0.00652107 |
| test_overall_precision |   0.747289 |
| test_overall_recall    |    0.73927 |
| test_overall_f1        |   0.743258 |
| test_overall_accuracy  |   0.960126 |

### Framework Versions

- Python: 3.10.12
- SpanMarker: 1.5.0
- Transformers: 4.35.2
- PyTorch: 2.1.0+cu118
- Datasets: 2.15.0
- Tokenizers: 0.15.0

## Citation

### BibTeX

```
@software{Aarsen_SpanMarker,
    author = {Aarsen, Tom},
    license = {Apache-2.0},
    title = {{SpanMarker for Named Entity Recognition}},
    url = {https://github.com/tomaarsen/SpanMarkerNER}
}
```