---
tags:
  - spacy
  - token-classification
language:
  - sr
license: cc-by-sa-3.0
model-index:
  - name: sr_pln_tesla_j355
    results:
      - task:
          name: NER
          type: token-classification
        metrics:
          - name: NER Precision
            type: precision
            value: 0.9563926016
          - name: NER Recall
            type: recall
            value: 0.9584629955
          - name: NER F Score
            type: f_score
            value: 0.9574266793
      - task:
          name: TAG
          type: token-classification
        metrics:
          - name: TAG (XPOS) Accuracy
            type: accuracy
            value: 0.9847113194
      - task:
          name: LEMMA
          type: token-classification
        metrics:
          - name: Lemma Accuracy
            type: accuracy
            value: 0.9833528137
---

sr_pln_tesla_j355 is a spaCy model fine-tuned for part-of-speech tagging, lemmatization, and named entity recognition in Serbian text. It includes a transformer layer based on Jerteh-355. It recognizes 7 entity categories: PERS (persons), ROLE (professions), DEMO (demonyms), ORG (organizations), LOC (locations), WORK (artworks), and EVENT (events). Detailed information about these categories is available in the accompanying table. The model was developed with the support of the Science Fund of the Republic of Serbia, under grant #7276, for the project 'Text Embeddings - Serbian Language Applications - TESLA'.
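
A minimal usage sketch, assuming the `sr_pln_tesla_j355` package is already installed in the current environment (installation is not covered here). The helper names `load_pipeline`, `token_rows`, and `entities` are hypothetical and exist only for illustration. Because the pipeline contains a `tagger` component, the example reads the predicted tag from `token.tag_`.

```python
def load_pipeline(name="sr_pln_tesla_j355"):
    """Load the pipeline by package name (assumes the package is installed)."""
    import spacy  # imported lazily so the helpers below work without spaCy

    return spacy.load(name)


def token_rows(doc):
    """Return (text, tag, lemma) for every token in a processed document."""
    return [(token.text, token.tag_, token.lemma_) for token in doc]


def entities(doc):
    """Return (text, label) for every named entity in a processed document."""
    return [(ent.text, ent.label_) for ent in doc.ents]


# Example usage (requires the model package; the sentence is an arbitrary sample):
# nlp = load_pipeline()
# doc = nlp("Nikola Tesla je rođen u Smiljanu.")
# print(token_rows(doc))
# print(entities(doc))
```

The entity labels returned this way should come from the 7 categories listed above.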

| Feature | Description |
| --- | --- |
| Name | `sr_pln_tesla_j355` |
| Version | `1.0.0` |
| spaCy | `>=3.7.2,<3.8.0` |
| Default Pipeline | `transformer`, `tagger`, `trainable_lemmatizer`, `ner` |
| Components | `transformer`, `tagger`, `trainable_lemmatizer`, `ner` |
| Vectors | 0 keys, 0 unique vectors (0 dimensions) |
| Sources | n/a |
| License | `CC BY-SA 3.0` |
| Author | Milica Ikonić Nešić, Saša Petalinkar, Mihailo Škorić, Ranka Stanković |
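
The spaCy requirement `>=3.7.2,<3.8.0` can be checked before loading the model. The sketch below is a simplified check, not spaCy's own compatibility logic: it compares only the leading major.minor.patch numbers and ignores pre-release suffixes. The function names are hypothetical.

```python
import re


def version_tuple(version: str) -> tuple:
    """Extract the leading major.minor.patch integers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_supported_spacy(version: str) -> bool:
    """True if the version falls in the range >=3.7.2,<3.8.0."""
    return (3, 7, 2) <= version_tuple(version) < (3, 8, 0)


# Example usage (requires spaCy to be installed):
# import spacy
# print(is_supported_spacy(spacy.__version__))
```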

Label Scheme

View label scheme (23 labels for 2 components)

| Component | Labels |
| --- | --- |
| `tagger` | `ADJ`, `ADP`, `ADV`, `AUX`, `CCONJ`, `DET`, `INTJ`, `NOUN`, `NUM`, `PART`, `PRON`, `PROPN`, `PUNCT`, `SCONJ`, `VERB`, `X` |
| `ner` | `DEMO`, `EVENT`, `LOC`, `ORG`, `PERS`, `ROLE`, `WORK` |

Accuracy

| Type | Score |
| --- | --- |
| `TAG_ACC` | 98.47 |
| `LEMMA_ACC` | 98.34 |
| `ENTS_F` | 95.74 |
| `ENTS_P` | 95.64 |
| `ENTS_R` | 95.85 |
| `TRANSFORMER_LOSS` | 183572.28 |
| `TAGGER_LOSS` | 63121.95 |
| `TRAINABLE_LEMMATIZER_LOSS` | 99749.38 |
| `NER_LOSS` | 40508.31 |
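
As a consistency check on the NER scores, the sketch below recomputes the F-score from the precision and recall values given in the metadata of this card. It assumes `ENTS_F` is the standard balanced F-score (the harmonic mean of precision and recall); the function name `f_score` is chosen for illustration.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the model-index metadata of this card.
NER_PRECISION = 0.9563926016
NER_RECALL = 0.9584629955

ner_f = f_score(NER_PRECISION, NER_RECALL)
print(f"ENTS_F = {100 * ner_f:.2f}")  # prints "ENTS_F = 95.74"
```

Under that assumption the result agrees with the reported `ENTS_F` of 95.74.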