A Named Entity Recognition (NER) system for the Azerbaijani language. It provides four fine-tuned transformer models, benchmarked against each other, and a FastAPI deployment with a web interface.
Try the live demo: Named Entity Recognition Demo
Note: The server runs on a free tier and may take 1-2 minutes to initialize if inactive. Please be patient during startup.
graph TD
A[User Input] --> B[FastAPI Server]
B --> C[XLM-RoBERTa Model]
C --> D[Token Classification]
D --> E[Entity Aggregation]
E --> F[Label Mapping]
F --> G[JSON Response]
G --> H[Frontend Visualization]
subgraph "Model Pipeline"
C --> C1[Tokenization]
C1 --> C2[XLM-RoBERTa Encoding]
C2 --> C3[Classification Head]
C3 --> D
end
subgraph "Entity Categories"
I[Person]
J[Location]
K[Organization]
L[Date/Time]
M[Government]
N[25 Total Categories]
end
F --> I
F --> J
F --> K
F --> L
F --> M
F --> N
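The "Entity Aggregation" step in the diagram above can be illustrated with a short sketch. It assumes a flat tagging scheme (no `B-`/`I-` prefixes) in which adjacent tokens with the same label belong to one entity; the deployed model may aggregate differently, and `aggregate_entities` is a hypothetical helper, not a function from this repository.

```python
from itertools import groupby


def aggregate_entities(tokens, labels, outside="O"):
    """Merge runs of adjacent tokens that share the same non-outside label."""
    entities = []
    # groupby yields one run per stretch of consecutive identical labels.
    for label, run in groupby(zip(tokens, labels), key=lambda pair: pair[1]):
        if label == outside:
            continue  # tokens tagged as "outside" are not entities
        text = " ".join(token for token, _ in run)
        entities.append({"label": label, "text": text})
    return entities


# Hypothetical token-level predictions for illustration only.
tokens = ["İlham", "Əliyev", "Salyanda", "olub"]
labels = ["Person", "Person", "Location", "O"]
entities = aggregate_entities(tokens, labels)
print(entities)
```

Here the two `Person` tokens collapse into a single entity, while `Salyanda` stays a separate `Location` entity.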
flowchart LR
A[Azerbaijani NER Dataset] --> B[Data Preprocessing]
B --> C[Tokenization]
C --> D[Label Alignment]
subgraph "Model Training"
E[mBERT] --> F[Fine-tuning]
G[XLM-RoBERTa] --> F
H[XLM-RoBERTa Large] --> F
I[Azeri-Turkish BERT] --> F
F --> J[Model Evaluation]
end
D --> E
D --> G
D --> H
D --> I
J --> K[Best Model Selection]
K --> L[Hugging Face Hub]
L --> M[Production Deployment]
subgraph "Performance Metrics"
N[Precision: 76.44%]
O[Recall: 74.05%]
P[F1-Score: 75.22%]
end
J --> N
J --> O
J --> P
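The "Label Alignment" step above is needed because subword tokenizers split one word into several tokens while the dataset labels are per word. The following is a minimal sketch of one common convention (label the first subword, mask the rest with `-100`, the default `ignore_index` of PyTorch's cross-entropy loss). The training notebooks may use a different convention; `align_labels` is a hypothetical helper, and `word_ids` mimics what a fast Hugging Face tokenizer returns from `BatchEncoding.word_ids()`.

```python
def align_labels(word_labels, word_ids, ignore_index=-100):
    """Expand word-level labels to subword tokens.

    word_ids maps each token to the index of its source word, or None
    for special tokens such as <s> and </s>.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(ignore_index)          # special token: no loss
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first subword keeps the label
        else:
            aligned.append(ignore_index)          # later subwords are masked
        previous = word_id
    return aligned


# Three words; the second is split into 2 subwords, the third into 3.
word_labels = [1, 1, 2]
word_ids = [None, 0, 1, 1, 2, 2, 2, None]
aligned = align_labels(word_labels, word_ids)
print(aligned)
```

Masking continuation subwords keeps the loss and the reported metrics at word level, so long words do not count more than short ones.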
sequenceDiagram
participant U as User
participant F as Frontend
participant API as FastAPI
participant M as XLM-RoBERTa
participant HF as Hugging Face
U->>F: Enter Azerbaijani text
F->>API: POST /predict/
API->>M: Process text
M->>M: Tokenize input
M->>M: Generate predictions
M->>API: Return entity predictions
API->>API: Apply label mapping
API->>API: Group entities by type
API->>F: JSON response with entities
F->>U: Display highlighted entities
Note over M,HF: Model loaded from<br/>IsmatS/xlm-roberta-az-ner
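The request flow in the sequence diagram could be wired up roughly as follows. This is a sketch under stated assumptions, not the contents of `main.py`: the helper names, the `HYPOTHETICAL_LABEL_NAMES` mapping, and the use of a form field named `text` are illustrative. The `transformers` and `fastapi` imports are deferred into functions so the grouping logic can be tried without installing either package (FastAPI form parsing also requires `python-multipart`).

```python
from collections import defaultdict

# Hypothetical mapping from raw model labels to display names.
HYPOTHETICAL_LABEL_NAMES = {"LABEL_1": "Person", "LABEL_2": "Location"}


def build_response(predictions, label_names=None):
    """Apply label mapping, then group entity texts by type."""
    label_names = label_names or {}
    grouped = defaultdict(list)
    for item in predictions:
        raw = item["entity_group"]
        grouped[label_names.get(raw, raw)].append(item["word"].strip())
    return {"entities": dict(grouped)}


def load_predictor(model_id="IsmatS/xlm-roberta-az-ner"):
    """Load a token-classification pipeline that merges subword pieces."""
    from transformers import pipeline
    return pipeline("token-classification", model=model_id,
                    aggregation_strategy="simple")


def create_app(predict, label_names=None):
    """Build a FastAPI app exposing POST /predict/ for form-encoded text."""
    from fastapi import FastAPI, Form

    app = FastAPI()

    @app.post("/predict/")
    def predict_endpoint(text: str = Form(...)):
        return build_response(predict(text), label_names)

    return app


# Illustration with hand-written predictions in the pipeline's output shape.
example_predictions = [
    {"entity_group": "LABEL_1", "word": " İlham Əliyev", "score": 0.99},
    {"entity_group": "LABEL_2", "word": "Salyanda", "score": 0.95},
]
response = build_response(example_predictions, HYPOTHETICAL_LABEL_NAMES)
print(response)
```

With the real dependencies installed, `create_app(load_predictor())` would return an app object that Uvicorn can serve.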
.
├── Dockerfile                        # Docker image configuration
├── README.md                         # Project documentation
├── fly.toml                          # Fly.io deployment configuration
├── main.py                           # FastAPI application entry point
├── models/                           # Model-related files
│   ├── NER_from_scratch.ipynb        # Custom NER implementation notebook
│   ├── README.md                     # Models documentation
│   ├── XLM-RoBERTa.ipynb             # XLM-RoBERTa training notebook
│   ├── azeri-turkish-bert-ner.ipynb  # Azeri-Turkish BERT training
│   ├── mBERT.ipynb                   # mBERT training notebook
│   ├── push_to_HF.py                 # Hugging Face upload script
│   ├── train-00000-of-00001.parquet  # Training data
│   └── xlm_roberta_large.ipynb       # XLM-RoBERTa Large training
├── requirements.txt                  # Python dependencies
├── static/                           # Frontend assets
│   ├── app.js                        # Frontend logic
│   └── style.css                     # UI styling
└── templates/                        # HTML templates
    └── index.html                    # Main UI template
| Model | Parameters | F1-Score | Hugging Face | Status |
|---|---|---|---|---|
| mBERT Azerbaijani NER | 180M | 67.70% | ✅ | Released |
| XLM-RoBERTa Azerbaijani NER | 270M | 75.22% | ✅ | Production |
| XLM-RoBERTa Large Azerbaijani NER | 550M | 75.48% | ✅ | Released |
| Azerbaijani-Turkish BERT Base NER | 110M | 73.55% | ✅ | Released |

Supported entity categories (25 in total):

| Category | Category | Category |
|---|---|---|
| Person | Government | Law |
| Location | Date | Language |
| Organization | Time | Position |
| Facility | Money | Nationality |
| Product | Percentage | Disease |
| Event | Contact | Quantity |
| Art | Project | Cardinal |
| Proverb | Ordinal | Miscellaneous |
| Other | | |

F1-score by model:

| Model | F1-Score |
|---|---|
| mBERT | 67.70% |
| XLM-RoBERTa Base | 75.22% |
| XLM-RoBERTa Large | 75.48% |
| Azeri-Turkish-BERT | 73.55% |

mBERT, validation metrics per epoch (best F1 at epoch 2):

| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|
| 1 | 0.2952 | 0.2657 | 0.7154 | 0.6229 | 0.6659 | 0.9191 |
| 2 | 0.2486 | 0.2521 | 0.7210 | 0.6380 | 0.6770 | 0.9214 |
| 3 | 0.2068 | 0.2534 | 0.7049 | 0.6507 | 0.6767 | 0.9209 |

XLM-RoBERTa Base, validation metrics at selected epochs (best F1 at epoch 5):

| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 1 | 0.3231 | 0.2755 | 0.7758 | 0.6949 | 0.7331 |
| 3 | 0.2486 | 0.2525 | 0.7515 | 0.7412 | 0.7463 |
| 5 | 0.2238 | 0.2522 | 0.7644 | 0.7405 | 0.7522 |
| 7 | 0.2097 | 0.2507 | 0.7607 | 0.7394 | 0.7499 |

XLM-RoBERTa Large, validation metrics at selected epochs (best F1 at epoch 6):

| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 1 | 0.4075 | 0.2538 | 0.7689 | 0.7214 | 0.7444 |
| 3 | 0.2144 | 0.2488 | 0.7509 | 0.7489 | 0.7499 |
| 6 | 0.1526 | 0.2881 | 0.7831 | 0.7284 | 0.7548 |
| 9 | 0.1194 | 0.3316 | 0.7393 | 0.7495 | 0.7444 |

Azeri-Turkish BERT, validation metrics at selected epochs (best F1 at epoch 6):

| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 1 | 0.4331 | 0.3067 | 0.7390 | 0.6933 | 0.7154 |
| 3 | 0.2506 | 0.2751 | 0.7583 | 0.7094 | 0.7330 |
| 6 | 0.1992 | 0.2861 | 0.7551 | 0.7170 | 0.7355 |
| 9 | 0.1717 | 0.3138 | 0.7431 | 0.7255 | 0.7342 |
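
The F1 column in these tables is the harmonic mean of precision and recall, and the reported model score is the best F1 across epochs. The sketch below recomputes F1 for the last table above (whose best F1 of 0.7355 matches the 73.55% reported for Azeri-Turkish BERT) and picks the best epoch. Because the table values are rounded to four decimals, the recomputed figures agree with the table only to about three decimals. The function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (epoch, precision, recall, reported F1) copied from the last table above.
rows = [
    (1, 0.7390, 0.6933, 0.7154),
    (3, 0.7583, 0.7094, 0.7330),
    (6, 0.7551, 0.7170, 0.7355),
    (9, 0.7431, 0.7255, 0.7342),
]

recomputed = {epoch: f1_score(p, r) for epoch, p, r, _ in rows}
best_epoch = max(recomputed, key=recomputed.get)

for epoch, _, _, reported in rows:
    print(f"epoch {epoch}: recomputed {recomputed[epoch]:.4f}, reported {reported:.4f}")
print("best epoch:", best_epoch)
```

Note that validation loss in that table is lowest at epoch 3 while F1 peaks at epoch 6, so selecting the checkpoint by F1 rather than by loss changes which epoch is kept.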
graph LR
subgraph "Frontend"
A[HTML5] --> B[CSS3]
B --> C[JavaScript]
end
subgraph "Backend"
D[FastAPI] --> E[Python 3.8+]
E --> F[Uvicorn]
end
subgraph "ML Stack"
G[Transformers] --> H[PyTorch]
H --> I[Hugging Face]
end
subgraph "Deployment"
J[Docker] --> K[Fly.io]
K --> L[Production]
end
C --> D
F --> G
I --> J
git clone https://huggingface.co/IsmatS/Named_Entity_Recognition
cd Named_Entity_Recognition
# Create virtual environment
python -m venv .venv
# Activate virtual environment
# On Unix/macOS:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8080
# Install the Fly.io CLI (Unix/macOS)
curl -L https://fly.io/install.sh | sh
# Login to Fly.io
fly auth login
# Initialize app
fly launch
# Configure memory (minimum 2GB recommended)
fly scale memory 2048
fly deploy
# Monitor deployment
fly logs
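1. Open the application in a browser (http://localhost:8080 when running locally with the `uvicorn` command above).
2. Enter Azerbaijani text in the input field.
3. Click "Submit" to process the text.
4. View the results: entities are highlighted by category and shown with confidence scores.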
# Example API request
import requests

response = requests.post(
    "https://named-entity-recognition.fly.dev/predict/",
    data={"text": "2014-cü ildə Azərbaycan Respublikasının prezidenti İlham Əliyev Salyanda olub."}
)
print(response.json())

# Output: {
#   "entities": {
#     "Date": ["2014"],
#     "Government": ["Azərbaycan"],
#     "Organization": ["Respublikasının"],
#     "Position": ["prezidenti"],
#     "Person": ["İlham Əliyev"],
#     "Location": ["Salyanda"]
#   }
# }
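
Because the free-tier server may need one to two minutes to wake up, a client may want to retry the request instead of failing on the first timeout. The following is a minimal sketch of that idea; `predict_with_retry`, its default attempt count, and its delay are assumptions chosen for illustration. The HTTP function is passed in as `post` so the retry logic can be exercised without network access; in real use it would be `requests.post`.

```python
import time

API_URL = "https://named-entity-recognition.fly.dev/predict/"


def predict_with_retry(text, post, url=API_URL, attempts=5, delay=30.0,
                       sleep=time.sleep):
    """POST text to the NER endpoint, retrying while the server wakes up."""
    last_error = None
    for attempt in range(attempts):
        try:
            response = post(url, data={"text": text}, timeout=60)
            response.raise_for_status()  # treat HTTP error codes as failures
            return response.json()
        except (OSError, ValueError) as error:
            # requests' network and HTTP errors derive from OSError;
            # a malformed JSON body raises a ValueError subclass.
            last_error = error
            if attempt < attempts - 1:
                sleep(delay)
    raise RuntimeError(f"no response after {attempts} attempts") from last_error


# Real usage (requires the requests package and network access):
#   import requests
#   result = predict_with_retry("İlham Əliyev Salyanda olub.", post=requests.post)
```

A fixed delay keeps the sketch short; exponential backoff would be a reasonable alternative for a shared public endpoint.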
We welcome contributions! Here's how you can help:
1. Create a feature branch (`git checkout -b feature/AmazingFeature`)
2. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
3. Push the branch (`git push origin feature/AmazingFeature`)

This project is open source and available under the MIT License.