Pratilekha by Alchemyst AI - Lightweight Indic STT Model
Indic STT model based on OpenAI Whisper Tiny model using LoRA adapters for multilingual Indic speech-to-text transcription with automatic language detection. Trained on noisy, real-world audio data, Pratilekha handles diverse acoustic conditions, seamless language/accent switching, and achieves ~20% better accuracy than OpenAI's Whisper-1 on Indic language benchmarks.
Model Details
Model Description
This model is a LoRA-adapted version of OpenAI's Whisper Tiny, fine-tuned on noisy audio data for robust transcription of speech in multiple Indic languages. It excels at effortless switching between languages and accents within and across utterances, making it well-suited for code-mixed and multi-accent Indic speech. Despite its lightweight architecture, Pratilekha delivers approximately 20% improved transcription accuracy over OpenAI's Whisper-1 on Indic language tasks. It supports automatic language detection and can be served via a WebSocket-based FastAPI server for real-time transcription.
- Developed by: Anuran Roy
- Model type: Seq2Seq Speech-to-Text (LoRA fine-tuned)
- Language(s) (NLP): Hindi (hi), English (en), Bengali (bn), Tamil (ta), Telugu (te), Marathi (mr), Gujarati (gu), Kannada (kn), Malayalam (ml), Punjabi (pa), Urdu (ur), Odia (or)
- License: CC-BY-SA-4.0
- Finetuned from model: openai/whisper-tiny
Uses
Direct Use
The model can be used directly for speech-to-text transcription of Indic language audio files. It supports both explicit language specification and automatic language detection. Input audio is expected at a 16 kHz sample rate in 16-bit PCM (pcm16) format.
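As a rough illustration of the 16 kHz pcm16 input format, the sketch below converts floating-point samples to 16-bit PCM bytes and back using only the standard library. The helper names (`float_to_pcm16`, `pcm16_to_float`, `duration_seconds`) are hypothetical and not part of this repository, and little-endian byte order and mono audio are assumptions.

```python
import array
import sys

SAMPLE_RATE = 16000  # Hz, the input rate stated in this card


def float_to_pcm16(samples):
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    pcm = array.array("h")
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip to the valid range
        pcm.append(int(round(s * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # normalise to little-endian (assumed byte order)
    return pcm.tobytes()


def pcm16_to_float(data):
    """Convert little-endian 16-bit PCM bytes back to floats in [-1.0, 1.0)."""
    pcm = array.array("h")
    pcm.frombytes(data)
    if sys.byteorder == "big":
        pcm.byteswap()
    return [v / 32768.0 for v in pcm]


def duration_seconds(data, sample_rate=SAMPLE_RATE):
    """Duration of mono pcm16 audio: two bytes per sample."""
    return len(data) / 2 / sample_rate
```

For example, one second of mono audio at this rate occupies 32,000 bytes, so `duration_seconds(b"\x00\x00" * 16000)` evaluates to `1.0`.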
Downstream Use
The model can be integrated into voice agent pipelines, real-time transcription services, or any application requiring Indic language speech recognition. A WebSocket-based FastAPI server is provided for real-time inference.
Out-of-Scope Use
- Languages not listed in the supported languages above.
- Noisy or very low-quality audio recordings may produce poor results.
- Not intended for speaker identification or diarization.
How to Get Started with the Model
Python Inference
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel
import librosa
from typing import Optional


def load_finetuned_model(
    adapter_path: str = "whisper-indic-voice-agent-tiny",
    base_model: str = "openai/whisper-tiny"
):
    """Load the fine-tuned Whisper model with LoRA adapters"""
    processor = WhisperProcessor.from_pretrained(adapter_path)
    model = WhisperForConditionalGeneration.from_pretrained(
        base_model,
        torch_dtype=torch.float16,
        device_map="auto"
    )
    model = PeftModel.from_pretrained(model, adapter_path)
    model.eval()
    return model, processor


# Load model
model, processor = load_finetuned_model()

# Load audio
audio, sr = librosa.load("path/to/audio.wav", sr=16000)

# Process and transcribe
input_features = processor(
    audio, sampling_rate=16000, return_tensors="pt"
).input_features.to(model.device, dtype=torch.float16)

with torch.no_grad():
    predicted_ids = model.generate(
        input_features,
        max_length=448,
        language="hi",  # Set to None for auto-detection
        task="transcribe"
    )

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(f"Transcription: {transcription}")
WebSocket Server
A FastAPI-based WebSocket server is included for real-time transcription:
python server.py
The server exposes:
- GET /health: Health check endpoint.
- WS /ws/transcribe: WebSocket endpoint for streaming transcription.
WebSocket Message Format
Request:
{
  "audio": {
    "data": "<base64-encoded-audio>",
    "encoding": "audio/wav",
    "sample_rate": "16000"
  }
}
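A minimal sketch of how a client might assemble this request from raw pcm16 samples, using only the standard library. The function name `build_transcribe_request` is hypothetical. The sketch assumes that `data` carries a complete mono WAV file (suggested by the `audio/wav` encoding, but not confirmed by this card), and it sends `sample_rate` as a string to match the example message.

```python
import base64
import io
import json
import wave


def build_transcribe_request(pcm16_bytes, sample_rate=16000):
    """Wrap mono pcm16 samples in a WAV container and build the JSON request."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)        # mono (assumption)
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm16_bytes)

    payload = {
        "audio": {
            "data": base64.b64encode(buf.getvalue()).decode("ascii"),
            "encoding": "audio/wav",
            "sample_rate": str(sample_rate),  # string, as in the example above
        }
    }
    return json.dumps(payload)
```

The returned string can then be sent as a text frame over the `/ws/transcribe` connection with whichever WebSocket client is in use.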
Response (events + data):
{"type": "events", "data": {"signal_type": "START_SPEECH", "occured_at": 1234567890.0, "session_id": "..."}}
{"type": "events", "data": {"signal_type": "END_SPEECH", "occured_at": 1234567890.0, "session_id": "..."}}
{
  "type": "data",
  "data": {
    "request_id": "...",
    "transcript": "transcribed text",
    "language_code": "hi-IN",
    "metrics": {
      "audio_duration": 3.5,
      "processing_latency": 0.42
    }
  }
}
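The messages above can be told apart by their `type` field. The following sketch shows one way a client might dispatch on it; `handle_message` is a hypothetical helper, and the real-time factor (processing latency divided by audio duration) is a derived value computed here for illustration, not something the server reports. It also assumes both metrics are expressed in seconds.

```python
import json


def handle_message(raw):
    """Parse one server message and return a small summary dict."""
    msg = json.loads(raw)
    kind = msg.get("type")
    body = msg.get("data", {})

    if kind == "events":
        # Speech boundary signals such as START_SPEECH / END_SPEECH.
        return {
            "kind": "event",
            "signal": body.get("signal_type"),
            "at": body.get("occured_at"),  # field name spelled as in the API
        }

    if kind == "data":
        metrics = body.get("metrics", {})
        duration = metrics.get("audio_duration")
        latency = metrics.get("processing_latency")
        # Real-time factor: values below 1.0 mean faster than real time.
        rtf = latency / duration if duration and latency is not None else None
        return {
            "kind": "transcript",
            "text": body.get("transcript"),
            "language": body.get("language_code"),
            "rtf": rtf,
        }

    return {"kind": "unknown", "type": kind}
```

With the example values above (3.5 of audio, 0.42 of processing), the real-time factor works out to roughly 0.12.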
Training Details
Training Procedure
Training is executed via the provided shell script:
bash run_training.sh
This script:
- Creates and activates a Python virtual environment.
- Installs dependencies from requirements.txt.
- Runs train_file.py with GPU support (CUDA_VISIBLE_DEVICES=0).
- Saves the fine-tuned model to ./whisper-indic-voice-agent-tiny.
Training Hyperparameters
- Training regime: fp16 mixed precision
- Adapter method: LoRA (via PEFT)
Bias, Risks, and Limitations
- Performance may vary across supported languages depending on training data distribution.
- Auto language detection may be inaccurate for short audio clips or code-mixed speech.
- The Whisper Tiny base model has limited capacity; expect lower accuracy compared to larger Whisper variants.
Recommendations
Users should validate transcription quality on their target language and domain before deploying in production. Consider using a larger base model (e.g., whisper-small or whisper-medium) for improved accuracy.
Technical Specifications
Model Architecture and Objective
- Base Architecture: OpenAI Whisper Tiny (encoder-decoder transformer)
- Fine-tuning Method: LoRA (Low-Rank Adaptation) via PEFT
- Objective: Seq2Seq speech transcription
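To make the LoRA idea concrete, the sketch below computes the low-rank update ΔW = (alpha / r) · B · A in plain Python and compares trainable parameter counts for a single projection matrix. The width of 384 is the published hidden size of Whisper Tiny; the rank of 8 and the alpha value are illustrative assumptions, since this card does not state the adapter configuration actually used.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters in one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def lora_delta(A, B, alpha, rank):
    """Compute the weight update (alpha / rank) * B @ A with nested lists."""
    scale = alpha / rank
    d_out, d_in = len(B), len(A[0])
    return [
        [scale * sum(B[i][k] * A[k][j] for k in range(rank)) for j in range(d_in)]
        for i in range(d_out)
    ]


# One 384 x 384 attention projection with an assumed rank of 8.
full = 384 * 384                        # 147456 weights in the frozen matrix
adapter = lora_param_count(384, 384, 8) # 6144 trainable weights
print(f"adapter / full = {adapter / full:.2%}")
```

Under these assumed settings, the adapter trains about 4% as many weights as the projection it modifies, which is why the adapter files are small compared with the base model.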
Compute Infrastructure
Hardware
- GPU: CUDA-compatible GPU (single GPU training via CUDA_VISIBLE_DEVICES=0). We used 1x L4 GPU to train this over 3 GPU hours.
Software
- Python 3.x
- PyTorch with CUDA
- Hugging Face Transformers
- PEFT 0.17.1
- librosa, soundfile
- FastAPI + Uvicorn (for serving)
Model Card Authors
Framework versions
- PEFT 0.17.1