Xobdo Boroxa 1.0 — Assamese Neural TTS Model

Xobdo Boroxa 1.0 is an Assamese neural Text-to-Speech (TTS) model developed as a collaborative project between Assam AI Initiative (AAII-আই) and Xobdo.org.

The model was trained on about 4 hours of high-quality Assamese speech data and is built on an ESPnet-based FastSpeech2 + Multi-band MelGAN TTS pipeline.

Model Overview

The TTS system is based on:

Acoustic model: FastSpeech2
Vocoder: Multi-band MelGAN
Toolkit: ESPnet
Language: Assamese
Training data: Around 4 hours of high-quality Assamese speech
Input: Assamese Unicode text
Output: Assamese speech waveform
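
The overview lists Assamese Unicode text as the input. As a rough illustration only (the model's actual text front end is not described here), the Python sketch below normalizes input to NFC and estimates how much of it falls in the Bengali-Assamese Unicode block (U+0980 to U+09FF), the block in which Assamese is written. The function names and the idea of checking a ratio before synthesis are assumptions made for this example, not part of the released pipeline.

```python
import unicodedata

# Assamese is written in the Bengali-Assamese script (Unicode block U+0980-U+09FF).
BLOCK_START, BLOCK_END = 0x0980, 0x09FF
# The danda and double danda live in the Devanagari block but are shared
# sentence punctuation across Indic scripts, so they are accepted here too.
SHARED_PUNCTUATION = {"\u0964", "\u0965"}


def normalize_input(text: str) -> str:
    """Apply NFC normalization and collapse runs of whitespace (hypothetical helper)."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def assamese_ratio(text: str) -> float:
    """Return the fraction of non-space characters that look like Assamese script."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(
        1
        for c in chars
        if BLOCK_START <= ord(c) <= BLOCK_END or c in SHARED_PUNCTUATION
    )
    return hits / len(chars)
```

A caller could, for example, skip synthesis when the ratio is low, since Latin-script or other text is outside what the model was trained to read. Any threshold would need to be chosen empirically.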

The goal of this model is to provide a lightweight and usable Assamese neural TTS system for browser-based and community-focused language technology applications.

Applications Built Using This Model

Using this model, we have developed two free public-facing Assamese reading tools.

1. Assamese Text-to-Speech Web App

A browser-based Assamese TTS web application that reads Assamese Unicode text aloud, taken either from a webpage or from pasted text.

Key features:

Reads Assamese Unicode text
Supports webpage URL input
Supports direct text input
Runs speech synthesis inside the browser
No server-side API call required for synthesis
Model is cached in the browser after first load
Can work offline after the model is loaded
Useful for accessibility, reading support, education, and Assamese digital content consumption
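
The features above imply two preparation steps before synthesis: pulling readable text out of a webpage and breaking it into pieces short enough to synthesize one at a time. The web app itself runs in the browser and its implementation is not shown here, so the following Python sketch is only an illustration of that idea. The class and function names are hypothetical, and splitting on the danda (।), question mark, and exclamation mark is an assumed heuristic.

```python
import re
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def page_text(html: str) -> str:
    """Return the visible text of an HTML document as one string."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def split_sentences(text: str) -> list:
    """Split after a danda, '?' or '!' and drop empty pieces."""
    pieces = re.split(r"(?<=[।?!])\s*", text)
    return [p.strip() for p in pieces if p.strip()]
```

Each returned sentence could then be passed to the synthesizer in turn, which keeps individual inputs short and lets playback start before the whole page has been processed.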

Project page: https://www.xobdo.org/project_pathak/

2. Assamese TTS Chrome Extension

A Chrome extension for reading Assamese text directly from webpages.

Key features:

Select Assamese text on any webpage and listen to it
Right-click and choose full-page reading
Runs locally inside the browser
No API call required for speech synthesis
Model is downloaded once and cached locally
Works offline after the first model load
Designed for desktop Chrome users
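
The "downloaded once and cached locally" behaviour follows a common load-or-fetch pattern. The extension's actual browser storage mechanism is not described here, so the Python sketch below only illustrates the pattern with a directory on disk. The function name `load_cached` and the `fetch` callback are hypothetical.

```python
from pathlib import Path
from typing import Callable


def load_cached(cache_dir: Path, name: str, fetch: Callable[[], bytes]) -> bytes:
    """Return cached bytes if present; otherwise fetch once and store them."""
    path = cache_dir / name
    if path.exists():
        # Cache hit: no network access needed, so this also works offline.
        return path.read_bytes()

    data = fetch()  # only reached on the first load
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so an interrupted download
    # never leaves a truncated file under the final name.
    tmp = path.with_name(path.name + ".part")
    tmp.write_bytes(data)
    tmp.replace(path)
    return data
```

After the first successful call, later calls never invoke `fetch`, which mirrors how the model stays usable offline once it has been loaded.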

Acknowledgement

This model and the related applications are part of a collaborative effort by Assam AI Initiative (AAII-আই) and Xobdo.org to support Assamese language technology and make Assamese digital content more accessible.

Disclaimer

This is an early version of the Assamese TTS model. It is designed to generate clear and natural Assamese speech, but pronunciation, prosody, and expressiveness still have room to improve, which more training data and future model updates may address.