Joycent ParallelWaveGAN Vocoder

This repository hosts the ParallelWaveGAN vocoder used for inference in Joycent, a Mandarin accent text-to-speech system presented in the paper "Joycent: Diffusion-based Accent TTS without Accented Phone Prediction".

The model generates 16 kHz audio from 80-bin mel spectrograms.
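As a rough illustration of what those two numbers imply, the sketch below checks a mel spectrogram's shape before vocoding and estimates the length of the generated audio. It relies on assumptions that are not stated in this model card: that the spectrogram is laid out as (frames, mel bins), that the vocoder emits one hop of samples per frame, and that the hop size can be read from config.yml. The helper names and the hop size of 256 are hypothetical placeholders.

```python
SAMPLE_RATE_HZ = 16_000  # stated in this model card
NUM_MELS = 80            # stated in this model card


def validate_mel_shape(shape, num_mels=NUM_MELS):
    """Return the frame count if shape is (frames, num_mels), else raise ValueError.

    The (frames, num_mels) layout is an assumption, not taken from this card.
    """
    if len(shape) != 2:
        raise ValueError(f"expected a 2-D (frames, {num_mels}) array, got shape {shape}")
    frames, bins = shape
    if bins != num_mels:
        raise ValueError(f"expected {num_mels} mel bins, got {bins}")
    return frames


def expected_duration_seconds(num_frames, hop_size, sample_rate=SAMPLE_RATE_HZ):
    """Approximate output duration, assuming one hop of samples per mel frame."""
    return num_frames * hop_size / sample_rate


if __name__ == "__main__":
    hop_size = 256  # hypothetical placeholder; use the value from config.yml
    frames = validate_mel_shape((500, 80))
    print(expected_duration_seconds(frames, hop_size))
```

A check like this can catch a transposed or wrongly sized spectrogram before it reaches the vocoder.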

Usage

Keep checkpoint-50000steps.pkl and config.yml in the same directory when loading the model with ParallelWaveGAN:

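import yaml
from parallel_wavegan.utils import load_model

# Read the training configuration that ships alongside the checkpoint.
with open("config.yml", encoding="utf-8") as file:
    config = yaml.load(file, Loader=yaml.Loader)

# Build the generator and restore its weights from the checkpoint.
vocoder = load_model("checkpoint-50000steps.pkl", config)
# Weight normalization is only needed during training; remove it for inference.
vocoder.remove_weight_norm()
# Switch to evaluation mode.
vocoder.eval()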

The Joycent implementation and inference instructions are available in the official repository.

Citation

@misc{wang2026joycentdiffusionbasedaccenttts,
      title={Joycent: Diffusion-based Accent TTS without Accented Phone Prediction},
      author={Xintong Wang and Ye Wang},
      year={2026},
      eprint={2606.16417},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
}