Joycent ParallelWaveGAN Vocoder

This repository stores the ParallelWaveGAN vocoder used by Joycent Mandarin accent text-to-speech inference, as presented in the paper Joycent: Diffusion-based Accent TTS without Accented Phone Prediction.

The model generates 16 kHz audio from 80-bin mel spectrograms.
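As a rough illustration of what the 16 kHz output rate implies, the sketch below estimates how many audio samples (and seconds) a mel spectrogram of a given length would produce. The function name `expected_audio_length` is hypothetical, and the hop size of 256 used in the example is only a placeholder: the actual value should be read from config.yml (ParallelWaveGAN configs normally store it under `hop_size`, which is assumed here rather than confirmed for this checkpoint).

```python
SAMPLING_RATE = 16000  # stated above: the vocoder outputs 16 kHz audio


def expected_audio_length(num_frames, hop_size, sampling_rate=SAMPLING_RATE):
    """Return (num_samples, seconds) for a mel spectrogram with num_frames frames.

    Assumes each mel frame is upsampled to exactly hop_size audio samples,
    which is how neural vocoders of this kind typically behave.
    """
    if num_frames < 0 or hop_size <= 0 or sampling_rate <= 0:
        raise ValueError("num_frames must be >= 0; hop_size and sampling_rate must be > 0")
    num_samples = num_frames * hop_size
    return num_samples, num_samples / sampling_rate


if __name__ == "__main__":
    # hop_size=256 is a placeholder; use the value from config.yml instead.
    samples, seconds = expected_audio_length(num_frames=100, hop_size=256)
    print(samples, seconds)
```

A check like this can help catch a mismatch between the acoustic model's frame rate and the vocoder's hop size before any audio is generated.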

Official Code: oshindow/Joycent-code
Project Page: Demos

Usage

Keep checkpoint-50000steps.pkl and config.yml in the same directory when loading the model with ParallelWaveGAN:

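import yaml
from parallel_wavegan.utils import load_model

# Read the vocoder configuration stored next to the checkpoint.
with open("config.yml", encoding="utf-8") as file:
    config = yaml.load(file, Loader=yaml.Loader)

# Build the generator and load the trained weights.
vocoder = load_model("checkpoint-50000steps.pkl", config)
# Weight normalization is only needed during training, so remove it for inference.
vocoder.remove_weight_norm()
# Switch to evaluation mode before generating audio.
vocoder.eval()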

The Joycent implementation and inference instructions are available in the official repository.
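Once the vocoder has produced a waveform, it usually needs to be written to disk at the matching sample rate. The following is a minimal sketch using only the Python standard library, not part of the Joycent code. It assumes the waveform has already been flattened to a one-dimensional sequence of floats in the range [-1.0, 1.0] (that conversion from the model output is not shown), and `write_wav_16k` is a hypothetical helper name.

```python
import struct
import wave

SAMPLE_RATE = 16000  # the vocoder generates 16 kHz audio


def write_wav_16k(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Returns the number of samples written.
    """
    pcm = []
    for sample in samples:
        # Clip out-of-range values so they do not overflow 16-bit integers.
        clipped = max(-1.0, min(1.0, float(sample)))
        pcm.append(int(round(clipped * 32767)))

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16-bit PCM
        wav_file.setframerate(sample_rate)
        # Pack as little-endian signed 16-bit integers.
        wav_file.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
    return len(pcm)


if __name__ == "__main__":
    # Placeholder data standing in for the flattened vocoder output.
    write_wav_16k("example.wav", [0.0, 0.25, -0.25, 0.0])
```

Writing the file at a rate other than 16 kHz would change the playback speed and pitch, so the sample rate is fixed as a default here.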

Citation

@misc{wang2026joycentdiffusionbasedaccenttts,
      title={Joycent: Diffusion-based Accent TTS without Accented Phone Prediction},
      author={Xintong Wang and Ye Wang},
      year={2026},
      eprint={2606.16417},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
}
