
CosyVoice3 SG

This repository contains the llm.pt checkpoint fine-tuned on the Singapore Mandarin subset, as presented in the paper Joycent: Diffusion-based Accent TTS without Accented Phone Prediction.

All other CosyVoice3 components are loaded from FunAudioLLM/Fun-CosyVoice3-0.5B-2512. This checkpoint replaces only the base model's llm.pt; it does not include flow.pt, hift.pt, the tokenizer, or the ONNX files.
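As a minimal sketch of the replacement described above, the snippet below copies a local copy of the base model into a new directory and overwrites its llm.pt with the fine-tuned one. The helper names `overlay_llm_checkpoint` and `fetch_and_assemble`, the output directory, and the repository ID `walston/cosyvoice3-sg` are assumptions made for illustration. The Joycent inference wrapper may handle this step on its own, so treat this as one possible manual setup rather than the project's documented procedure.

```python
import shutil
from pathlib import Path


def overlay_llm_checkpoint(base_dir, finetuned_llm, out_dir):
    """Copy the base model directory to out_dir, then replace llm.pt.

    The base directory is left untouched; flow.pt, hift.pt, the tokenizer,
    and the ONNX files in out_dir all come from the base model.
    """
    base_dir, finetuned_llm, out_dir = Path(base_dir), Path(finetuned_llm), Path(out_dir)
    if not base_dir.is_dir():
        raise FileNotFoundError(f"base model directory not found: {base_dir}")
    if not finetuned_llm.is_file():
        raise FileNotFoundError(f"fine-tuned checkpoint not found: {finetuned_llm}")

    # copytree follows symlinks by default, so a Hugging Face cache snapshot
    # (which is made of symlinks) is materialized as regular files.
    shutil.copytree(base_dir, out_dir, dirs_exist_ok=True)
    shutil.copy2(finetuned_llm, out_dir / "llm.pt")
    return out_dir


def fetch_and_assemble(out_dir="pretrained_models/cosyvoice3-sg"):
    """Download both repositories and assemble a merged model directory.

    Assumptions: huggingface_hub is installed, and this checkpoint lives at
    walston/cosyvoice3-sg with the file name llm.pt.
    """
    from huggingface_hub import hf_hub_download, snapshot_download

    base_dir = snapshot_download(repo_id="FunAudioLLM/Fun-CosyVoice3-0.5B-2512")
    llm_path = hf_hub_download(repo_id="walston/cosyvoice3-sg", filename="llm.pt")
    return overlay_llm_checkpoint(base_dir, llm_path, out_dir)
```

Calling `fetch_and_assemble()` would leave a directory whose contents match the base model except for llm.pt. Copying into a separate directory instead of overwriting in place keeps the original base model available for comparison.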

Project Resources

Inference

The inference wrapper for this model is available in the Joycent project as joycent/inference_cosyvoice.py.

Citation

If you find this work useful, please cite:

@misc{wang2026joycentdiffusionbasedaccenttts,
      title={Joycent: Diffusion-based Accent TTS without Accented Phone Prediction},
      author={Xintong Wang and Ye Wang},
      year={2026},
      eprint={2606.16417},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
}