Problem with English Speaking
When I use the multilingual ONNX scripts for TTS on English text, the result is not good. As far as I remember, the original torch multilingual version works well for both English and other languages.
Is language_model.onnx correct? Could you please share the scripts used to convert language_model.onnx?
Thank you~
Hi @dove88 ! Thank you for reporting this issue. It seems the problem was in tokenizer.config, which was introduced by accident during replacement. I uploaded a new one, so you can try again.
@vladislavbro , thank you very much, the multilingual model now works for English too!
How is the speed on your side? With the float16 lang_model version I get a Time-To-First-Token of around 4.5s with streaming_size = 30 on an A100 GPU, which does not seem good enough for real-time settings.
I don't know whether the problem is on my side.
I also tried to convert the ONNX model to TensorRT, but it failed due to dynamic shapes and customized operators.
Hmm, good question. I did not measure performance for these models tbh, probably @Xenova did some tests.
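For comparing numbers like the Time-To-First-Token mentioned above, here is a minimal timing sketch. It is not part of the repo's scripts: `measure_stream` and `fake_stream` are hypothetical names, and `fake_stream` only stands in for whatever generator yields audio chunks in your streaming setup.

```python
import time
from typing import Iterable, Tuple, TypeVar

T = TypeVar("T")


def measure_stream(chunks: Iterable[T]) -> Tuple[float, float, int]:
    """Return (time_to_first_chunk_s, total_time_s, n_chunks) for any iterable."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in chunks:
        if first is None:
            # Latency until the first chunk is available to the caller.
            first = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    if first is None:
        raise ValueError("stream produced no chunks")
    return first, total, count


def fake_stream(n_chunks: int = 3, startup_s: float = 0.05, per_chunk_s: float = 0.01):
    """Hypothetical stand-in for a streaming TTS generator (assumption, not the real API)."""
    time.sleep(startup_s)          # simulates prompt processing before the first chunk
    for i in range(n_chunks):
        time.sleep(per_chunk_s)    # simulates generating one chunk
        yield i


if __name__ == "__main__":
    first, total, count = measure_stream(fake_stream())
    print(f"first chunk: {first:.3f}s, total: {total:.3f}s, chunks: {count}")
```

When timing a real model it is probably worth running one warm-up generation first and reporting the second run, since the first GPU run usually includes one-time initialization cost.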