Codec Training Code
Hello!
Please release the code for fine-tuning and for training your Qwen Codec (audio tokenizer) from scratch. No WavLM or anything, just a pure PyTorch audio-in, audio-out loss. I want to train from scratch on my 48 kHz LJSpeech-sized dataset!
Thanks!
+1
Seconding this. I really want to learn how to train this tokenizer from scratch on my own acoustic datasets.
Updates?
Please look into the script. There's no gradient or loss calculation for the encoder or decoder.
I was also waiting for the official training code, but since it still hasn't been released, I made my own. Feel free to use it if you like. The script fine-tunes only the decoder, so codec compatibility is preserved. As a bonus, you can tweak the upsampler settings for 48 kHz output.
https://github.com/takuma104/Qwen3-TTS-Tokenizer-12Hz-Trainer
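For anyone wondering what "fine-tuning only the decoder" looks like in plain PyTorch, here is a minimal sketch of the general idea. It is not taken from the linked repository: `ToyCodec`, its layer sizes, the random input batch, and the plain L1 waveform loss are all hypothetical stand-ins, and a real setup would likely use the actual codec modules and a stronger reconstruction loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class ToyCodec(nn.Module):
    """Hypothetical stand-in for a codec; NOT the real Qwen modules."""

    def __init__(self, hop=4, dim=8):
        super().__init__()
        self.encoder = nn.Conv1d(1, dim, kernel_size=hop, stride=hop)
        self.decoder = nn.ConvTranspose1d(dim, 1, kernel_size=hop, stride=hop)

    def forward(self, wav):
        # The encoder runs without gradients, so the codes it produces
        # stay exactly what they were before fine-tuning.
        with torch.no_grad():
            latents = self.encoder(wav)
        return self.decoder(latents)


codec = ToyCodec()

# Freeze the encoder explicitly as well.
for p in codec.encoder.parameters():
    p.requires_grad_(False)

# Snapshots, only used to check which weights moved.
encoder_before = [p.detach().clone() for p in codec.encoder.parameters()]
decoder_before = [p.detach().clone() for p in codec.decoder.parameters()]

# Only decoder parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(codec.decoder.parameters(), lr=1e-3)

wav = torch.randn(2, 1, 64)  # random stand-in for a batch of audio

losses = []
for _ in range(5):
    recon = codec(wav)
    loss = nn.functional.l1_loss(recon, wav)  # audio-in, audio-out loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

encoder_unchanged = all(
    torch.equal(a, b) for a, b in zip(encoder_before, codec.encoder.parameters())
)
decoder_changed = any(
    not torch.equal(a, b) for a, b in zip(decoder_before, codec.decoder.parameters())
)
print("encoder unchanged:", encoder_unchanged)
print("decoder changed:", decoder_changed)
```

Because the encoder weights never change, tokens produced before and after fine-tuning are identical, which is why a decoder-only approach keeps the codec compatible with models trained on the original tokens.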