MOSS-TTS-PNY
This repository contains a speaker-conditioned MOSS-TTS Local checkpoint, a local copy of the MOSS audio tokenizer weights needed for codec feature reconstruction, exported iSTFTNet2 decoder artifacts, and a portable optimized runner. The checkpoint was fine-tuned on several speakers from My Little Pony: Friendship is Magic and Team Fortress 2.
The end-to-end path is:
text prompt -> MOSS-TTS local transformer -> RVQ audio codes
RVQ audio codes -> MOSS audio tokenizer quantizer + decoder[0:5]
decoder[4] 50 Hz features -> iSTFTNet2 vocoder -> 48 kHz waveform
The optimized runner is intended for CUDA inference and keeps the full 32 VQ channels. It uses PyTorch for MOSS-TTS generation and decoder4 feature extraction, then ONNX Runtime CUDA for the vocoder.
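The rates in the path above imply a fixed ratio between feature frames and output samples. The following is a minimal sketch of that arithmetic, assuming the vocoder emits exactly 48000 / 50 = 960 samples per decoder[4] frame with no edge padding or trimming. The function name `expected_audio` is hypothetical and is not part of the bundle.

```python
# Rates taken from the pipeline description above.
FEATURE_RATE_HZ = 50       # decoder[4] feature frames per second
SAMPLE_RATE_HZ = 48_000    # vocoder output sample rate

# Assumption: the vocoder upsamples by exactly this factor per frame.
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ // FEATURE_RATE_HZ


def expected_audio(frames):
    """Return (sample_count, duration_seconds) for a given number of frames."""
    samples = frames * SAMPLES_PER_FRAME
    return samples, samples / SAMPLE_RATE_HZ


if __name__ == "__main__":
    samples, seconds = expected_audio(250)
    print(f"250 frames -> {samples} samples, {seconds:.2f} s")
```

This can serve as a quick check on output length when debugging; a small difference from the real output would not necessarily indicate a bug, since the actual decoder may handle edges differently.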
Contents
moss_tts_local_clipper_checkpoint/
Fine-tuned MOSS-TTS Local checkpoint, tokenizer, config, and custom HF code.
moss_audio_tokenizer/
Local OpenMOSS-Team/MOSS-Audio-Tokenizer copy. The optimized decoder4 path
uses its quantizer and decoder[0:5] modules.
istftnet2_decoder4_50hz/
Exported iSTFTNet2 vocoder artifacts:
- istftnet2_decoder.onnx
- istftnet2_decoder_cuda.ts
- istftnet2_decoder_cpu.ts
moss_tts_torchopt_runner_bundle/
Optimized CLI runner, Gradio demo, runtime helpers, speaker maps, and pinned
non-Torch requirements.
run_tts_istftnet2.py
Baseline end-to-end script using the checkpoint processor path.
run_decoder4_features.py
Decoder-only sanity script for saved decoder[4] features shaped [frames, 768].
Runtime Versions
The runner was validated with:
torch 2.11.0+cu128
transformers 4.55.0
onnxruntime-gpu 1.26.0
gradio 5.49.1
transformers==4.55.0 is pinned. Newer versions of transformers may run without errors but produce gibberish output.
PyTorch is not included in moss_tts_torchopt_runner_bundle/requirements.txt.
Install a CUDA PyTorch build that matches your GPU architecture first, then
install the non-Torch requirements.
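To compare an environment against the validated versions listed above, a small check along these lines may help. It is a sketch, not part of the bundle: the names `VALIDATED`, `installed_version`, and `find_mismatches` are hypothetical, and it only compares version strings exactly as reported by the installed package metadata.

```python
from importlib import metadata

# Versions the runner was validated with, copied from the list above.
VALIDATED = {
    "torch": "2.11.0+cu128",
    "transformers": "4.55.0",
    "onnxruntime-gpu": "1.26.0",
    "gradio": "5.49.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(lookup=installed_version):
    """Map each package whose version differs to (validated, found)."""
    mismatches = {}
    for package, expected in VALIDATED.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches().items():
        print(f"{package}: validated {expected}, found {found or 'not installed'}")
```

A mismatch on torch is expected if your GPU needs a different CUDA build; a mismatch on transformers is the one most likely to matter, given the pin noted above.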
Installation
From a fresh environment with CUDA PyTorch already installed:
cd moss_tts_clipper_istftnet2_release
python -m pip install -r moss_tts_torchopt_runner_bundle/requirements.txt
Or just ask your favorite coding agent to install it for you.
Optimized CLI
The optimized runner uses CUDA graphs and related optimizations to bring PyTorch generation to 1.8x realtime. Run it from the repository root:
python moss_tts_torchopt_runner_bundle/run_tts_torchopt.py \
--text "The custom decoder is running clearly now." \
--speaker-id 31 \
--warmup 1 \
--repeat 1
Useful options:
# Disable generation monkeypatch for debugging.
python moss_tts_torchopt_runner_bundle/run_tts_torchopt.py \
--torch-opt-mode none
# Use CPU ONNX vocoder if CUDA ORT is unavailable.
python moss_tts_torchopt_runner_bundle/run_tts_torchopt.py \
--decoder-runtime onnx_cpu
# Use explicit paths if the runner is outside this repo layout.
python moss_tts_torchopt_runner_bundle/run_tts_torchopt.py \
--checkpoint /path/to/moss_tts_local_clipper_checkpoint \
--codec-path /path/to/moss_audio_tokenizer \
--decoder-dir /path/to/istftnet2_decoder4_50hz
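For scripted or batch use, the command lines above can be assembled programmatically. The sketch below uses only flags shown in this README and adds `--decoder-runtime onnx_cpu` when ONNX Runtime does not report a CUDA execution provider. The helper names `build_command` and `available_providers` are hypothetical; the runner itself may already handle fallback differently.

```python
import shlex
import sys

# Runner path relative to the repository root, as used in the examples above.
RUNNER = "moss_tts_torchopt_runner_bundle/run_tts_torchopt.py"


def available_providers():
    """List ONNX Runtime execution providers, or [] if onnxruntime is missing."""
    try:
        import onnxruntime
    except ImportError:
        return []
    return onnxruntime.get_available_providers()


def build_command(text, speaker_id, providers, warmup=1, repeat=1):
    """Build the runner argv; fall back to the CPU vocoder without CUDA ORT."""
    command = [
        sys.executable, RUNNER,
        "--text", text,
        "--speaker-id", str(speaker_id),
        "--warmup", str(warmup),
        "--repeat", str(repeat),
    ]
    if "CUDAExecutionProvider" not in providers:
        command += ["--decoder-runtime", "onnx_cpu"]
    return command


if __name__ == "__main__":
    cmd = build_command(
        "The custom decoder is running clearly now.", 31, available_providers()
    )
    # Print only; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Run the resulting command from the repository root so the relative runner path resolves.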
Gradio Demo
Run:
python moss_tts_torchopt_runner_bundle/optimized_gradio_demo.py \
--host 0.0.0.0 \
--port 7860 \
--share
Limitations
- This is a beta. Improved checkpoints will be gradually released.
- This repository is an inference bundle, not a training recipe.
- The model uses custom Python code. Load with trust_remote_code=True when using Transformers APIs directly.
- The exact licensing and redistribution terms for upstream MOSS-TTS and MOSS audio tokenizer components should be checked before public redistribution or commercial use.
- Some speakers are less represented in the dataset than others and thus might exhibit lower performance.
- You probably can't use this for commercial purposes without Hasbro's lawyers objecting.
Contact
For any inquiries, e-mail nika109021@gmail.com.