MOSS-TTS-PNY

This repository contains a speaker-conditioned MOSS-TTS Local checkpoint, a local copy of the MOSS audio tokenizer weights needed for codec feature reconstruction, exported iSTFTNet2 decoder artifacts, and a portable optimized runner. The checkpoint was fine-tuned on several speakers from the My Little Pony: Friendship is Magic franchise and from Team Fortress 2.

The end-to-end path is:

text prompt -> MOSS-TTS local transformer -> RVQ audio codes
RVQ audio codes -> MOSS audio tokenizer quantizer + decoder[0:5]
decoder[4] 50 Hz features -> iSTFTNet2 vocoder -> 48 kHz waveform

The optimized runner is intended for CUDA inference and keeps the full 32 VQ channels. It uses PyTorch for MOSS-TTS generation and decoder4 feature extraction, then ONNX Runtime CUDA for the vocoder.
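
The rates in the pipeline above imply a fixed ratio between feature frames and output samples: 48000 / 50 = 960 samples per frame. The sketch below works that out for features in the [frames, 768] shape that run_decoder4_features.py expects. It assumes the vocoder emits exactly 960 samples per frame with no padding or trimming, which this README does not state, and the function name is made up for illustration.

```python
FEATURE_RATE_HZ = 50      # decoder[4] feature rate, from the pipeline above
SAMPLE_RATE_HZ = 48_000   # vocoder output rate, from the pipeline above
FEATURE_DIM = 768         # per-frame width of saved decoder[4] features

# Samples produced per feature frame: 48000 / 50 = 960.
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ // FEATURE_RATE_HZ


def expected_waveform_length(feature_shape):
    """Return (num_samples, seconds) for features shaped [frames, 768].

    Assumption: the vocoder output is exactly frames * 960 samples long.
    """
    if len(feature_shape) != 2 or feature_shape[1] != FEATURE_DIM:
        raise ValueError(
            f"expected [frames, {FEATURE_DIM}], got {list(feature_shape)}"
        )
    frames = feature_shape[0]
    return frames * SAMPLES_PER_FRAME, frames / FEATURE_RATE_HZ
```

For example, 250 frames would correspond to 240000 samples, or 5 seconds of audio, under that assumption.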

moss_tts_local_clipper_checkpoint/
  Fine-tuned MOSS-TTS Local checkpoint, tokenizer, config, and custom HF code.

moss_audio_tokenizer/
  Local OpenMOSS-Team/MOSS-Audio-Tokenizer copy. The optimized decoder4 path
  uses its quantizer and decoder[0:5] modules.

istftnet2_decoder4_50hz/
  Exported iSTFTNet2 vocoder artifacts:
  - istftnet2_decoder.onnx
  - istftnet2_decoder_cuda.ts
  - istftnet2_decoder_cpu.ts

moss_tts_torchopt_runner_bundle/
  Optimized CLI runner, Gradio demo, runtime helpers, speaker maps, and pinned
  non-Torch requirements.

run_tts_istftnet2.py
  Baseline end-to-end script using the checkpoint processor path.

run_decoder4_features.py
  Decoder-only sanity script for saved decoder[4] features shaped [frames, 768].
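
Before running anything, it can help to confirm that a checkout matches the layout above. The following is a minimal sketch that only checks whether the listed paths exist; it does not validate file contents, and the helper name missing_paths is hypothetical.

```python
from pathlib import Path

# Paths relative to the repository root, copied from the layout listing above.
EXPECTED_PATHS = [
    "moss_tts_local_clipper_checkpoint",
    "moss_audio_tokenizer",
    "istftnet2_decoder4_50hz/istftnet2_decoder.onnx",
    "istftnet2_decoder4_50hz/istftnet2_decoder_cuda.ts",
    "istftnet2_decoder4_50hz/istftnet2_decoder_cpu.ts",
    "moss_tts_torchopt_runner_bundle",
    "run_tts_istftnet2.py",
    "run_decoder4_features.py",
]


def missing_paths(repo_root="."):
    """Return the expected entries that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing from this checkout:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected paths are present.")
```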

Runtime Versions

The runner was validated with:

torch 2.11.0+cu128
transformers 4.55.0
onnxruntime-gpu 1.26.0
gradio 5.49.1

transformers==4.55.0 is pinned. Newer versions of transformers may load and run, but they can output gibberish.

PyTorch is not included in moss_tts_torchopt_runner_bundle/requirements.txt. Install a CUDA PyTorch build that matches your GPU architecture first, then install the non-Torch requirements.
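
To compare an environment against the validated versions listed above, something like the following sketch may be useful. It uses only the standard library's importlib.metadata and does an exact string comparison, so a PyTorch build with a different CUDA suffix is reported as differing even though it may work fine. The function names are illustrative and not part of the runner.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions the runner was validated with, copied from the list above.
VALIDATED = {
    "torch": "2.11.0+cu128",
    "transformers": "4.55.0",
    "onnxruntime-gpu": "1.26.0",
    "gradio": "5.49.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_report(lookup=installed_version):
    """Map each package name to (validated, installed, matches)."""
    report = {}
    for package, wanted in VALIDATED.items():
        found = lookup(package)
        report[package] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for package, (wanted, found, ok) in version_report().items():
        status = "ok" if ok else "differs"
        print(f"{package}: validated {wanted}, installed {found} ({status})")
```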

Installation

From a fresh environment with CUDA PyTorch already installed:

cd moss_tts_clipper_istftnet2_release
python -m pip install -r moss_tts_torchopt_runner_bundle/requirements.txt

Or just ask your favorite coding agent to install it for you.

Optimized CLI

The optimized runner uses CUDA graphs and other optimizations to speed up PyTorch generation to 1.8x realtime. Run it from the repository root:

python moss_tts_torchopt_runner_bundle/run_tts_torchopt.py \
  --text "The custom decoder is running clearly now." \
  --speaker-id 31 \
  --warmup 1 \
  --repeat 1

Useful options:

# Disable generation monkeypatch for debugging.
python moss_tts_torchopt_runner_bundle/run_tts_torchopt.py \
  --torch-opt-mode none

# Use CPU ONNX vocoder if CUDA ORT is unavailable.
python moss_tts_torchopt_runner_bundle/run_tts_torchopt.py \
  --decoder-runtime onnx_cpu

# Use explicit paths if the runner is outside this repo layout.
python moss_tts_torchopt_runner_bundle/run_tts_torchopt.py \
  --checkpoint /path/to/moss_tts_local_clipper_checkpoint \
  --codec-path /path/to/moss_audio_tokenizer \
  --decoder-dir /path/to/istftnet2_decoder4_50hz
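
If you want to drive the CLI from another Python script, one option is to build the argument list and call it with subprocess. This is a sketch that uses only the flags shown above; build_command and synthesize are hypothetical helper names, and the only flag values known from this README are "none" for --torch-opt-mode and "onnx_cpu" for --decoder-runtime.

```python
import subprocess
import sys

# Runner path relative to the repository root, as used in the commands above.
RUNNER = "moss_tts_torchopt_runner_bundle/run_tts_torchopt.py"


def build_command(text, speaker_id, warmup=1, repeat=1,
                  decoder_runtime=None, torch_opt_mode=None):
    """Build the argv list for the optimized runner from documented flags."""
    cmd = [
        sys.executable, RUNNER,
        "--text", text,
        "--speaker-id", str(speaker_id),
        "--warmup", str(warmup),
        "--repeat", str(repeat),
    ]
    # Optional flags are only added when set, so the runner's own
    # defaults apply otherwise.
    if decoder_runtime is not None:
        cmd += ["--decoder-runtime", decoder_runtime]
    if torch_opt_mode is not None:
        cmd += ["--torch-opt-mode", torch_opt_mode]
    return cmd


def synthesize(text, speaker_id, **options):
    """Run the CLI from the repository root; raises if it exits non-zero."""
    return subprocess.run(build_command(text, speaker_id, **options), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or other special characters.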

Gradio Demo

Run:

python moss_tts_torchopt_runner_bundle/optimized_gradio_demo.py \
  --host 0.0.0.0 \
  --port 7860 \
  --share

Limitations

- This is a beta. Improved checkpoints will be gradually released.
- This repository is an inference bundle, not a training recipe.
- The model uses custom Python code. Load with trust_remote_code=True when using Transformers APIs directly.
- The exact licensing and redistribution terms for upstream MOSS-TTS and MOSS audio tokenizer components should be checked before public redistribution or commercial use.
- Some speakers are less represented in the dataset than others and thus might exhibit lower performance.
- You probably can't use this for commercial purposes without Hasbro's lawyers objecting.
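
For the trust_remote_code note above, loading the checkpoint directly might look like the sketch below. It assumes the checkpoint's custom code registers with the generic AutoModel and AutoProcessor classes, which this README does not confirm; check the checkpoint's config and the bundled scripts for the classes actually used. The helper name load_checkpoint is made up for illustration.

```python
# Checkpoint directory relative to the repository root.
CHECKPOINT_DIR = "moss_tts_local_clipper_checkpoint"


def load_checkpoint(checkpoint_dir=CHECKPOINT_DIR):
    """Load the processor and model with the checkpoint's custom code enabled.

    Assumption: the checkpoint is reachable through AutoProcessor and
    AutoModel. Other Auto classes may be the right ones for this model.
    """
    # Imported lazily so this file can be imported without Transformers.
    from transformers import AutoModel, AutoProcessor

    # trust_remote_code=True allows Transformers to execute the Python
    # files shipped inside the checkpoint directory.
    processor = AutoProcessor.from_pretrained(checkpoint_dir, trust_remote_code=True)
    model = AutoModel.from_pretrained(checkpoint_dir, trust_remote_code=True)
    return processor, model
```

Because trust_remote_code=True runs code from the checkpoint directory, only use it with a copy of the checkpoint you trust.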

Contact

For any inquiries, e-mail nika109021@gmail.com.
