DialogueSidon
Two-speaker dialogue separation model based on diffusion with a VAE-32 latent space.
Components
| File | Description |
|---|---|
| ssl_encoder.pt2 | w2v-BERT 2.0 backbone + latent projection heads |
| diffusion_head.pt2 | DiffusionTransformerHead: single denoising step |
| vae_decoder.pt2 | DAC VAE decoder: latents → 24 kHz audio |
| metadata.json | Latent normalisation stats, scheduler config, model dims |
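
The way these components might fit together at inference time can be sketched as below. This is only an illustration of the data flow suggested by the table, not the project's actual inference code: the function `separate`, its parameters (`encode`, `denoise_step`, `decode`, `mean`, `std`, `num_steps`), the time grid, and the zero initial latent are all hypothetical stand-ins. The real scheduler, normalisation statistics, and tensor shapes come from metadata.json, and the real model produces output for two speakers, which this toy version does not model.

```python
from typing import Callable, List, Sequence

Latent = List[float]


def separate(
    mixture: Sequence[float],
    encode: Callable[[Sequence[float]], Latent],
    denoise_step: Callable[[Latent, Latent, float], Latent],
    decode: Callable[[Latent], List[float]],
    mean: float,
    std: float,
    num_steps: int,
) -> List[float]:
    """Hypothetical flow: encode, repeat a single-step denoiser, decode."""
    # Role of ssl_encoder.pt2: turn the mixture into conditioning latents.
    cond = encode(mixture)
    # Placeholder start point; an actual sampler would likely draw noise here.
    z = [0.0] * len(cond)
    for i in range(num_steps):
        # Assumed time grid running from 1 towards 0.
        t = 1.0 - i / num_steps
        # Role of diffusion_head.pt2: one denoising step per call.
        z = denoise_step(z, cond, t)
    # Undo latent normalisation (statistics assumed to come from metadata.json).
    z = [v * std + mean for v in z]
    # Role of vae_decoder.pt2: map latents back to a waveform.
    return decode(z)


# Toy stand-ins so the sketch runs without the exported model files.
result = separate(
    mixture=[1.0, 2.0],
    encode=lambda x: list(x),
    denoise_step=lambda z, c, t: [zi + 0.5 * (ci - zi) for zi, ci in zip(z, c)],
    decode=lambda z: list(z),
    mean=0.0,
    std=2.0,
    num_steps=2,
)
```

Because the diffusion head performs a single denoising step, the sampling loop lives outside the exported module in this sketch; the number of steps and the time schedule would then be read from the scheduler config rather than hard-coded.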
Contributors
- Wataru Nakata
- Yuki Saito
- Kazuki Yamauchi
- Emiru Tsunoo
- Hiroshi Saruwatari