DialogueSidon

Two-speaker dialogue separation model based on diffusion with a VAE-32 latent space.

Components

| File | Description |
| --- | --- |
| ssl_encoder.pt2 | w2v-BERT 2.0 backbone + latent projection heads |
| diffusion_head.pt2 | DiffusionTransformerHead: single denoising step |
| vae_decoder.pt2 | DAC VAE decoder: latents → 24 kHz audio |
| metadata.json | Latent normalisation stats, scheduler config, model dims |
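
The components listed above suggest a three-stage pipeline: encode the mixture, call the single-step diffusion head repeatedly, then undo the latent normalisation before decoding. The sketch below illustrates only that control flow, in plain Python with stand-in functions. The metadata keys (`latent_mean`, `latent_std`, `num_steps`), the step signature, and the assumption that the head returns updated latents (rather than, say, a noise prediction) are all hypothetical; the actual `metadata.json` schema and the exported modules' interfaces are not documented here.

```python
import json
from typing import Callable, List, Sequence

Vector = List[float]

# Hypothetical metadata layout; the real metadata.json keys may differ.
METADATA_JSON = '{"latent_mean": [0.0, 1.0], "latent_std": [2.0, 2.0], "num_steps": 3}'


def run_sampler(
    step_fn: Callable[[Vector, int, Vector], Vector],
    cond: Vector,
    init: Vector,
    timesteps: Sequence[int],
) -> Vector:
    """Call a single-step denoiser once per timestep, feeding each output back in."""
    z = list(init)
    for t in timesteps:
        z = step_fn(z, t, cond)
    return z


def denormalise(latents: Vector, mean: Vector, std: Vector) -> Vector:
    """Map normalised latents back to the decoder's scale: z * std + mean."""
    return [z * s + m for z, m, s in zip(latents, mean, std)]


def toy_step(z: Vector, t: int, cond: Vector) -> Vector:
    """Stand-in for diffusion_head.pt2: moves each latent halfway toward cond."""
    return [0.5 * (zi + ci) for zi, ci in zip(z, cond)]


meta = json.loads(METADATA_JSON)
# Assumed schedule: integer timesteps counted down from num_steps - 1 to 0.
timesteps = list(range(meta["num_steps"] - 1, -1, -1))

cond = [1.0, -1.0]   # stand-in for the ssl_encoder.pt2 output
init = [0.0, 0.0]    # stand-in for the initial noise sample

latents = run_sampler(toy_step, cond, init, timesteps)
audio_latents = denormalise(latents, meta["latent_mean"], meta["latent_std"])
print(latents, audio_latents)
```

In a real setup, `toy_step` would be replaced by a call into the exported diffusion head, and `audio_latents` would be passed to the VAE decoder to produce 24 kHz audio.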


Space using sarulab-speech/DialogueSidon 1