DialogueSidon

A diffusion-based model for two-speaker dialogue separation that operates in a VAE-32 latent space.

Components

| File | Description |
| --- | --- |
| `ssl_encoder.pt2` | w2v-BERT 2.0 backbone + latent projection heads |
| `diffusion_head.pt2` | `DiffusionTransformerHead`: a single denoising step |
| `vae_decoder.pt2` | DAC VAE decoder: latents → 24 kHz audio |
| `metadata.json` | Latent normalisation stats, scheduler config, model dims |
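
The sketch below shows one plausible way the four components fit together at inference time. It is an assumption about data flow, not the model's documented API: the function `separate` and the parameters `encode`, `denoise_step`, `decode`, `timesteps`, `latent_mean`, `latent_std`, and `init_latent` are all hypothetical names, and the stand-in callables take the place of the real exported modules. Loading the `.pt2` files and reading the actual keys of `metadata.json` are deliberately left out, since neither is specified here.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def separate(
    mixture: Vector,
    encode: Callable[[Vector], Vector],
    denoise_step: Callable[[Vector, Vector, float], Vector],
    decode: Callable[[Vector], Vector],
    timesteps: Sequence[float],
    latent_mean: float,
    latent_std: float,
    init_latent: Vector,
) -> Vector:
    """Hypothetical wiring of encoder, diffusion head, and decoder."""
    # 1. Conditioning features from the mixture (role assumed for ssl_encoder.pt2).
    cond = encode(mixture)

    # 2. The diffusion head performs a single denoising step, so the caller
    #    loops over the scheduler's timesteps (assumed to come from metadata.json).
    latent = list(init_latent)
    for t in timesteps:
        latent = denoise_step(latent, cond, t)

    # 3. Undo latent normalisation (stats assumed to come from metadata.json).
    latent = [x * latent_std + latent_mean for x in latent]

    # 4. Decode latents to a waveform (role assumed for vae_decoder.pt2).
    return decode(latent)


# Toy stand-ins, only to show the call shape; they are not the real modules.
demo_output = separate(
    mixture=[0.0, 0.0],
    encode=lambda x: x,
    denoise_step=lambda z, cond, t: [0.5 * v for v in z],
    decode=lambda z: z,
    timesteps=[1.0, 0.5],
    latent_mean=1.0,
    latent_std=2.0,
    init_latent=[4.0, 8.0],
)
```

Passing the three stages in as callables keeps the loop independent of how the exported files are actually loaded, so the real modules could be dropped in once their input and output signatures are known. How the two speaker streams are represented in the latents is not covered by this sketch.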
