Rakancorle1's Collections

When Vision Speaks for Sound

Data and model for When Vision Speaks for Sound. Includes SFT and DPO training data, evaluation data, and trained checkpoints.