- VRM-Phase I VKW system description of long-short video customizable keyword wakeup challenge Keyword wakeup technology has always been a research hotspot in speech processing, but many related works were done on different datasets. We organized a Chinese long-short video keyword wakeup challenge (Video Keyword Wakeup Challenge, VKW) for testing the ability of each participating team to build a keyword wakeup system under the public dataset. All submitted systems not only need to support the setting of multiple different keywords, but also need to support the wakeup of any costumed keyword.This paper mainly describes the basic situation of the VKW challenge and the experimental results of some participating teams. 4 authors · Oct 17, 2021
- VRMDiff: Text-Guided Video Referring Matting Generation of Diffusion We propose a new task, video referring matting, which obtains the alpha matte of a specified instance by inputting a referring caption. We treat the dense prediction task of matting as video generation, leveraging the text-to-video alignment prior of video diffusion models to generate alpha mattes that are temporally coherent and closely related to the corresponding semantic instances. Moreover, we propose a new Latent-Constructive loss to further distinguish different instances, enabling more controllable interactive matting. Additionally, we introduce a large-scale video referring matting dataset with 10,000 videos. To the best of our knowledge, this is the first dataset that concurrently contains captions, videos, and instance-level alpha mattes. Extensive experiments demonstrate the effectiveness of our method. The dataset and code are available at https://github.com/Hansxsourse/VRMDiff. 7 authors · Mar 11, 2025