Hierarchical Codec Diffusion for Video-to-Speech Generation • Paper • 2604.15923 • Published 5 days ago
FaceLLM • Collection • A multimodal large language model trained specifically for facial image understanding. Project page: https://www.idiap.ch/paper/facellm • 3 items • Updated Jul 23, 2025