Cahlen Humphreys PRO
cahlen
AI & ML interests
Recent Activity
liked a model about 20 hours ago
tencent/HY-World-2.0
reacted to imnotkitty's post about 20 hours ago
Just tried https://huggingface.co/tencent/HY-World-2.0, a multimodal world model that takes in text or a single image and generates editable 3D scenes.
Unlike Google's Genie and HY-World 1.5, v2.0 generates engine-ready 3D content:
- Direct import into Unreal Engine and Unity, with no format wrangling
- Supports multiple 3D asset formats: Mesh, 3DGS, point cloud, etc.
- Fully editable: not a baked video, but actual geometry you can modify
- Also usable for embodied simulation environments
Basically: from "AI generates a world you can look at" to "AI generates a world you can ship."
Organizations
Animations
World Models
3D / Mesh
Gaussians and Nerfs
- Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation
  Paper • 2401.14257 • Published • 12
- LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
  Paper • 2402.05054 • Published • 29
- MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction
  Paper • 2402.12712 • Published • 18
- GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting
  Paper • 2402.10259 • Published • 15
Image Restoration
Surveys
TBR
Papers TO BE READ
- 3D-LLM: Injecting the 3D World into Large Language Models
  Paper • 2307.12981 • Published • 40
- Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study
  Paper • 2401.17981 • Published • 1
- SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM
  Paper • 2312.02126 • Published • 2
- Relightable Gaussian Codec Avatars
  Paper • 2312.03704 • Published • 32
Object Detection
Multimodal
DLM
Datasets
Audio
Web Agents
Data Generation
3D Avatar Utils
- Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance
  Paper • 2401.15687 • Published • 24
- Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians
  Paper • 2312.03029 • Published • 27
- DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation
  Paper • 2312.13578 • Published • 29
- Splatter Image: Ultra-Fast Single-View 3D Reconstruction
  Paper • 2312.13150 • Published • 15
Spatial
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23
- DepthFM: Fast Monocular Depth Estimation with Flow Matching
  Paper • 2403.13788 • Published • 18
- Utonia: Toward One Encoder for All Point Clouds
  Paper • 2603.03283 • Published • 185
LLM
- Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
  Paper • 2402.05140 • Published • 23
- BitDelta: Your Fine-Tune May Only Be Worth One Bit
  Paper • 2402.10193 • Published • 21
- QLoRA: Efficient Finetuning of Quantized LLMs
  Paper • 2305.14314 • Published • 61
- OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
  Paper • 2402.14658 • Published • 84
Video
- VideoPrism: A Foundational Visual Encoder for Video Understanding
  Paper • 2402.13217 • Published • 40
- No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding
  Paper • 2405.08344 • Published • 15
- Helios: Real Real-Time Long Video Generation Model
  Paper • 2603.04379 • Published • 186
Agents
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 73
- Real-Time Reasoning Agents in Evolving Environments
  Paper • 2511.04898 • Published • 13
- AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
  Paper • 2508.16279 • Published • 61
- Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?
  Paper • 2604.03016 • Published • 37
AI OS