Deepti Ghadiyaram PRO

dghadiya

AI & ML interests

None yet

Recent Activity

liked a dataset 11 days ago

dghadiya/TAG-Bench-Video

Viewer • Updated 27 days ago • 300 • 951 • 1

updated a dataset 27 days ago

dghadiya/TAG-Bench-Video

Viewer • Updated 27 days ago • 300 • 951 • 1

upvoted a paper about 2 months ago

Semantic Richness or Geometric Reasoning? The Fragility of VLM's Visual Invariance

Paper • 2604.01848 • Published Apr 3 • 5

submitted a paper to Daily Papers about 2 months ago

Semantic Richness or Geometric Reasoning? The Fragility of VLM's Visual Invariance

Paper • 2604.01848 • Published Apr 3 • 5

authored 12 papers about 2 months ago

$\textit{Revelio}$: Interpreting and leveraging semantic information in diffusion models

Paper • 2411.16725 • Published Nov 23, 2024 • 1

Improving Physical Object State Representation in Text-to-Image Generative Systems

Paper • 2505.02236 • Published May 4, 2025

Right Side Up? Disentangling Orientation Understanding in MLLMs with Fine-grained Multi-axis Perception Tasks

Paper • 2505.21649 • Published May 27, 2025 • 3

Progressive Prompt Detailing for Improved Alignment in Text-to-Image Generative Models

Paper • 2503.17794 • Published Mar 22, 2025

Some Modalities are More Equal Than Others: Decoding and Architecting Multimodal Integration in MLLMs

Paper • 2511.22826 • Published Nov 28, 2025 • 8

Generative Action Tell-Tales: Assessing Human Motion in Synthesized Videos

Paper • 2512.01803 • Published Dec 1, 2025 • 5

DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers

Paper • 2602.16968 • Published Feb 19 • 12

Semantic Richness or Geometric Reasoning? The Fragility of VLM's Visual Invariance

Paper • 2604.01848 • Published Apr 3 • 5

A Systematic Study of Cross-Modal Typographic Attacks on Audio-Visual Reasoning

Paper • 2604.03995 • Published Apr 5 • 4

submitted a paper to Daily Papers about 2 months ago

A Systematic Study of Cross-Modal Typographic Attacks on Audio-Visual Reasoning

Paper • 2604.03995 • Published Apr 5 • 4

upvoted a paper 2 months ago

CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents

Paper • 2603.24440 • Published Mar 25 • 98

updated a Space 3 months ago

Videoeval Humaneval

Rate AI‑generated videos on prompt match and motion quality

upvoted a paper 3 months ago

DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers

Paper • 2602.16968 • Published Feb 19 • 12
