new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Mar 10

VLA-Arena: An Open-Source Framework for Benchmarking Vision-Language-Action Models

While Vision-Language-Action models (VLAs) are rapidly advancing towards generalist robot policies, it remains difficult to quantitatively understand their limits and failure modes. To address this, we introduce a comprehensive benchmark called VLA-Arena. We propose a novel structured task design framework to quantify difficulty across three orthogonal axes: (1) Task Structure, (2) Language Command, and (3) Visual Observation. This allows us to systematically design tasks with fine-grained difficulty levels, enabling a precise measurement of model capability frontiers. For Task Structure, VLA-Arena's 170 tasks are grouped into four dimensions: Safety, Distractor, Extrapolation, and Long Horizon. Each task is designed with three difficulty levels (L0-L2), with fine-tuning performed exclusively on L0 to assess general capability. Orthogonal to this, language (W0-W4) and visual (V0-V4) perturbations can be applied to any task to enable a decoupled analysis of robustness. Our extensive evaluation of state-of-the-art VLAs reveals several critical limitations, including a strong tendency toward memorization over generalization, asymmetric robustness, a lack of consideration for safety constraints, and an inability to compose learned skills for long-horizon tasks. To foster research addressing these challenges and ensure reproducibility, we provide the complete VLA-Arena framework, including an end-to-end toolchain from task definition to automated evaluation and the VLA-Arena-S/M/L datasets for fine-tuning. Our benchmark, data, models, and leaderboard are available at https://vla-arena.github.io.

  • 9 authors
·
Dec 27, 2025

A Fully Automated DM-BIM-BEM Pipeline Enabling Graph-Based Intelligence, Interoperability, and Performance-Driven Early Design

Artificial intelligence in construction increasingly depends on structured representations such as Building Information Models and knowledge graphs, yet early-stage building designs are predominantly created as flexible boundary-representation (B-rep) models that lack explicit spatial, semantic, and performance structure. This paper presents a robust, fully automated framework that transforms unstructured B-rep geometry into knowledge-graph-based Building Information Models and further into executable Building Energy Models. The framework enables artificial intelligence to explicitly interpret building elements, spatial topology, and their associated thermal and performance attributes. It integrates automated geometry cleansing, multiple auto space-generation strategies, graph-based extraction of space and element topology, ontology-aligned knowledge modeling, and reversible transformation between ontology-based BIM and EnergyPlus energy models. Validation on parametric, sketch-based, and real-world building datasets demonstrates high robustness, consistent topological reconstruction, and reliable performance-model generation. By bridging design models, BIM, and BEM, the framework provides an AI-oriented infrastructure that extends BIM- and graph-based intelligence pipelines to flexible early-stage design geometry, enabling performance-driven design exploration and optimization by learning-based methods.

  • 6 authors
·
Jan 23