Papers
arxiv:2509.02113

HiGraph: A Large-Scale Hierarchical Graph Dataset for Malware Analysis

Published on Sep 2, 2025
Authors:
,
,
,
,
,

Abstract

A large-scale hierarchical graph dataset for malware analysis is introduced, featuring nested control flow and function call graphs that preserve structural semantics for robust detection against code obfuscation and malware evolution.

AI-generated summary

The advancement of graph-based malware analysis is critically limited by the absence of large-scale datasets that capture the inherent hierarchical structure of software. Existing methods often oversimplify programs into single level graphs, failing to model the crucial semantic relationship between high-level functional interactions and low-level instruction logic. To bridge this gap, we introduce \dataset, the largest public hierarchical graph dataset for malware analysis, comprising over 200M Control Flow Graphs (CFGs) nested within 595K Function Call Graphs (FCGs). This two-level representation preserves structural semantics essential for building robust detectors resilient to code obfuscation and malware evolution. We demonstrate HiGraph's utility through a large-scale analysis that reveals distinct structural properties of benign and malicious software, establishing it as a foundational benchmark for the community. The dataset and tools are publicly available at https://higraph.org.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2509.02113 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2509.02113 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.