Organization Card

📢 News

🌐 Open Dataset Platform Now Accepting Community Contributions
Dataset upload, review, and open-release workflow is now available.

We are pleased to announce the launch of the PCNI-PKU Open Dataset Platform, an open-source, full-band multi-modal dataset platform for integrated sensing, communication, and intelligence research. The platform covers data modalities across frequency bands from sub-6 GHz to visible light, including RGB images, depth maps, LiDAR point clouds, mmWave radar data, channel state information (CSI), and path loss maps. It supports scenarios such as vehicle-to-infrastructure (V2I) cooperation, the low-altitude economy, and smart campus applications.

As of May 2026, the platform has accumulated more than 13 million data samples and released 10 open-source datasets, including 3 RF communication channel datasets and 7 multimodal integrated sensing-and-communication datasets. Since April 2026, the datasets have received over 5,890 downloads on Hugging Face. We will continue to expand the platform, with SynthSoM-Twin expected to be released in the coming months.

To address the high cost of collecting large-scale sensing-communication data, we are building a community-driven open-source ecosystem for dataset co-construction. The platform introduces a contribution and incentive mechanism to encourage continuous data sharing, support sustainable dataset expansion, and promote the development of integrated sensing, communication, and intelligence research.

Contributors will be acknowledged on the Hugging Face platform and may receive free promotional support. A contribution-level system will also be introduced, where different levels of contribution are matched with phased data-access privileges to encourage long-term participation.

Dataset Submission Guidelines

If you have multimodal data involving communication and sensing, collected through real-world measurements or simulation platforms, and would like to join our open-source data community, you are welcome to contribute your dataset following the procedure below.

1. Data Preparation

To ensure consistency and usability, please organize your dataset package using the following recommended structure:

dataset_package.zip
├─ metadata/contributor.json
├─ README.md
├─ DatasetCard.md
├─ license.txt
├─ config/acquisition.json
├─ config/sensor_setup.yaml
├─ config/sync_schema.json
├─ rf/xxx
├─ image/rgb/xxx
├─ image/depth/xxx
├─ point_cloud/lidar/xxx
├─ point_cloud/mmwave_radar/xxx
├─ pose/xxx
├─ calibration/camera_intrinsics.json
├─ calibration/lidar_to_camera.json
├─ calibration/mmwave_radar_to_lidar.json
└─ simulation/config.json

The metadata/contributor.json, README.md, and DatasetCard.md files are required. An example of metadata/contributor.json is shown below:

{
  "dataset_name": "RF Multimodal Sample",
  "contributor_name": "Alice Lab",
  "organization": "PKU",
  "contact": "alice@example.com",
  "hf_account": "alice-hf"
}

The rf/ folder may include files such as csi_subcarriers.csv, cir_taps.csv, rss.csv, and pathloss.csv. RGB images may be stored as frame_0001.png or frame_0001.jpg; depth maps as depth_0001.png; LiDAR point clouds as lidar_0001.pcd or lidar_0001.ply; mmWave radar point clouds as radar_0001.ply; and agent trajectories as trajectory.csv.
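Before uploading, it can help to check a package locally against the structure above. The following is a minimal sketch, not an official validator: the function name check_package is hypothetical, it assumes the files sit at the root of the archive (no wrapper folder), and it assumes that all five keys of the example contributor.json are expected, which the guidelines do not state explicitly.

```python
import json
import zipfile

# Files the guidelines list as required.
REQUIRED_FILES = ("metadata/contributor.json", "README.md", "DatasetCard.md")
# Assumption: the five keys from the example contributor.json are all expected.
EXPECTED_KEYS = ("dataset_name", "contributor_name", "organization",
                 "contact", "hf_account")


def check_package(zip_path):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    with zipfile.ZipFile(zip_path) as archive:
        names = set(archive.namelist())
        for required in REQUIRED_FILES:
            if required not in names:
                problems.append(f"missing required file: {required}")
        if "metadata/contributor.json" in names:
            raw = archive.read("metadata/contributor.json")
            try:
                info = json.loads(raw.decode("utf-8"))
            except ValueError as exc:  # covers bad encoding and bad JSON
                problems.append(f"contributor.json is not valid JSON: {exc}")
                return problems
            if not isinstance(info, dict):
                problems.append("contributor.json must contain a JSON object")
                return problems
            for key in EXPECTED_KEYS:
                if not info.get(key):
                    problems.append(f"contributor.json has no value for: {key}")
    return problems
```

Calling check_package("dataset_package.zip") returns the list of problems, so an empty list indicates that the required files and contributor fields are present. The optional folders (rf/, image/, point_cloud/, and so on) are deliberately not checked.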

2. Preliminary Review

To verify data usability, please first select 10 to 100 sample groups, organize them according to the structure above, and upload the compressed package to:
PKU Disk Upload Link
Extraction code: pcni

Please name the compressed package as {your_huggingface_account}.zip.
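The preview package described above could be assembled with a short script. This is a sketch under stated assumptions: the function build_preview and the constant PER_SAMPLE_DIRS are hypothetical, a "sample group" is assumed to be the set of files sharing a numeric suffix such as _0001, and only files under image/ and point_cloud/ are filtered per group, while everything else (metadata, configuration, rf/, pose/) is copied whole.

```python
import zipfile
from pathlib import Path

MIN_GROUPS, MAX_GROUPS = 10, 100
# Assumption: only these top-level folders hold one file per sample group.
PER_SAMPLE_DIRS = {"image", "point_cloud"}


def build_preview(package_dir, group_ids, hf_account, out_dir):
    """Zip the chosen sample groups into <out_dir>/<hf_account>.zip."""
    if not MIN_GROUPS <= len(group_ids) <= MAX_GROUPS:
        raise ValueError(
            f"select {MIN_GROUPS} to {MAX_GROUPS} sample groups, "
            f"got {len(group_ids)}"
        )
    root = Path(package_dir)
    suffixes = tuple(f"_{gid}" for gid in group_ids)
    # List the files before creating the archive so it never includes itself.
    files = sorted(p for p in root.rglob("*") if p.is_file())
    out_path = Path(out_dir) / f"{hf_account}.zip"
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            rel = path.relative_to(root)
            per_sample = rel.parts[0] in PER_SAMPLE_DIRS
            if per_sample and not path.stem.endswith(suffixes):
                continue  # sample file that belongs to an unselected group
            archive.write(path, rel.as_posix())
    return out_path
```

For example, build_preview("dataset_package", [f"{i:04d}" for i in range(1, 51)], "alice-hf", "out") would write out/alice-hf.zip containing the first 50 groups. Writing the archive to a directory outside the package folder keeps repeated runs from picking up an earlier archive.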

3. Full Dataset Upload

After the preliminary review is completed, we will grant you upload access to the corresponding Hugging Face repository, where you can submit the full dataset for open release.
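Once upload access has been granted, the full dataset could be pushed programmatically with the huggingface_hub library. The sketch below is one possible approach, not the platform's prescribed procedure: the function upload_full_dataset is hypothetical, the target is assumed to be a dataset-type repository, and the repository ID must be the one assigned by the maintainers after review.

```python
from pathlib import Path
from typing import Optional

# Files the guidelines list as required.
REQUIRED_FILES = ("metadata/contributor.json", "README.md", "DatasetCard.md")


def upload_full_dataset(local_dir: str, repo_id: str,
                        token: Optional[str] = None) -> None:
    """Check the local folder, then upload it to a Hugging Face dataset repo."""
    root = Path(local_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"not a directory: {root}")
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    if missing:
        raise ValueError(f"missing required files: {', '.join(missing)}")

    # Imported lazily so the local checks above run without the library.
    from huggingface_hub import HfApi

    api = HfApi(token=token)  # token=None falls back to the cached login
    api.upload_folder(
        folder_path=str(root),
        repo_id=repo_id,
        repo_type="dataset",  # assumption: the target is a dataset repository
        commit_message="Upload full dataset",
    )
```

The local checks run first so that an incomplete folder is rejected before any network request is made.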


🧬 SynthSoM-Twin Coming Soon
A digital-twin dataset for Synesthesia of Machines will be released soon.

We are excited to announce that SynthSoM-Twin, our upcoming digital-twin dataset for Synesthesia of Machines, is expected to be open-sourced in the coming months. Built upon the construction architecture of the SynthSoM dataset, SynthSoM-Twin further reconstructs real-world scenarios from DeepSense 6G in digital-twin environments, enabling spatio-temporal consistency between the physical world and the simulation environment.

SynthSoM-Twin covers two representative vehicle-to-infrastructure (V2I) scenarios: an urban environment and a suburban environment. Its data acquisition and construction process accounts for multiple frequency bands, sensing perspectives, and vehicle traffic densities. The digital-twin reconstruction includes both static scenes and dynamic objects: static environments are restored using real-world 3D reconstruction models, while dynamic object categories and trajectories are recovered through multimodal detection and tracking algorithms.

The dataset contains spatio-temporally synchronized multimodal data, including 70,000 RGB images, 70,000 depth maps, 70,000 LiDAR point clouds, 20,000 mmWave radar point clouds, and 70,000 large-scale and small-scale channel fading samples.

Experimental results indicate that, with the support of SynthSoM-Twin, injecting less than 15% real-world data can achieve a favorable balance between model performance and real-world data collection cost. This suggests that high-quality digital-twin datasets can substantially reduce measurement costs, with an average reduction of approximately 85%.
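The relationship between the two percentages can be made concrete with a small calculation. This sketch rests on an assumption that is not stated in the announcement: measurement cost is taken to scale linearly with the number of real-world samples, so replacing a share of the training set with digital-twin data saves the same share of the cost. The function name real_data_budget is hypothetical.

```python
def real_data_budget(total_samples, real_fraction=0.15):
    """Split a training set into real and digital-twin samples.

    Returns (real_count, twin_count, cost_saving), where cost_saving is the
    fraction of measurement cost avoided under the linear-cost assumption.
    """
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be between 0 and 1")
    real_count = round(total_samples * real_fraction)
    twin_count = total_samples - real_count
    cost_saving = 1.0 - real_fraction  # assumption: cost is linear in real samples
    return real_count, twin_count, cost_saving


real, twin, saving = real_data_budget(10_000)
print(f"{real} real samples, {twin} digital-twin samples, "
      f"about {saving:.0%} of the measurement cost avoided")
```

With a real-data share of 15%, the linear model gives a saving of 85%, which matches the reported average reduction. The actual trade-off depends on the task and model.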

We warmly welcome researchers and developers in integrated sensing and communication, multimodal perception, embodied intelligence, and digital-twin simulation to follow, use, and contribute to SynthSoM-Twin after its release.


  Currently, the communication and multi-modal sensing functions of embodied agents operate in isolation, which severely limits their intelligence. To overcome this limitation, and inspired by the synesthesia of humans (SoH), the Pervasive Connectivity and Networked Intelligence (PCNI) group led by Professor Xiang Cheng proposed a novel framework and technological paradigm named Synesthesia of Machines (SoM). This paradigm aims at AI-native intelligent integration across the full electromagnetic spectrum, driving embodied intelligence to evolve from “mimicking humans” to “surpassing humans”.

  The breakthrough performance improvement of AI-native SoM systems fundamentally depends on the availability of massive and high-quality multi-modal sensing-communication datasets spanning the full electromagnetic spectrum. Through a community-driven co-construction mechanism, this dataset platform continuously expands the scale of its data resources and is committed to constructing a massive and high-quality full-spectrum dataset, including RF communication channel data, RF mmWave radar data, as well as non-RF sensory data such as images and LiDAR data.
