Dataset Preparation Commands
Overview
This document provides the commands to prepare the EMG datasets used for pretraining and downstream tasks. Each preparation script reads the raw data, segments it into fixed-length windows (overlapping whenever the stride is smaller than the window), and saves the result in HDF5 format for efficient loading during model training.
Add the --download_data flag if the dataset has not been downloaded yet.
Replace $DATA_PATH with the directory where the datasets should be saved, or set DATA_PATH as an environment variable.
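A minimal setup sketch, assuming a POSIX-compatible shell such as bash. The emg_data directory name is a placeholder chosen for illustration, not something the scripts require; the point is only that DATA_PATH must be set before running the commands in this document.

```bash
# Placeholder location: any writable directory works. An existing DATA_PATH is kept.
export DATA_PATH="${DATA_PATH:-$PWD/emg_data}"

# Creating the root directory up front is harmless even if the scripts create it themselves.
mkdir -p "$DATA_PATH/datasets"
echo "Datasets will be stored under $DATA_PATH/datasets"
```

On a first run, append --download_data to the relevant preparation command.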
The --window_size parameter (called --seq_len in scripts/uci.py) sets the window length in samples, and the --stride parameter sets the step between the starts of consecutive windows, also in samples. The sampling rate is 2 kHz for the pretraining datasets and either 200 Hz or 2 kHz for the downstream datasets, so the same number of samples can correspond to different durations (for example, 200 samples are 1 second at 200 Hz but 100 ms at 2 kHz).
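Because every size is given in samples, converting from seconds means multiplying by the sampling rate. The helpers below sketch that arithmetic in bash with awk; to_samples, stride_for, and n_windows are hypothetical names, not part of the repository. n_windows assumes that a trailing partial window is discarded, which may not match what each script actually does.

```bash
# Hypothetical helpers (not part of the repository) for converting between seconds
# and the sample counts passed as --window_size/--seq_len and --stride.

# to_samples SECONDS RATE_HZ -> samples, rounded to the nearest integer
to_samples() {
  awk -v s="$1" -v r="$2" 'BEGIN { printf "%d\n", s * r + 0.5 }'
}

# stride_for WINDOW OVERLAP_PERCENT -> stride in samples, i.e. window * (1 - overlap)
stride_for() {
  awk -v w="$1" -v o="$2" 'BEGIN { printf "%d\n", w * (100 - o) / 100 + 0.5 }'
}

# n_windows TOTAL WINDOW STRIDE -> number of complete windows in TOTAL samples
# (assumes a trailing partial window is dropped)
n_windows() {
  local total=$1 window=$2 stride=$3
  if [ "$total" -lt "$window" ]; then
    echo 0
  else
    echo $(( (total - window) / stride + 1 ))
  fi
}
```

For example, to_samples 0.5 2000 prints 1000 and stride_for 1000 50 prints 500, matching the pretraining settings, and a 10-second recording at 2 kHz (20000 samples) gives n_windows 20000 1000 500 = 39 windows.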
The libraries required to run the scripts are listed in requirements.txt.
Pretraining Datasets
For the pretraining datasets, we use 0.5-second windows (1000 samples at 2 kHz) with 50% overlap (a stride of 500 samples):
emg2pose (0.5 sec, 50% overlap)
python scripts/emg2pose.py \
--data_dir $DATA_PATH/datasets/emg2pose_data/ \
--save_dir $DATA_PATH/datasets/emg2pose_data/h5/ \
--window_size 1000 \
--stride 500
Ninapro DB6 (0.5 sec, 50% overlap)
python scripts/db6.py \
--data_dir $DATA_PATH/datasets/ninapro/DB6/ \
--save_dir $DATA_PATH/datasets/ninapro/DB6/h5/ \
--window_size 1000 \
--stride 500
Ninapro DB7 (0.5 sec, 50% overlap)
python scripts/db7.py \
--data_dir $DATA_PATH/datasets/ninapro/DB7/ \
--save_dir $DATA_PATH/datasets/ninapro/DB7/h5/ \
--window_size 1000 \
--stride 500
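The three pretraining commands share the same window and stride, so they can be chained. This is a sketch only, assuming it is run from the repository root with DATA_PATH set; the run helper and its DRY_RUN switch are hypothetical conveniences of this sketch and are not options of the Python scripts.

```bash
# run CMD... : print the command when DRY_RUN=1, otherwise execute it.
# DRY_RUN is a switch of this sketch only, not of the Python scripts.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# prepare_pretraining : build the emg2pose, DB6 and DB7 windows in sequence,
# stopping at the first failing script.
prepare_pretraining() {
  : "${DATA_PATH:?set DATA_PATH to your dataset root}"
  local window=1000 stride=500   # 0.5 s windows, 50% overlap at 2 kHz

  run python scripts/emg2pose.py \
    --data_dir "$DATA_PATH/datasets/emg2pose_data/" \
    --save_dir "$DATA_PATH/datasets/emg2pose_data/h5/" \
    --window_size "$window" --stride "$stride" || return 1

  local db
  for db in DB6 DB7; do
    # DB6 -> scripts/db6.py, DB7 -> scripts/db7.py
    run python "scripts/$(echo "$db" | tr 'A-Z' 'a-z').py" \
      --data_dir "$DATA_PATH/datasets/ninapro/$db/" \
      --save_dir "$DATA_PATH/datasets/ninapro/$db/h5/" \
      --window_size "$window" --stride "$stride" || return 1
  done
}
```

Calling DRY_RUN=1 prepare_pretraining prints the three commands without running them; calling prepare_pretraining on its own runs them in order.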
Downstream Datasets
For the downstream tasks, gesture classification is performed on Ninapro DB5, EMG-EPN612, and UCI EMG (all sampled at 200 Hz), while regression is performed on Ninapro DB8 (sampled at 2 kHz).
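The overlap between consecutive windows is (window - stride) / window, so a stride equal to the window gives no overlap and a stride of half the window gives 50%. The sketch below computes the duration and overlap for the window/stride pairs passed in the commands of this document; EMG-EPN612 is omitted because its commands do not pass a stride. overlap_pct and window_ms are hypothetical helper names.

```bash
# overlap_pct WINDOW STRIDE -> overlap as a percentage of the window length
overlap_pct() {
  local window=$1 stride=$2
  echo $(( 100 * (window - stride) / window ))
}

# window_ms WINDOW RATE_HZ -> window duration in milliseconds
window_ms() {
  echo $(( 1000 * $1 / $2 ))
}

# Window/stride pairs taken from the commands. Columns: label window stride rate_hz
while read -r label window stride rate; do
  printf '%-10s %5d ms  overlap %3d%%\n' \
    "$label" "$(window_ms "$window" "$rate")" "$(overlap_pct "$window" "$stride")"
done <<'EOF'
pretrain   1000 500  2000
db5_1s     200  50   200
db5_5s     1000 250  200
uci_1s     200  50   200
uci_5s     1000 250  200
db8_100ms  200  200  2000
db8_500ms  1000 1000 2000
EOF
```

Integer arithmetic is exact for these pairs. Under this definition, a stride of one quarter of the window, as in the DB5 and UCI EMG commands, corresponds to 75% overlap.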
Ninapro DB5 (1 sec, 75% overlap)
python scripts/db5.py \
--data_dir $DATA_PATH/datasets/ninapro/DB5/ \
--save_dir $DATA_PATH/datasets/ninapro/DB5/h5/ \
--window_size 200 \
--stride 50
Ninapro DB5 (5 sec, 75% overlap)
python scripts/db5.py \
--data_dir $DATA_PATH/datasets/ninapro/DB5/ \
--save_dir $DATA_PATH/datasets/ninapro/DB5/h5/ \
--window_size 1000 \
--stride 250
EMG-EPN612 (1 sec, no overlap)
python scripts/epn.py \
--data_dir $DATA_PATH/datasets/EPN612/ \
--source_training $DATA_PATH/datasets/EPN612/trainingJSON/ \
--source_testing $DATA_PATH/datasets/EPN612/testingJSON/ \
--dest_dir $DATA_PATH/datasets/EPN612/h5/ \
--window_size 200
EMG-EPN612 (5 sec, no overlap)
python scripts/epn.py \
--data_dir $DATA_PATH/datasets/EPN612/ \
--source_training $DATA_PATH/datasets/EPN612/trainingJSON/ \
--source_testing $DATA_PATH/datasets/EPN612/testingJSON/ \
--dest_dir $DATA_PATH/datasets/EPN612/h5/ \
--window_size 1000
UCI EMG (1 sec, 75% overlap)
python scripts/uci.py \
--data_dir $DATA_PATH/datasets/UCI_EMG/EMG_data_for_gestures-master/ \
--save_dir $DATA_PATH/datasets/UCI_EMG/EMG_data_for_gestures-master/h5/ \
--seq_len 200 \
--stride 50
UCI EMG (5 sec, 75% overlap)
python scripts/uci.py \
--data_dir $DATA_PATH/datasets/UCI_EMG/EMG_data_for_gestures-master/ \
--save_dir $DATA_PATH/datasets/UCI_EMG/EMG_data_for_gestures-master/h5/ \
--seq_len 1000 \
--stride 250
Ninapro DB8 (100 ms, no overlap)
python scripts/db8.py \
--data_dir $DATA_PATH/datasets/ninapro/DB8/ \
--save_dir $DATA_PATH/datasets/ninapro/DB8/h5/ \
--window_size 200 \
--stride 200
Ninapro DB8 (500 ms, no overlap)
python scripts/db8.py \
--data_dir $DATA_PATH/datasets/ninapro/DB8/ \
--save_dir $DATA_PATH/datasets/ninapro/DB8/h5/ \
--window_size 1000 \
--stride 1000