BioDG is a PyTorch suite containing benchmark datasets and algorithms for domain generalization, as introduced in "Towards Domain Generalization for ECG and EEG Classification: Algorithms and Benchmarks".
The available algorithms are based on DomainBed and modified for 1D biosignal classification.
The currently available algorithms, for both ECG and EEG, are:
- Baseline - Empirical Risk Minimization (ERM, Vapnik, 1998)
- Invariant Risk Minimization (IRM, Arjovsky et al., 2019)
- Maximum Mean Discrepancy (MMD, Li et al., 2018)
- Deep CORAL (CORAL, Sun and Saenko, 2016)
- Representation Self-Challenging (RSC, Huang et al., 2020)
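Several of the listed methods (IRM, MMD, CORAL) add a regularization term to the ERM objective. As an illustration only, the following is a minimal sketch of the Deep CORAL loss as defined by Sun and Saenko (2016): the squared Frobenius distance between the feature covariances of two domains, scaled by 1/(4d²). It is not the BioDG or DomainBed implementation; the function name `coral_penalty` and the assumption that features arrive as `(batch, d)` tensors (for example, pooled outputs of a 1D CNN) are hypothetical.

```python
import torch


def coral_penalty(feats_a: torch.Tensor, feats_b: torch.Tensor) -> torch.Tensor:
    """Deep CORAL loss between two batches of features of shape (batch, d).

    Hypothetical helper for illustration; each batch needs at least 2 samples.
    """
    d = feats_a.shape[1]

    def covariance(x: torch.Tensor) -> torch.Tensor:
        # Center the features, then compute the unbiased (d, d) covariance.
        centered = x - x.mean(dim=0, keepdim=True)
        return centered.t() @ centered / (x.shape[0] - 1)

    diff = covariance(feats_a) - covariance(feats_b)
    # Squared Frobenius norm, scaled by 1 / (4 d^2).
    return (diff ** 2).sum() / (4 * d * d)
```

In training, a term like this would typically be weighted and added to the classification loss for each pair of source domains in a batch.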
The currently available datasets are:
The datasets for the ECG domain generalization setup were taken from the 2020 PhysioNet Challenge:
- China Physiological Signal Challenge 2018 (CPSC and CPSC Extra)
- PTB and PTB-XL Diagnostic ECG Database (PTB and PTB-XL)
- St Petersburg INCART 12-lead Arrhythmia Database (INCART)
- Georgia 12-Lead ECG Challenge Database (G12EC)
To download the above datasets run the following in a terminal:
# Download all files
wget -r -N -c -np -R "index.html*" https://physionet.org/files/challenge-2020/1.0.2/training/
# INCART Annotations -- Extract data and rename folder to 'annotations'
wget -r -N -c -np -R "index.html*" https://physionet.org/files/incartdb/1.0.0/training/
mv physionet.org/files/challenge-2020/1.0.2/training/ .
mv physionet.org/files/incartdb/1.0.0/training/ annotations/
After downloading the datasets, please also download the following files:
- dx_mapping_scored.csv
- dx_mapping_unscored.csv
After extracting all datasets, the ECG data directory should follow the below tree structure:
├── annotations
├── cpsc_2018
├── cpsc_2018_extra
├── georgia
├── ptb
├── ptb-xl
├── st_petersburg_incart
├── dx_mapping_scored.csv
└── dx_mapping_unscored.csv
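Before running the conversion step, it can help to confirm that the data directory matches the tree above. The following is a small sketch of such a check; the entry names are copied from the tree, while the function name `missing_entries` is hypothetical and not part of BioDG.

```python
from pathlib import Path

# Entry names copied from the directory tree above.
EXPECTED_DIRS = [
    "annotations",
    "cpsc_2018",
    "cpsc_2018_extra",
    "georgia",
    "ptb",
    "ptb-xl",
    "st_petersburg_incart",
]
EXPECTED_FILES = ["dx_mapping_scored.csv", "dx_mapping_unscored.csv"]


def missing_entries(root):
    """Return the expected directories and files that are absent under root."""
    root = Path(root)
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing
```

Calling `missing_entries("path/to/ecg/data")` returns an empty list when every entry is in place.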
The datasets for the EEG domain generalization setup are provided by the BCMI laboratory of Shanghai Jiao Tong University. They are available for research purposes upon application to the laboratory.
- First, set the following variables in the bioconfig.py file:
  - hostname --> Create a block with your hostname to set data paths
  - scripts_root --> Root of the code package
  - _root_ecg_path --> Root path of the ECG data
- Convert the .mat ECG signal files to the appropriate format for the PyTorch DataLoader by running:
python3 ECG/convert_to_pickles.py --hostname user --outpath 'path to directory where converted data will be stored'
- Set the following variables in the ECG bioconfig.py file:
  - pickle_data_dir --> should be the same path as the above output path
  - ecg_results_dir --> experiment results path
- Train a DG model:
python3 experiments/ecg_dg_train.py \
    --network RSC --algorithm RSC \
    --c "Flags are mentioned in the experiment file"
- Train our proposed model:
python3 experiments/ecg.py --model biodg_resnet18 --epochs 30 --optim adam --batch_size 128
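The configuration steps above ask for one block of paths per hostname. The actual layout of bioconfig.py is not shown here, so the following is only a sketch of one way such per-host settings could be organized. The dictionary `HOST_CONFIGS`, the function `get_config`, and every path are hypothetical placeholders; only the variable names and the `user` hostname come from the steps above.

```python
from pathlib import Path

# Hypothetical per-host settings. The key "user" matches the --hostname value
# used in the conversion command; all paths are placeholders to be replaced.
HOST_CONFIGS = {
    "user": {
        "scripts_root": Path("/home/user/BioDG"),
        "_root_ecg_path": Path("/data/ecg"),
        "pickle_data_dir": Path("/data/ecg_pickles"),
        "ecg_results_dir": Path("/results/ecg"),
    },
}


def get_config(hostname):
    """Return the path settings for hostname, or fail with a clear message."""
    try:
        return HOST_CONFIGS[hostname]
    except KeyError:
        raise KeyError(
            f"no config block for host {hostname!r}; add one to HOST_CONFIGS"
        ) from None
```

Failing early on an unknown hostname keeps a missing block from surfacing later as a confusing file-not-found error.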
- After downloading the datasets, run the following script to split and convert the EEG DE features to the appropriate format for the PyTorch DataLoader:
python3 EEG/split_de_features.py \
    --eeg_de_features_path 'path to 1s_de_feature folder in EEG dataset' \
    --split_data_path 'output path for split data' \
    --dataset 'one between china, fra or ger'
- Set the following variables in the EEG config.py file:
  - pickle_data_dir --> should be the same as the split_data_path above
  - eeg_results_dir --> experiment results path
- Train a model:
python3 experiments/eeg_dg_train.py \
    --network ERM --algorithm ERM \
    --c "Flags are mentioned in the experiment file"
- Train our proposed model:
python3 experiments/eeg.py --epochs 20 --optim adam --batch_size 128
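Both the ECG and EEG pipelines above convert the raw data into pickled files for the PyTorch DataLoader. To illustrate how such files can be served, here is a minimal sketch of a map-style dataset over a directory of pickles. It is not the loader used by BioDG: the class name `PickledSignalDataset`, the `*.pkl` file pattern, and the `"signal"` and `"label"` keys are assumptions, and the real converted files may be laid out differently.

```python
import pickle
from pathlib import Path


class PickledSignalDataset:
    """Map-style dataset over a directory of pickled records.

    Assumption: each *.pkl file holds a dict with the hypothetical keys
    "signal" and "label".
    """

    def __init__(self, pickle_dir):
        # Sort the paths so that indices are stable across runs.
        self.paths = sorted(Path(pickle_dir).glob("*.pkl"))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # Load one record lazily, so the whole dataset never sits in memory.
        with open(self.paths[index], "rb") as fh:
            record = pickle.load(fh)
        return record["signal"], record["label"]
```

Because a map-style dataset only needs `__len__` and `__getitem__`, an instance can be passed to `torch.utils.data.DataLoader`, provided the stored signals are arrays or tensors that the default collate function can batch.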
If you use the above code for your research, please cite our paper, which as of the 22nd of June 2023 has been accepted at IEEE TETCI:
@ARTICLE{10233054,
author={Ballas, Aristotelis and Diou, Christos},
journal={IEEE Transactions on Emerging Topics in Computational Intelligence},
title={Towards Domain Generalization for ECG and EEG Classification: Algorithms and Benchmarks},
year={2024},
volume={8},
number={1},
pages={44-54},
keywords={Brain modeling;Biological system modeling;Electrocardiography;Data models;Electroencephalography;Adaptation models;Feature extraction;Biosignal classification;deep learning;domain generalization;1D signal classification;electrocardiogram (ECG) classification;electroencephalogram (EEG) classification},
doi={10.1109/TETCI.2023.3306253}}
This source code is released under the MIT license.