vlawhern / arl-eegmodels

This is the Army Research Laboratory (ARL) EEGModels Project: A Collection of Convolutional Neural Network (CNN) models for EEG signal classification, using Keras and Tensorflow

A question about data imbalance

ZhangXiao96 opened this issue

Hello, thanks for sharing your code. I was wondering how the authors deal with the data imbalance problem when training on the P300 data. It would be nice if the authors could share their code. Thanks again!

Data imbalance is handled with the class_weight option in model.fit. For example, if a P300 study had a target probability of 20% (so for every 5 images, 4 are non-targets and 1 is a target), you could set the class weights to the inverse of the class proportions in the training set (in this example, a weight of 1 for non-targets and a weight of 4 for targets). In Keras you can specify a dict of classes with their weights like this:

class_weight_P300 = {0:1, 1:4}

where the syntax is X:Y, with X = the numerical class label and Y its class weight.

Then, in model.fit, you pass in the class_weight option:

fittedModel = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, class_weight=class_weight_P300)

This seemed to work pretty well for me and is pretty straightforward to use.
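The weights above were written by hand for a known 20% target rate. When the class proportions are not known in advance, they can be derived from the training labels. The following is only a minimal sketch: `inverse_frequency_weights` is a hypothetical helper, not part of Keras or of this repository, and it assumes integer class labels (one-hot labels would need to be converted to integers first, for example with `np.argmax(y, axis=-1)`).

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return {class_label: weight}, with the most frequent class at weight 1.

    Hypothetical helper: each class is weighted by
    (size of the largest class) / (size of this class).
    """
    counts = Counter(int(label) for label in labels)
    largest = max(counts.values())
    return {label: largest / n for label, n in sorted(counts.items())}

# Toy labels with a 20% target rate: 80 non-targets (0) and 20 targets (1).
toy_labels = [0] * 80 + [1] * 20
class_weight_P300 = inverse_frequency_weights(toy_labels)
print(class_weight_P300)  # {0: 1.0, 1: 4.0}
```

The resulting dict has the same X:Y form as the hand-written one and can be passed to the class_weight option of model.fit in the same way.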

Thanks, that's really helpful! I was also wondering whether you have tested EEGNet on synchronized averages of a few EEG epochs for ERP signals (P300 and ERN), since, if I didn't miss anything in your paper, EEGNet was designed for single-trial data.

I've only tried it with single-trial data, but I believe it should also work on averaged trials (nothing about the architecture is specific to single-trial or averaged trials). I'd be interested in hearing about any successes with this approach.
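For anyone who wants to try averaged trials, one possible way to build them is sketched below. This is not the procedure used by anyone in this thread (none is given); it assumes epochs stored as a NumPy array whose first axis indexes trials, integer labels, random grouping, and averaging only over epochs of the same class. The function name `average_within_class` and the group size `k` are illustrative.

```python
import numpy as np

def average_within_class(X, y, k, seed=0):
    """Average random groups of k same-class epochs (hypothetical helper).

    X: array with trials on axis 0, e.g. (trials, channels, samples).
    y: integer class label per trial.
    Leftover epochs that do not fill a group of k are dropped.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    X_avg, y_avg = [], []
    for label in np.unique(y):
        # Shuffle the indices of this class, then walk them in chunks of k.
        idx = rng.permutation(np.flatnonzero(y == label))
        for start in range(0, len(idx) - k + 1, k):
            members = idx[start:start + k]
            X_avg.append(X[members].mean(axis=0))
            y_avg.append(label)
    return np.stack(X_avg), np.array(y_avg)

# Toy data: 100 epochs, 8 channels, 128 samples, 20% targets.
toy_X = np.random.default_rng(1).standard_normal((100, 8, 128))
toy_y = np.array([0] * 80 + [1] * 20)
X_avg, y_avg = average_within_class(toy_X, toy_y, k=4)
print(X_avg.shape)  # (25, 8, 128)
```

Because the averaged array keeps the same channel and sample dimensions as the single-trial array, it can be fed to the same model without changing the architecture.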

OK, thanks for your help! I have one last question: have you ever trained EEGNet on the whole BCI_2A training set instead of the training set of a single subject? I tried but didn't get good results.

I've gotten good results with cross-subject training on BCI IV 2A, as long as you have some subject-specific data in the validation set. BCI IV 2A has 9 subjects' worth of train and test data, so take the training data from 8 subjects as the training set, the last subject's training data as the validation set, and that subject's test set as the test set. If you do it this way you can get decent results (not as good as within-subject training, though).

If you don't have any subject-specific data in the validation set you'll get pretty bad results for BCI IV 2A, although no technique I tried did better (the EEGNet paper tested FBCSP and two other CNN models, none of which did that well).
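The cross-subject split described above can be sketched as follows. This is only an illustration under stated assumptions: `cross_subject_split` is a hypothetical helper, and the two dicts stand in for however the per-subject train and test sessions are actually loaded (placeholder strings are used here instead of real EEG arrays).

```python
def cross_subject_split(train_sets, test_sets, held_out):
    """Split per-subject data for one held-out subject (hypothetical helper).

    train_sets / test_sets: dicts mapping subject id -> that subject's data.
    Returns (training, validation, test) where
      training   = train data of every subject except held_out,
      validation = train data of held_out,
      test       = test data of held_out.
    """
    training = [train_sets[s] for s in sorted(train_sets) if s != held_out]
    validation = train_sets[held_out]
    test = test_sets[held_out]
    return training, validation, test

# Placeholder data for 9 subjects; real code would hold arrays of epochs.
toy_train = {s: f"S{s}_train" for s in range(1, 10)}
toy_test = {s: f"S{s}_test" for s in range(1, 10)}

training, validation, test = cross_subject_split(toy_train, toy_test, held_out=9)
print(len(training), validation, test)  # 8 S9_train S9_test
```

Repeating the call with each subject as `held_out` in turn would give one fold per subject.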

OK, thanks for your help!

Hello, I'm very glad to report that EEGNet works very well on synchronized averages of a few EEG epochs for ERP signals, and it can even reach high accuracy on single test epochs even though it was trained only on the averaged ones!

Good to hear! Let me know if your work gets published; would definitely like to take a look at it.