A question about data imbalance

Question

ZhangXiao96 opened this issue 6 years ago · comments

Hello, thanks for sharing your code. I was wondering how you deal with the data imbalance problem when training on the P300 data. It would be nice if you could share that code. Thanks again!

vlawhern · Answer 1 · Mon Jan 14 2019 22:34:07 GMT+0800 (China Standard Time)

Data imbalance is handled by using the class_weight option in model.fit. For example, if a P300 study had a target probability of 20% (so for every 5 images 4 are non-targets and 1 is a target), then you could set the class weight to be the inverse proportion in the training set (so in this example non-targets with a weight of 1 and targets with a weight of 4). In Keras you can specify a dict of classes with their weights like this:

class_weight_P300 = {0:1, 1:4}

where the syntax is X:Y, with X = the numerical class label and Y its class weight.

Then, in model.fit, you pass in the class_weight option:

fittedModel = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, class_weight=class_weight_P300)

This seemed to work pretty well for me and is pretty straightforward to use.
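If you would rather derive the weights from the training labels than hard-code them, a minimal sketch could look like the following. The helper name and the toy labels are made up for illustration, and it assumes integer class labels; if your labels are one-hot encoded, you would first take the argmax over the class axis.

```python
import numpy as np

def inverse_frequency_class_weights(labels):
    """Return {class_label: weight}, with the most frequent class at weight 1.

    Hypothetical helper for illustration; assumes integer class labels.
    """
    classes, counts = np.unique(labels, return_counts=True)
    # Each class is weighted by (largest class count / its own count).
    return {int(c): float(counts.max() / n) for c, n in zip(classes, counts)}

# Toy labels with a 20% target rate: 80 non-targets (0) and 20 targets (1).
y_labels = np.array([0] * 80 + [1] * 20)
class_weight_P300 = inverse_frequency_class_weights(y_labels)
print(class_weight_P300)  # {0: 1.0, 1: 4.0}
```

The resulting dict has the same form as the hand-written one above and can be passed to model.fit through the class_weight option.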

Xiao Zhang · Answer 2 · Thu Jan 17 2019 10:55:56 GMT+0800 (China Standard Time)

Thanks, that's really helpful! I was also wondering whether you have tested EEGNet on synchronized averages of a few EEG epochs for ERP signals (P300 and ERN) since, if I didn't miss anything in your paper, EEGNet was designed for single-trial classification.

vlawhern · Answer 3 · Thu Jan 17 2019 11:48:46 GMT+0800 (China Standard Time)

I've only tried it with single-trial but I believe it should also work on averaged trials (nothing about the architecture is specific to single-trial or averaged trials). I'd be interested in hearing about any successes with this approach.
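For anyone who wants to try averaged trials, here is one possible way to build them. This is only a sketch: the thread does not say how the averaging should be done, so the grouping scheme (random, non-overlapping groups of same-class epochs), the function name, and the array shapes are all assumptions.

```python
import numpy as np

def average_epochs(X, y, n_avg, seed=0):
    """Average non-overlapping groups of n_avg epochs that share a label.

    X: array of shape (n_epochs, n_channels, n_samples)
    y: integer labels of shape (n_epochs,)
    Leftover epochs that do not fill a complete group are dropped.
    """
    rng = np.random.default_rng(seed)
    X_out, y_out = [], []
    for label in np.unique(y):
        # Shuffle the indices of this class, then cut them into groups.
        idx = rng.permutation(np.flatnonzero(y == label))
        for g in range(len(idx) // n_avg):
            group = idx[g * n_avg:(g + 1) * n_avg]
            X_out.append(X[group].mean(axis=0))
            y_out.append(label)
    return np.stack(X_out), np.array(y_out)

# Placeholder data: 20 epochs, 8 channels, 32 time samples.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 8, 32))
y = np.array([0] * 16 + [1] * 4)
X_avg, y_avg = average_epochs(X, y, n_avg=4)
print(X_avg.shape, y_avg)
```

The averaged array keeps the (channels, samples) layout of a single epoch, so it can be fed to the same network input after whatever reshaping your model already expects.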

Xiao Zhang · Answer 4 · Thu Jan 17 2019 15:07:33 GMT+0800 (China Standard Time)

OK, thanks for your help! I have one last question: have you ever trained EEGNet on the whole BCI_2A training set instead of the training set of a single subject? I tried but didn't get good results.

vlawhern · Answer 5 · Thu Jan 17 2019 22:33:49 GMT+0800 (China Standard Time)

I've gotten good results with cross-subject training on BCI IV 2A as long as there is some subject-specific data in the validation set. BCI IV 2A has 9 subjects' worth of train and test data, so take the training data from 8 subjects as the training set, the last subject's training data as the validation set, and that subject's test set as the test set. If you do it this way you can get decent results (not as good as within-subject training, though).

If you don't have any subject-specific data in the validation set you'll get pretty bad results for BCI IV 2A, although no technique I tried did better (the EEGNet paper tested FBCSP and two other CNN models, none of which did that well).
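The split described above can be sketched roughly as follows. The function name, the dict-of-subjects layout, and the random placeholder arrays are assumptions for illustration; loading the real BCI IV 2A data is not shown.

```python
import numpy as np

def cross_subject_split(train_sets, test_sets, held_out):
    """Build train/validation/test sets for one held-out subject.

    train_sets, test_sets: dicts mapping subject id -> (X, y).
    Training set:   training data of every subject except held_out.
    Validation set: training data of the held-out subject.
    Test set:       test data of the held-out subject.
    """
    others = [s for s in sorted(train_sets) if s != held_out]
    X_train = np.concatenate([train_sets[s][0] for s in others])
    y_train = np.concatenate([train_sets[s][1] for s in others])
    X_val, y_val = train_sets[held_out]
    X_test, y_test = test_sets[held_out]
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)

# Placeholder data: 9 subjects, 4 trials each, 22 channels, 16 samples.
rng = np.random.default_rng(0)
def fake_session():
    return rng.standard_normal((4, 22, 16)), rng.integers(0, 4, size=4)

train_sets = {s: fake_session() for s in range(1, 10)}
test_sets = {s: fake_session() for s in range(1, 10)}

train, val, test = cross_subject_split(train_sets, test_sets, held_out=9)
print(train[0].shape, val[0].shape, test[0].shape)
```

With Keras, the validation pair would typically go to model.fit through the validation_data argument, and repeating the split with each subject held out in turn gives one result per subject.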

Xiao Zhang · Answer 6 · Mon Jan 21 2019 09:57:40 GMT+0800 (China Standard Time)

OK, thanks for your help!

Xiao Zhang · Answer 7 · Tue Feb 26 2019 12:40:01 GMT+0800 (China Standard Time)

Hello, I'm very glad to report that EEGNet works perfectly on synchronized averages of a few EEG epochs for ERP signals, and it could even get high accuracy on single test epochs even though it was trained only on the averaged ones!

vlawhern · Answer 8 · Tue Feb 26 2019 22:38:32 GMT+0800 (China Standard Time)

Good to hear! Let me know if your work gets published; would definitely like to take a look at it.