vlawhern / arl-eegmodels

This is the Army Research Laboratory (ARL) EEGModels Project: a collection of Convolutional Neural Network (CNN) models for EEG signal classification, using Keras and TensorFlow.

How to deal with input data that contains different numbers of sampling points?

Irving-ren opened this issue

Irving-ren commented

Hi, thanks for sharing this work; it's a great job! I checked the input data of the demo script, and every trial has exactly the same number of sampling points, which reflects tight temporal control in the experiment. Do you have any suggestions for handling input data with different numbers of sampling points in the EEGNet models? Right now I am resampling to align them.

If by sampling points you mean the sampling rate of the EEG signal, then you can resample the signal however you like. The EEGNet models were validated with a 128Hz sampling rate. While I haven't tested this extensively, you could increase the temporal filter sizes and average-pooling sizes proportionally to your input sampling rate; for example, if your input sampling rate is 256Hz (double 128Hz), then you would double the lengths of the temporal filters and double the average-pooling sizes.
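For concreteness, here is a minimal sketch of that scaling, assuming the `EEGNet()` signature from this repo's `EEGModels.py` (where `kernLength` sets the temporal filter length and defaults to 64 at 128Hz); the sampling rate and trial duration below are assumed examples:

```python
# A sketch, not a definitive recipe: scale EEGNet's temporal filter
# length with the input sampling rate. Assumes the EEGNet() signature
# from this repo's EEGModels.py; fs and the trial duration are examples.
from EEGModels import EEGNet

fs = 256                      # input sampling rate in Hz (assumed)
base_fs = 128                 # rate the published hyperparameters target
scale = fs // base_fs         # = 2 for 256Hz

model = EEGNet(nb_classes=2,
               Chans=64,
               Samples=2 * fs,            # e.g. 2-second trials
               kernLength=64 * scale)     # double the temporal filter length
```

Note that the average-pooling sizes ((1, 4) and (1, 8)) are hardcoded inside the model definition, so doubling those requires editing `EEGModels.py` directly.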

Irving-ren commented

Thanks for your explanation of the parameter adjustments for different sampling rates. Maybe I didn't describe it clearly. By sampling points (Sp) I mean Sp = time × sampling_rate, where the sampling rate is constant; the data have already been downsampled to 128Hz. However, the duration of each trial is different, which is why the trials have different numbers of sampling points going into the model. Does that make sense?

Irving-ren commented

Sorry for the confusion: the word "epoch" I used above was inappropriate here; the correct term is "trial".

You can try zero-padding the signal so that all trials are the same length. In this case I'd recommend zero-padding the left side of the signal (to maintain a form of causality) rather than the right side, although this may not matter too much.
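A minimal sketch of that left-padding, assuming each trial is a `(channels, samples)` NumPy array and `trials` is a hypothetical list of them:

```python
import numpy as np

def pad_left(trials):
    """Zero-pad each (channels, samples) trial on the left to the max length."""
    max_len = max(t.shape[1] for t in trials)
    return np.stack([np.pad(t, ((0, 0), (max_len - t.shape[1], 0)))
                     for t in trials])   # -> (n_trials, channels, max_len)
```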

This strategy should be OK as long as the difference in trial lengths isn't too severe; if some trials are, say, 5 seconds and others are only 500ms, then I'm not sure this strategy will work. But for minor differences in trial lengths (I'd say within 500ms or so) I think it will work.

Irving-ren commented

Exactly, I'm on the same page with you. For minor differences among trials, zero-padding would be a viable option. Unfortunately, the variation across all the trials turns out to be very large. In that case, what do you think of further segmenting the trials according to some rule to solve this?

If there's a lot of variation due to outliers perhaps you can get rid of them first? Say, remove the shortest and longest 10% of trials (so only train on the middle 80%).
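A short sketch of that trimming using NumPy percentiles; `trials` and `labels` are the same hypothetical containers as above:

```python
import numpy as np

lengths = np.array([t.shape[1] for t in trials])
lo, hi = np.percentile(lengths, [10, 90])     # middle 80% by trial length
keep = (lengths >= lo) & (lengths <= hi)

trials = [t for t, k in zip(trials, keep) if k]
labels = np.asarray(labels)[keep]
```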

Also you have to be careful if classes are correlated with trial length. For example, if class 1 trials are always shorter than class 2 trials (so more zero-padding is needed for class 1 trials) then a naive classifier could just say "if a trial has more zeros it's more likely class 1" which may be an accurate classifier but is obviously bad from a machine learning perspective.
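A quick way to screen for this confound (same assumed `trials`/`labels` layout) is to compare trial-length statistics per class:

```python
import numpy as np

lengths = np.array([t.shape[1] for t in trials])
labels = np.asarray(labels)
for c in np.unique(labels):
    m = lengths[labels == c]
    print(f"class {c}: n={m.size}, mean={m.mean():.1f}, std={m.std():.1f} samples")
# Large per-class differences in length suggest the padding itself is informative.
```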

Beyond this I don't have much more advice to give...

Irving-ren commented

Yes, I have already done that kind of processing: I plotted the distribution of trial lengths across all trials, and it has a roughly logarithmically decreasing shape. Before doing that, I removed around 100 trials (out of 732 in total) whose durations were shorter than 1 second or longer than 60 seconds. Thanks for your advice; I will rethink this issue from the beginning now.