test dataset

Question

xuchengjian632 opened this issue 5 months ago

May I ask why, in __getitem__(), index_n_sub_test has to be multiplied by 80 and test_index has to be divided by 80, when you have already averaged the 80 repetitions of the test data in EEG_dataset() in EEG_Image_decode/Generation/eegdatasets_leaveone.py?
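To make the arithmetic in the question concrete, here is a minimal sketch of how a flat dataset index can be mapped back to a class label. It is not the repository's code: it assumes, hypothetically, that each subject contributes 200 classes with samples_per_class samples each, stored class-major (all samples of class 0, then class 1, and so on), and that subjects are concatenated one after another. The names label_of and per_subject_block are illustrative only.

```python
N_CLASSES = 200  # test concepts per subject, as discussed in the thread


def label_of(index: int, samples_per_class: int, n_classes: int = N_CLASSES) -> int:
    """Map a flat dataset index to a class label (hypothetical layout).

    Assumes each subject contributes n_classes * samples_per_class consecutive
    samples, stored class-major (all samples of class 0, then class 1, ...).
    """
    per_subject_block = n_classes * samples_per_class  # plays the role of the "* 80"
    index_within_subject = index % per_subject_block
    return index_within_subject // samples_per_class   # plays the role of the "/ 80"


# Un-averaged layout: 80 repetitions per class.
print(label_of(79, samples_per_class=80))     # → 0 (last repetition of class 0)
print(label_of(80, samples_per_class=80))     # → 1 (first repetition of class 1)
print(label_of(16000, samples_per_class=80))  # → 0 (class 0 of the next subject)

# Fully averaged layout: 1 sample per class, so the division is by 1.
print(label_of(205, samples_per_class=1))     # → 5
```

With samples_per_class=1 the mapping reduces to index % 200, so under this assumed layout the divisor only changes the result when more than one sample per class is kept.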

Dongyang Li · Answer 1 · Sat May 18 2024 21:45:31 GMT+0800 (China Standard Time)

@xuchengjian632 This is an engineering detail. We need to average the EEG data here, but the EEG labels still need to be correct. The indexing is written this way to make sure the matching label is always found, so that each data sample is paired with the correct label.

Chengjian Xu · Answer 2 · Sat May 18 2024 22:24:45 GMT+0800 (China Standard Time)

But after data processing, once data_list and label_list are concatenated, their shapes are (200, 63, 250) and (200,) respectively. Don't the EEG data and labels already correspond to each other?

Dongyang Li · Answer 3 · Sat May 18 2024 22:28:01 GMT+0800 (China Standard Time)

@xuchengjian632 Obviously, (200, 63, 250) represents 200 categories with 1 sample per category (this singleton dimension is omitted); each sample has 63 channels and is 250 time points long.
(200,) represents the 200 categories with 1 label per category, so that dimension is likewise omitted.
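The shapes discussed above can be reproduced with a small NumPy sketch that averages over a repetition axis. It assumes the un-averaged test data is laid out as (n_classes, n_reps, n_channels, n_times), for example (200, 80, 63, 250); the function name average_repetitions is hypothetical, and the demo uses tiny dimensions to keep memory use low.

```python
import numpy as np


def average_repetitions(raw: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Average EEG trials over the repetition axis (hypothetical helper).

    raw: array of shape (n_classes, n_reps, n_channels, n_times).
    Returns data of shape (n_classes, n_channels, n_times) and
    labels of shape (n_classes,), one label per class.
    """
    if raw.ndim != 4:
        raise ValueError(f"expected 4 dimensions, got shape {raw.shape}")
    data = raw.mean(axis=1)           # collapse the repetition axis
    labels = np.arange(raw.shape[0])  # row i of data belongs to class i
    return data, labels


# Tiny stand-in for (200, 80, 63, 250): 5 classes, 4 repetitions, 3 channels, 10 time points.
raw = np.random.default_rng(0).standard_normal((5, 4, 3, 10))
data, labels = average_repetitions(raw)
print(data.shape, labels.shape)  # (5, 3, 10) (5,)
```

With the real dimensions, the same call would return arrays of shape (200, 63, 250) and (200,), so after averaging each row of the data lines up one-to-one with a label.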

Chengjian Xu · Answer 4 · Sat May 18 2024 22:39:39 GMT+0800 (China Standard Time)

@dongyangli-del I understand what the dimensions mean. My point is that once the test set has been processed by the load_data() function, the EEG data and labels are already aligned (both have a first dimension of 200). So is it still necessary to divide by 80 in __getitem__(self, index)?

Dongyang Li · Answer 5 · Sun May 19 2024 00:05:36 GMT+0800 (China Standard Time)

@xuchengjian632 Thank you for your question.
Our code handles a more general case, because averaging all repetitions of the test set does not necessarily give the best results.
According to the conclusions of Song et al., the best results are achieved by averaging 55 repetitions. Our code is written to support this kind of scaling-law test.
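A scaling test of this kind varies how many repetitions are averaged. Below is a minimal sketch, assuming the same (n_classes, n_reps, n_channels, n_times) layout and assuming that the first k repetitions are the ones averaged; the repository may select repetitions differently, and average_first_k is a hypothetical name.

```python
import numpy as np


def average_first_k(raw: np.ndarray, k: int) -> np.ndarray:
    """Average the first k repetitions of each class (hypothetical helper).

    raw: (n_classes, n_reps, n_channels, n_times) with 1 <= k <= n_reps.
    Returns (n_classes, n_channels, n_times).
    """
    n_reps = raw.shape[1]
    if not 1 <= k <= n_reps:
        raise ValueError(f"k must be in [1, {n_reps}], got {k}")
    return raw[:, :k].mean(axis=1)


# Tiny stand-in: 3 classes, 80 repetitions, 2 channels, 4 time points.
raw = np.random.default_rng(1).standard_normal((3, 80, 2, 4))
for k in (1, 10, 55, 80):
    averaged = average_first_k(raw, k)
    # Each k yields one averaged sample per class; a decoder would be evaluated on each setting.
    print(k, averaged.shape)
```

Because every setting in this sketch still produces one sample per class, labels stay aligned row for row; keeping several averaged groups per class instead would require dividing the index by the number of groups per class.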

Citations
Song, Yonghao, et al. "Decoding Natural Images from EEG for Object Recognition." arXiv preprint arXiv:2308.13234 (2023).

Chengjian Xu · Answer 6 · Sun May 19 2024 11:34:31 GMT+0800 (China Standard Time)

@dongyangli-del So when the test set is loaded normally, with all 80 repetitions averaged, does the code here not need to divide by 80?

Dongyang Li · Answer 7 · Sun May 19 2024 11:46:20 GMT+0800 (China Standard Time)

Hi @xuchengjian632, I think you haven't quite followed the basic logic of the code. If you have other questions, you can add me on WeChat: KeepRevere2Nature.
Since the original question in this issue has been resolved, I will close it.

Chengjian Xu · Answer 8 · Sun May 19 2024 11:58:33 GMT+0800 (China Standard Time)

@dongyangli-del Okay, thank you for your response.