Dataset Issue

Question

JackAILab opened this issue a year ago

OK, I got it; I can run all the code now. There may be one last bug: when I use the MOSI_20 dataset, the make_weights function cannot handle multi-label data properly, because each sample in MOSI_20 carries three labels. So I just use the last label, as shown in the following screenshot. Is that the right thing to do?
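The workaround described above can be sketched roughly as follows. This is only an illustration, not the repository's actual make_weights: the signature, the `label_index` parameter, and the inverse-frequency weighting are assumptions, and it presumes the chosen label column holds discrete class ids.

```python
from collections import Counter


def make_weights(labels, label_index=-1):
    """Hypothetical stand-in for make_weights: inverse-frequency sample weights.

    labels: list of rows, each row holding several labels per sample.
    label_index: which label column to use (assumed: the last one).
    """
    # Reduce every multi-label row to the single label we balance on.
    classes = [row[label_index] for row in labels]
    counts = Counter(classes)
    total = len(classes)
    # Samples from rare classes receive proportionally larger weights.
    return [total / counts[c] for c in classes]


# Toy data: four samples, three labels each (values are made up).
labels = [[0, 1, 2], [1, 0, 2], [0, 0, 1], [1, 1, 2]]
weights = make_weights(labels)
```

Weights like these are typically passed to a weighted sampler. Whether the last label is the right column to balance on depends on how the three labels were defined when the dataset was processed.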

Besides, I am not sure whether the MOSI_20 dataset is the same as the MOSI dataset, because I can't find MOSI_20 in the Google Drive you provided. So I just copied the MOSI dataset into another folder named MOSI_20. Is that right? I have the same problem with IEMOCAP_20, MOSI_50, MOSEI_20, and MOSEI_50.

Unfortunately, I can't reproduce the results in your paper. Is some parameter setting still wrong? The prediction results seem to be bad.

Originally posted by @JackAILab in #4 (comment)

Sun · Answer 1 · Mon Apr 24 2023 10:33:01 GMT+0800 (China Standard Time)

Sorry for the confusion. In our paper, we didn't employ the MOSI, MOSEI, and IEMOCAP datasets. We employed YouTube, MOUD, MMMO, and POM to evaluate the effectiveness.

The MOSI_20 and MOSEI_50 datasets were preprocessed by others (they are the same as those in the Google Drive): the features are aligned to the words, and the sequence length is truncated to 20 or 50. We did run some experiments on these datasets, but we could not reach SOTA performance. We think the reason is that previous works used BERT to extract textual features, whereas we simply used the GloVe features included with the datasets. Besides, the multi-label confusion is also one of the reasons that discouraged us from using these datasets.
If you want the complete MOSI and MOSEI datasets, I think you can refer to the official repo (https://github.com/A2Zadeh/CMU-MultimodalSDK), although that repo is currently unavailable for unknown reasons.
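To make the "_20" and "_50" suffixes concrete, here is a minimal sketch of fixing a word-aligned feature sequence to a set length. The function name, the choice to keep the first steps, and zero-padding at the end are all assumptions; the original preprocessing may differ on each of these points.

```python
def truncate_or_pad(sequence, max_len, feature_dim):
    """Return exactly max_len time steps (hypothetical helper).

    sequence: list of per-word feature vectors (lists of floats).
    Long sequences are cut to the first max_len steps; short ones
    are zero-padded at the end.
    """
    clipped = sequence[:max_len]
    # range() is empty when nothing is missing, so no padding is added.
    padding = [[0.0] * feature_dim for _ in range(max_len - len(clipped))]
    return clipped + padding


# Toy sequence: three words, two features each (values are made up).
seq = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
short = truncate_or_pad(seq, max_len=2, feature_dim=2)
long = truncate_or_pad(seq, max_len=5, feature_dim=2)
```

Under this reading, MOSI_20 and MOSI_50 would hold the same samples at different sequence lengths, so the unprocessed MOSI data copied into a folder named MOSI_20 would not necessarily have the shape the code expects.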

The IEMOCAP dataset is not public, so we did not evaluate our method on it.