YapengTian / AVE-ECCV18

Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018

Home Page: https://sites.google.com/view/audiovisualresearch

Failed to download audio_feature.h5

asker-github opened this issue

Does anyone have a Baidu Netdisk or Thunder (Xunlei) link for audio_feature.h5? I can only download it through Chrome, and because the file is so big, the download fails every time.

I tried to generate audio_feature.h5 myself, but I don't know whether that will cause any problems.

I uploaded it to Dropbox. Here is the link: https://www.dropbox.com/s/djweo9ew9pqv8xi/audio_feature.h5?dl=0.

Oops, my mistake. It should be visual_feature.h5. I was so excited that I typed the wrong file name.

I tried your link. The speed seems about the same as the links in the README. Since I'm downloading with Chrome, the download may still fail even from Dropbox; it fails about halfway through every time. Maybe my network connection isn't very good.
Haha, maybe I'll have to generate the file myself. Thank you.

The file I generated is 8.3 GB. It was generated following the video name in each line of Annotations.txt. But the file you provide is 7.7 GB, so I don't know what the difference is.

If you used the provided scripts and followed the order of Annotations.txt, it should be correct.
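
To see what actually differs between the two files, a quick inspection of the HDF5 contents is often enough. Here is a minimal sketch using h5py (the file paths are placeholders, and the dataset key names are discovered rather than assumed):

```python
# Minimal sketch: print the datasets, shapes, and dtypes of two HDF5 feature
# files so they can be compared. The file paths below are placeholders.
import h5py

def summarize(path):
    with h5py.File(path, "r") as f:
        for key in f.keys():
            dset = f[key]
            print(f"{path}: {key} shape={dset.shape} dtype={dset.dtype}")

summarize("audio_feature.h5")            # the file provided in the README
summarize("audio_feature_generated.h5")  # hypothetical name for the self-generated file
```

A difference in dtype, row count, or HDF5 compression settings would show up here and could account for the 8.3 GB vs. 7.7 GB gap.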

Hello, my torch version is 1.5.1. When I ran the test (python supervised_main.py --model_name AV_att), this error occurred:
```
Traceback (most recent call last):
  File "supervised_main.py", line 159, in <module>
    test(args)
  File "supervised_main.py", line 148, in test
    x_labels = model(audio_inputs, video_inputs)
  File "/home/zhu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhu/zhu_tf/audio_visual/AVE-ECCV18-master/models.py", line 66, in forward
    self.lstm_video.flatten_parameters()
  File "/home/zhu/.local/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 106, in flatten_parameters
    if len(self._flat_weights) != len(self._flat_weights_names):
  File "/home/zhu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 594, in __getattr__
    type(self).__name__, name))
AttributeError: 'LSTM' object has no attribute '_flat_weights'
```
I'm trying to fix this error now. I want to know whether I can still train or test with the model you provided once I solve it.

I was using PyTorch 0.3.0. If you run it with 1.5.1, I think you need to modify the code accordingly.
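
For reference, a common workaround when a checkpoint that was saved as a whole module under an old PyTorch version is loaded in a much newer one is to rebuild the model under the new version and copy over only the state_dict. A minimal sketch is below; the class name `att_Model` and the file paths are placeholders, so substitute the actual model class from models.py and the checkpoint you downloaded:

```python
# Minimal sketch, assuming the released checkpoint was saved as a full module
# object under PyTorch 0.3.x. Rebuilding the module under the new version and
# loading only the weights avoids stale internal attributes such as the missing
# LSTM '_flat_weights'. The class name `att_Model` and the paths are placeholders.
import torch
from models import att_Model  # hypothetical import; use the real class and constructor arguments

old_model = torch.load("AV_att.pt", map_location="cpu")    # old full-module checkpoint
new_model = att_Model()                                    # fresh module under PyTorch 1.5.1
new_model.load_state_dict(old_model.state_dict())          # copy only the learned weights
torch.save(new_model.state_dict(), "AV_att_state_dict.pt") # re-save in the portable format
```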

Hello, first of all, thank you for your kind reply. I have two questions for you. ^_^

weak_supervised_main.py: visual_feature_noisy.h5, audio_feature_noisy.h5, mil_labels.h5, labels_noisy.h5.
I don't know whether these files correspond to Annotations.txt, because I want to study several other classes.

cmm_train.py: labels_closs.h5, visual_feature_vec.h5, train_order_match.h5, val_order_match.h5, test_order_match.h5.
I have no idea what these files are. Also, visual_feature_vec.h5 doesn't seem to be available for download, which is frustrating. Looking forward to your reply, thank you!

The noisy features are from some randomly selected videos in the background class; they do not correspond to Annotations.txt. The videos can be found at https://drive.google.com/file/d/1Iqba9lk_KOxxf5CFV33_XVoC5nuG8wiu/. mil_labels.h5 contains video-level labels.

As noted in the README, visual_feature_vec.h5 can be downloaded from https://drive.google.com/file/d/1l-c8Kpr5SZ37h-NpL7o9u8YXBNVlX_Si/view. labels_closs.h5 contains labels for the contrastive loss, visual_feature_vec.h5 contains visual features, and the other three files are data splitting orders.
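
The contents of these files can be inspected the same way as above. Below is a minimal sketch of how a split-order file might be applied, under the assumption that it simply stores indices into the Annotations.txt ordering (the dataset keys are discovered rather than assumed):

```python
# Minimal sketch (assumption: each *_order_match.h5 stores a 1-D index array
# into the Annotations.txt ordering; inspect f.keys() to confirm the layout).
import h5py
import numpy as np

with h5py.File("train_order_match.h5", "r") as f:
    key = list(f.keys())[0]               # discover the dataset name
    train_order = np.asarray(f[key])      # assumed: indices of the training videos

with h5py.File("visual_feature_vec.h5", "r") as f:
    key = list(f.keys())[0]
    features = np.asarray(f[key])         # one row of features per video

train_features = features[train_order]    # select the training subset
print(train_order.shape, features.shape, train_features.shape)
```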

I retrained on the Male speech, Female speech, and Background categories for your supervised task. The accuracy is about the same as yours, but the localization of the sounding region in the frames is very poor (python attention_visualization.py). How can I achieve the results shown in your paper?

I used data from the different categories to train the model before. Since you only use limited speech data, it is reasonable that the model fails to find the sounding parts for objects in other categories.

If you only want to explore face-speech data, you might train the model on a large set of human-talking videos, such as an active speaker detection dataset: https://arxiv.org/abs/1901.01342.

I'm only training on these three categories because I only want to recognize these three categories, but the results are not good. Maybe it's because there are too few categories in training. Thank you for your recommendation.
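
For reference, one way to select only a few categories from Annotations.txt before generating features is sketched below; the '&' delimiter, the presence of a header line, and the exact category strings are assumptions that should be checked against the actual file.

```python
# Minimal sketch: collect the line indices in Annotations.txt that belong to a
# chosen subset of categories. The delimiter, header handling, and category
# strings below are assumptions -- verify them against the actual file.
wanted = {"Male speech, man speaking", "Female speech, woman speaking"}  # illustrative names

keep_indices = []
with open("Annotations.txt") as f:
    next(f)  # skip the header line, if the file has one
    for i, line in enumerate(f):
        category = line.strip().split("&")[0]  # assumed: category is the first field
        if category in wanted:
            keep_indices.append(i)

print(len(keep_indices), "videos selected")
```

The resulting indices can then be used to slice the corresponding rows out of the feature and label HDF5 files.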