andrewowens / multisensory

Code for the paper: Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

Home Page: http://andrewowens.com/multisensory/

About the input format

ASHA-KOTERU opened this issue

In the source separation model, it seems you are using *.tf files as input (rec_files_from_path in sep_dset.py). Could you please describe the format needed to create those TFRecord files?

Sorry for the slow reply.

In case it helps, here's an example .tf file. I stored a 5-second video sequence as a giant JPEG image (after concatenating them together vertically), and provided raw uint16 audio samples. I suggest rewriting the I/O code, though! There are probably better ways to do this.
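
In case it's useful to anyone hitting this, here is a minimal sketch of writing one example in the format described above (frames stacked into one tall JPEG, plus raw uint16 audio). The feature keys 'im', 'samples', and 'num_frames' are guesses for illustration, not necessarily the repo's actual names; check read_example in sep_dset.py for the keys and dtypes the loader expects. Written against the TF 1.x API this repo uses.

import numpy as np
import cv2
import tensorflow as tf  # TF 1.x

def write_example(frames, samples, out_path):
    # frames: list of HxWx3 uint8 RGB images; samples: 1-D uint16 audio array
    tall = np.concatenate(frames, axis=0)              # stack frames vertically
    ok, jpeg = cv2.imencode('.jpg', tall[:, :, ::-1])  # cv2 expects BGR order
    assert ok, 'JPEG encoding failed'
    feature = {
        'im': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[jpeg.tobytes()])),
        'samples': tf.train.Feature(
            bytes_list=tf.train.BytesList(
                value=[samples.astype(np.uint16).tobytes()])),
        'num_frames': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[len(frames)])),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    with tf.python_io.TFRecordWriter(out_path) as writer:
        writer.write(example.SerializeToString())

On the read side, the matching decode would be tf.image.decode_jpeg on 'im' followed by a reshape into (num_frames, H, W, 3), and tf.decode_raw(..., tf.uint16) on 'samples'; again, mirror whatever read_example actually does.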

Thanks!

Sorry to bother you. I would like to create the ".tf" files from the dataset and complete the training. Do you know how to create the TFRecord files? I really need your help! Looking forward to your reply!

Sorry for the slow reply.

In case it helps, here's an example .tf file. I stored a 5-second video sequence as a giant JPEG image (after concatenating them together vertically), and provided raw uint16 audio samples. I suggest rewriting the I/O code, though! There are probably better ways to do this.

The example .tf file works; it would be helpful if you could provide the script you used to create it.
Thanks!

The same question!

I have tried to train the shift model with the provided TFRecord by calling

python -c "import shift_params, shift_net; shift_net.train(shift_params.shift_v1(num_gpus=3), [0, 1, 2], restore = False)"

But I get the error

OutOfRangeError (see above for traceback): RandomShuffleQueue '_1_shuffle_batch_join/random_shuffle_queue' is closed and has insufficient elements (requested 15, current size 0) [[Node: shuffle_batch_join = QueueDequeueManyV2[component_types=[DT_UINT8, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64, DT_STRING], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](shuffle_batch_join/random_shuffle_queue, shuffle_batch_join/n)]]

which means that no data is being loaded. Did anybody get the same error?
I have tried rewriting the read_example function in different ways, and I have also used multiple samples from the VoxCeleb2 dataset, but I keep getting the same error.
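
This OutOfRangeError is generic: the shuffle queue closes once every enqueue attempt has failed, which usually means either the file list was empty or the parser rejected every record (mismatched feature keys, dtypes, or shapes). One way to narrow it down is to dump the schema of a record and compare it against what read_example parses. A small sketch using the TF 1.x API (the path is a placeholder):

import tensorflow as tf  # TF 1.x

path = 'example.tf'  # record file to inspect
for serialized in tf.python_io.tf_record_iterator(path):
    ex = tf.train.Example()
    ex.ParseFromString(serialized)
    for key, feat in ex.features.feature.items():
        # kind is one of: bytes_list, float_list, int64_list
        print(key, feat.WhichOneof('kind'))
    break  # one record is enough to see the schema

If the printed keys or types differ from the ones read_example requests, the parse op fails for every record, the queue never fills, and you get exactly this error.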