andrewowens / multisensory

Code for the paper: Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

Home Page: http://andrewowens.com/multisensory/

What are feats['im_0'] and feats['im_1'] in the examples for the shift model?

ruizewang opened this issue · comments

Hello,
In read_example() in shift_dset.py, I saw:

feats['im_0'] = tf.FixedLenFeature([], dtype=tf.string)
feats['im_1'] = tf.FixedLenFeature([], dtype=tf.string)
What are im_0 and im_1?

Thank you.

In the TFRecords, I concatenated the video frames together, so that N frames of the video are represented as one giant, tall (N*256 x 256 x 3) image. Due to image size limitations, I stored them as two separate images, im_0 and im_1. I suggest rewriting the I/O code for your application – there are definitely cleaner ways of doing this.
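For anyone hitting the same question, here is a minimal sketch of how the two stored images could be decoded and split back into individual frames. It assumes JPEG encoding and that the caller knows how many frames went into each image; neither is confirmed here, and the repo's actual I/O code may differ.

```python
import tensorflow as tf  # TF 1.x API, matching tf.FixedLenFeature above

def read_frames(serialized_example, num_frames_0, num_frames_1):
    """Decode the two tall images back into individual 256x256 frames.

    Hypothetical helper: the per-image frame counts and the JPEG encoding
    are assumptions, not taken from shift_dset.py.
    """
    feats = {
        'im_0': tf.FixedLenFeature([], dtype=tf.string),
        'im_1': tf.FixedLenFeature([], dtype=tf.string),
    }
    example = tf.parse_single_example(serialized_example, feats)

    frames = []
    for key, n in [('im_0', num_frames_0), ('im_1', num_frames_1)]:
        # Each stored image is a (n*256) x 256 x 3 vertical stack of frames.
        tall = tf.image.decode_jpeg(example[key], channels=3)
        # Slice the stack into n separate 256 x 256 x 3 frames.
        frames.append(tf.reshape(tall, [n, 256, 256, 3]))

    # All N = num_frames_0 + num_frames_1 frames, in temporal order.
    return tf.concat(frames, axis=0)
```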

> In the TFRecords, I concatenated the video frames together, so that N frames of the video are represented as one giant, tall (N*256 x 256 x 3) image. Due to image size limitations, I stored them as two separate images, im_0 and im_1. I suggest rewriting the I/O code for your application – there are definitely cleaner ways of doing this.

Wow, thanks for your reply, just as I guessed :)