Error in the documentation of the image embedding sizes?

adrienchaton opened this issue 3 years ago

Hi,

The API documentation says that openl3.models.load_image_embedding_model accepts 6144 and 512 as embedding_size. That does not seem to be the case: unlike openl3.models.load_audio_embedding_model, it in fact accepts 8192 and 512.

This does not seem to be specified in your paper. Should I assume that the (image, audio) models were trained as pairs with embedding sizes (8192, 6144) and (512, 512) for the different configurations of input_repr and content_type?

Thanks!
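To make the sizes in question concrete, here is a minimal sketch of the configuration grid. The sizes are the ones reported in this post (8192 and 512 for images, 6144 and 512 for audio). The input_repr and content_type values are assumed to be the options openl3 accepts. Pairing the large sizes together and the small sizes together is only the assumption being asked about, not something confirmed here. The names IMAGE_SIZES, AUDIO_SIZES, and paired_configs are hypothetical and not part of openl3.

```python
from itertools import product

# Sizes as observed in this thread (not taken from the API docs).
IMAGE_SIZES = (8192, 512)
AUDIO_SIZES = (6144, 512)

# Assumed openl3 option values for input_repr and content_type.
INPUT_REPRS = ("linear", "mel128", "mel256")
CONTENT_TYPES = ("music", "env")


def paired_configs():
    """Yield (input_repr, content_type, image_size, audio_size) tuples.

    Assumes the large image size pairs with the large audio size and the
    small with the small, which is the open question in this issue.
    """
    size_pairs = tuple(zip(IMAGE_SIZES, AUDIO_SIZES))
    for input_repr, content_type, (image_size, audio_size) in product(
        INPUT_REPRS, CONTENT_TYPES, size_pairs
    ):
        yield input_repr, content_type, image_size, audio_size


configs = list(paired_configs())
```

Under these assumptions the grid has 12 entries, each of which could be passed to the two model loaders in turn.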

adrienchaton · Answer 1 · Fri May 14 2021 19:22:03 GMT+0800 (China Standard Time)

While enumerating the configurations and checking that each one runs a forward pass correctly, I am also seeing an issue I had not encountered before. Just letting you know in case it matters; I can give you more details on the run if that is relevant.

If I use the same (model_image, model_audio) pair and iteratively forward 1000+ (image, audio) pairs through it, the compute time does not increase. But if I switch model configurations, the compute time grows quickly, within as few as 12 steps: by the end it takes several seconds to compute the embeddings, versus ~100 ms at the start.

Aurora Cramer · Answer 2 · Sat Aug 07 2021 04:50:08 GMT+0800 (China Standard Time)

Good catch! We'll fix the documentation in the next release.

Regarding the slowdown, my guess is that it is a memory issue: TensorFlow may not be properly garbage collecting the old models and their computational graphs. You could try running tf.keras.backend.clear_session() at the end of each loop iteration. Let us know if that helps!
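A minimal sketch of that suggestion, assuming the loop loads a fresh model pair for every configuration. The helper run_over_configs and its parameters load_models, embed, and clear_session are hypothetical names, injected so the loop structure can be tried without TensorFlow installed. The only real library call is tf.keras.backend.clear_session(), wrapped in openl3_clear_session.

```python
import gc


def run_over_configs(configs, load_models, embed, clear_session):
    """Load one model pair per configuration, embed, then release it.

    load_models, embed, and clear_session are caller-supplied callables
    (hypothetical names, not part of openl3 or TensorFlow).
    """
    results = []
    for config in configs:
        models = load_models(config)
        results.append(embed(models, config))
        # Drop our own reference first so the old model can be collected.
        del models
        # Reset Keras global state at the end of each iteration.
        clear_session()
        gc.collect()
    return results


def openl3_clear_session():
    """Hook for real runs: requires TensorFlow to be installed."""
    import tensorflow as tf

    tf.keras.backend.clear_session()
```

In a real run, load_models would wrap openl3.models.load_image_embedding_model and openl3.models.load_audio_embedding_model, and clear_session would be openl3_clear_session. Whether this actually removes the slowdown has not been verified in this thread.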

Aurora Cramer · Answer 3 · Tue Aug 10 2021 06:37:51 GMT+0800 (China Standard Time)

The documentation should be correct now; this was addressed in #72.

Feel free to let us know if the slowdown was resolved!