marl / openl3

OpenL3: Open-source deep audio and image embeddings

Error in the documentation of the image embedding sizes?

adrienchaton opened this issue

Hi,

The API documentation says that openl3.models.load_image_embedding_model accepts 6144 and 512 as embedding_size. Those appear to be the sizes for openl3.models.load_audio_embedding_model; the image model in fact accepts 8192 and 512.

This does not seem to be specified in your paper. Should I assume that the (image, audio) models were trained as pairs with embedding sizes (8192, 6144) and (512, 512) for the different configurations of input_repr and content_type?

Thanks !

While enumerating the configurations and checking that every forward pass runs fine, I also noticed an issue I hadn't seen so far. Just letting you know in case it is useful; I can give you more details on the run if that is relevant.

If I keep the same (model_image, model_audio) and iteratively forward through 1000+ (image, audio) pairs, the compute time doesn't increase. But if I switch model configurations, it grows quickly, in as few as 12 steps: by the end it takes several seconds to compute the embeddings, versus ~100 ms at the start.

Good catch! We'll fix the documentation in the next release.

Regarding the slowdown, my guess is that it is a memory issue: TensorFlow may not be garbage collecting the old models and their computational graphs. You could try calling tf.keras.backend.clear_session() at the end of each loop iteration. Let us know if that helps!
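For illustration, here is a minimal sketch of what such a loop might look like. It is not taken from the reporter's script: the helper names `configurations` and `time_all_configurations` are hypothetical, the (image, audio) size pairing is the assumption raised earlier in this thread rather than something confirmed, and whether clearing the session actually removes the slowdown is untested here.

```python
import gc
import itertools
import time

# Values accepted by OpenL3's model loaders. The size pairing below is the
# assumption raised in this thread, not a confirmed training setup.
INPUT_REPRS = ("linear", "mel128", "mel256")
CONTENT_TYPES = ("music", "env")
SIZE_PAIRS = ((8192, 6144), (512, 512))  # (image, audio)


def configurations():
    """Yield (input_repr, content_type, image_size, audio_size) tuples."""
    for input_repr, content_type, (image_size, audio_size) in itertools.product(
        INPUT_REPRS, CONTENT_TYPES, SIZE_PAIRS
    ):
        yield input_repr, content_type, image_size, audio_size


def time_all_configurations(audio, sr, image):
    """Load each model pair, time one forward pass, then reset Keras state."""
    # Imported lazily so the enumeration above works without TensorFlow.
    import openl3
    import tensorflow as tf

    timings = {}
    for input_repr, content_type, image_size, audio_size in configurations():
        model_audio = openl3.models.load_audio_embedding_model(
            input_repr=input_repr,
            content_type=content_type,
            embedding_size=audio_size,
        )
        model_image = openl3.models.load_image_embedding_model(
            input_repr=input_repr,
            content_type=content_type,
            embedding_size=image_size,
        )

        start = time.perf_counter()
        openl3.get_audio_embedding(audio, sr, model=model_audio)
        openl3.get_image_embedding(image, model=model_image)
        key = (input_repr, content_type, image_size, audio_size)
        timings[key] = time.perf_counter() - start

        # Drop our references, then reset Keras' global state so graphs from
        # earlier configurations are not kept alive across iterations.
        del model_audio, model_image
        tf.keras.backend.clear_session()
        gc.collect()
    return timings
```

Comparing the returned timings with and without the `clear_session()` call should show whether accumulated graph state is the cause.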

The documentation should be correct now; this was addressed in #72.

Feel free to let us know if the slowdown was resolved!