Input of model

hadi-alikhani opened this issue 3 years ago

Hi,
I am a beginner in VQA. How many questions are there per image in your dataset?
For example, if one image has two questions, must we feed the image to the network twice?
Please help me.
Thanks.

Maryam Sadat Hashemi · Answer 1 · Thu Apr 29 2021 15:13:19 GMT+0800 (China Standard Time)

Hi,
Thanks for your interest.

There are three questions per image.
Yes. Assume that image "i" has 3 different questions "q1", "q2", and "q3". You must feed three training examples into the network: "i q1", "i q2", and "i q3".
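
The pairing described above can be sketched in a few lines of plain Python. This is only an illustration, not code from the repository: the function name `expand_to_examples` and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Tuple


def expand_to_examples(questions_per_image: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return one (image_id, question) training pair per question.

    Hypothetical helper: the repository may organize its data differently.
    """
    examples = []
    for image_id, questions in questions_per_image.items():
        for question in questions:
            # The same image is repeated once for each of its questions.
            examples.append((image_id, question))
    return examples


examples = expand_to_examples({"i": ["q1", "q2", "q3"]})
print(examples)  # [('i', 'q1'), ('i', 'q2'), ('i', 'q3')]
```

Each pair is a separate training example, so an image with three questions appears three times in the training set.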

If you have any more questions, please don't hesitate to ask.

hadi-alikhani · Answer 2 · Sun May 02 2021 02:15:24 GMT+0800 (China Standard Time)

Thank you for taking the time to answer.
In SAN_LSTM_Moodel_2 you created an embedding matrix. Is that a word embedding or a sentence embedding?

Maryam Sadat Hashemi · Answer 3 · Sun May 02 2021 17:50:43 GMT+0800 (China Standard Time)

I used an embedding layer that trains from scratch, and it is a word embedding.
self.embedding = Embedding(num_words, embedding_dim, input_length=seq_length, trainable=True)

However, the code also allows for pretrained word embeddings such as fastText. To use them, uncomment the following line in question_layer_LSTM.py:
self.embedding = Embedding(num_words, embedding_dim, input_length=seq_length, weights=[embedding_matrix], trainable=False)

Note that if you change this to trainable=True, the fastText embedding matrix will be fine-tuned during training, which should give better results.

Please check question_layer_LSTM.py and prepare_QA.py.
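
For readers wondering what the embedding_matrix passed to weights=[embedding_matrix] might contain, here is a minimal sketch in plain Python. It is not the code from prepare_QA.py. The helper `build_embedding_matrix`, the 1-based `word_index` (as a Keras-style tokenizer would produce), the reserved padding row 0, and the small random fallback for unknown words are all assumptions made for illustration.

```python
import random
from typing import Dict, List


def build_embedding_matrix(
    word_index: Dict[str, int],
    pretrained: Dict[str, List[float]],
    embedding_dim: int,
    seed: int = 0,
) -> List[List[float]]:
    """Build a (num_words x embedding_dim) matrix, one row per vocabulary word.

    Assumptions (hypothetical, not taken from the repository):
    - word_index maps each word to an integer starting at 1,
    - row 0 is reserved for padding and stays all zeros,
    - words missing from the pretrained vectors get small random values.
    """
    rng = random.Random(seed)
    num_words = len(word_index) + 1  # +1 for the padding row at index 0
    matrix = [[0.0] * embedding_dim for _ in range(num_words)]
    for word, idx in word_index.items():
        vector = pretrained.get(word)
        if vector is None:
            # Out-of-vocabulary word: fall back to a small random vector.
            vector = [rng.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
        matrix[idx] = list(vector)
    return matrix


# Toy vocabulary and toy 2-dimensional "pretrained" vectors.
word_index = {"what": 1, "color": 2, "zzz": 3}
pretrained = {"what": [0.1, 0.2], "color": [0.3, 0.4]}
embedding_matrix = build_embedding_matrix(word_index, pretrained, embedding_dim=2)
```

In practice the matrix would be converted to a NumPy array before being handed to the Embedding layer, with num_words and embedding_dim matching its shape.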

hadi-alikhani · Answer 4 · Tue Jul 06 2021 22:42:59 GMT+0800 (China Standard Time)

Hi,
I have another question; I would appreciate an answer. Thank you.
In Code/SAN/question_layer_LSTM.py, the code uses an LSTM layer with 1024 units, followed by a dense layer with 1024 neurons. The comment says the dense layer is there to transform the dimension from 512 to 1024, but nothing in the code has 512 dimensions. Could you explain this in more detail?

Maryam Sadat Hashemi · Answer 5 · Tue Jul 27 2021 01:41:17 GMT+0800 (China Standard Time)

Hi,
Sorry for the late response.
The number of neurons in the original SAN paper matches what is written in the comments. For our task, however, using 1024 neurons gave better results. In general, these are just hyperparameters that you can change according to your task.

hadi-alikhani · Answer 6 · Sun Sep 05 2021 22:19:35 GMT+0800 (China Standard Time)

Thanks for your answers; they were very useful.
Good Luck.