maryamhashemi / Persian_VQA

Final project of the Deep Learning course.

Home Page: https://iust-deep-learning.github.io/982/

Input of model

hadi-alikhani opened this issue

Hi,
I am a beginner in VQA. How many questions are there per image in your dataset?
For example, if one image has two questions, must we feed the image to the network twice?
Please help me.
Thanks.

Hi,
Thanks for your interest.

There are three questions per image.
Yes. Assume that image "i" has 3 different questions "q1", "q2", and "q3". You must feed three training examples into the network: "i q1", "i q2", and "i q3".
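
As a rough illustration of that pairing, here is a minimal sketch. The function name `expand_pairs`, the dictionary layout, and the sample data are hypothetical and are not taken from the repository.

```python
def expand_pairs(image_questions):
    """Turn {image_id: [(question, answer), ...]} into one training example per question."""
    examples = []
    for image_id, qa_list in image_questions.items():
        for question, answer in qa_list:
            # The same image is repeated once for each of its questions.
            examples.append((image_id, question, answer))
    return examples


# Hypothetical data: one image "i" with three questions.
data = {"i": [("q1", "a1"), ("q2", "a2"), ("q3", "a3")]}
examples = expand_pairs(data)
print(len(examples))
```

Each tuple becomes one training example, so an image with three questions contributes three examples that share the same image input.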

If you have any more questions, please don't hesitate to ask.

Thank you for taking the time to help me.
In SAN_LSTM_Moodel_2 you created an embedding matrix. Is that a word embedding or a sentence embedding?

It is a word embedding. I used an embedding layer that is trained from scratch:
self.embedding = Embedding(num_words, embedding_dim, input_length=seq_length, trainable=True)

However, the code also allows for a pretrained word embedding such as fastText. To use it, uncomment the following line in question_layer_LSTM.py:
self.embedding = Embedding(num_words, embedding_dim, input_length=seq_length, weights=[embedding_matrix], trainable=False)

Note that if you set trainable=True instead, the fastText embedding matrix will be fine-tuned during training, which should give better results.

Please check question_layer_LSTM.py and prepare_QA.py.
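
The `weights=[embedding_matrix]` argument expects a matrix with one row per word index. The sketch below shows one common way such a matrix might be built from pretrained vectors. It is not the code from prepare_QA.py: the function name `build_embedding_matrix`, the toy vocabulary, and the 3-dimensional vectors are hypothetical, and it assumes that index 0 is reserved for padding and that words without a pretrained vector keep an all-zero row.

```python
import numpy as np


def build_embedding_matrix(word_index, word_vectors, embedding_dim):
    """Build a (vocab_size + 1, embedding_dim) matrix from pretrained vectors.

    word_index maps each word to a positive integer index (0 is assumed to be padding).
    word_vectors maps a word to its pretrained vector, e.g. vectors loaded from fastText.
    """
    matrix = np.zeros((len(word_index) + 1, embedding_dim), dtype="float32")
    for word, idx in word_index.items():
        vector = word_vectors.get(word)
        if vector is not None:
            # Words missing from the pretrained vocabulary keep a zero row.
            matrix[idx] = vector
    return matrix


# Hypothetical toy vocabulary and 3-dimensional "pretrained" vectors.
word_index = {"گربه": 1, "سگ": 2, "ناشناخته": 3}
word_vectors = {"گربه": [0.1, 0.2, 0.3], "سگ": [0.4, 0.5, 0.6]}
embedding_matrix = build_embedding_matrix(word_index, word_vectors, embedding_dim=3)
print(embedding_matrix.shape)
```

In this sketch `num_words` would be `len(word_index) + 1`, which matches the number of rows in the matrix.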

Hi,
I have a question and would appreciate an answer. Thank you.
In Code/SAN/question_layer_LSTM.py, the code uses an LSTM layer with 1024 units, followed by a dense layer with 1024 neurons. The comment says the dense layer is there to transform from 512 to 1024 dimensions, but nothing in the code has 512 dimensions. Could you please explain this further?

Hi,
Sorry for the late response.
The number of neurons in the original paper of SAN equals what is written in the comments. But for our task, using 1024 neurons has given better results. Generally, these are just hyperparameters that you can change according to your task.

Thanks for your answers, actually they were very useful.
Good Luck.