NTUA-Neural-Networks
Programming Exercises for the 2021-2022 NTUA class Neural-Networks
Projects
A starting point will be the official TensorFlow tutorial "Image captioning with visual attention". We will, however, work on a different dataset and try to improve the tutorial in various places.
We will use "flickr30k-images-ecemod", a variant of flickr30k, for our project.
The tutorial model follows the usual Encoder - Decoder Deep Learning architecture.
The Encoder corresponds to steps 1 and 2, and the Decoder to steps 3-5.
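At the core of the Decoder is the visual-attention mechanism: at each decoding step, the current hidden state is compared against every encoded image region, and a weighted sum of region features (the context vector) is fed into the next word prediction. A minimal NumPy sketch of this additive (Bahdanau-style) attention is shown below; all dimensions and the random weight matrices are illustrative assumptions, not the tutorial's exact values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 64 image regions with 256-d CNN features,
# a 512-d decoder hidden state, and a 128-d attention space.
num_regions, feat_dim, hid_dim, att_dim = 64, 256, 512, 128

features = rng.standard_normal((num_regions, feat_dim))  # encoder output
hidden = rng.standard_normal(hid_dim)                    # decoder state

# Additive attention: score_i = v^T tanh(W1 f_i + W2 h)
W1 = rng.standard_normal((feat_dim, att_dim)) * 0.01
W2 = rng.standard_normal((hid_dim, att_dim)) * 0.01
v = rng.standard_normal(att_dim) * 0.01

scores = np.tanh(features @ W1 + hidden @ W2) @ v  # one score per region
weights = np.exp(scores - scores.max())
weights /= weights.sum()                           # softmax over regions

context = weights @ features                       # weighted sum of features
```

The attention weights form a probability distribution over image regions, so they can also be visualized as a heat map showing where the model "looks" while emitting each word.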
The tutorial does not evaluate the quality of the generated captions. Since each image comes with several human-written captions (references) while the model produces its own caption (hypothesis), we will use the BLEU (Bilingual Evaluation Understudy) score between hypothesis and references. Briefly, BLEU combines the clipped precision of shared unigrams, bigrams, trigrams, and 4-grams between hypothesis and references (as a geometric mean, with a brevity penalty for overly short hypotheses). The worst captioning gets 0 and the best 1.
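To make the metric concrete, here is a minimal sentence-level BLEU sketch in pure Python, assuming whitespace-tokenized captions. Library implementations (e.g. NLTK's `sentence_bleu`) add smoothing and other refinements, so this is an illustration of the idea rather than a drop-in replacement.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, references, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram
    precisions, multiplied by a brevity penalty."""
    hyp = hypothesis.split()
    refs = [r.split() for r in references]
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        if not hyp_counts:
            return 0.0
        # Clip each hypothesis n-gram count by its max count in any reference.
        max_ref = Counter()
        for r in refs:
            for g, c in ngrams(r, n).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
        if clipped == 0:
            return 0.0
        precisions.append(clipped / sum(hyp_counts.values()))
    # Brevity penalty, using the reference length closest to the hypothesis.
    ref_len = min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
    bp = 1.0 if len(hyp) > ref_len else math.exp(1 - ref_len / max(len(hyp), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

For example, a caption identical to a reference scores 1.0, while a caption sharing no words with any reference scores 0.0; in practice we will report the corpus-level score over the test set.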