Visual Captioning - For Images and Artwork

Members: Anah Veronica, Nagapriya, Niegil, Sakthisree, Shivani, Surojit
Link to Google Drive: https://drive.google.com/drive/u/0/folders/1gNShRvPH5sYv-ScG9e7eymUUpE_jc8Xl

Problem Statement

Artworks are highly subjective and open to many interpretations. In terms of Panofsky’s three levels of analysis, captions of natural images are “pre-iconographic” descriptions: an objective view that simply lists the elements present in an image. For artwork images, this type of description does not adequately convey the tone or meaning of the painting. In that context, it is more interesting to generate “iconographic” captions that capture the subject and the symbolic relations between objects.

Our Solution

Technical Architecture

Models

We trained three models to cater to three different use cases. All three are based on Show, Attend and Tell, and together they provide three outputs for an image. The first model was trained on the COCO, Flickr 8K and Iconclass datasets, with a focus on everyday real-life images, whereas the second and third models were trained on the ArtEmis dataset, with a focus on artwork. The third model follows the ArtEmis approach and differs from the second in that it works in two stages:

A ground emotion is predicted by the image-to-emotion classifier.
The predicted emotion is fed into the image-to-caption generator.

This lets the third model produce a more subjective caption than the other two models.
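The two stages can be sketched in a few lines of Python. This is only a minimal illustration of the control flow, not the project's implementation: the `TwoStageCaptioner` class, the `EMOTIONS` label set, and the two toy stub functions are hypothetical stand-ins for the trained image-to-emotion classifier and the emotion-conditioned caption generator.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical label set for illustration; the project's actual labels may differ.
EMOTIONS = ("amusement", "awe", "contentment", "sadness")


@dataclass
class TwoStageCaptioner:
    # Stage 1 (assumed interface): image features -> one score per emotion label.
    emotion_scores: Callable[[Sequence[float]], Sequence[float]]
    # Stage 2 (assumed interface): (image features, emotion label) -> caption.
    caption_with_emotion: Callable[[Sequence[float], str], str]

    def predict_emotion(self, features: Sequence[float]) -> str:
        scores = self.emotion_scores(features)
        # Pick the label with the highest score.
        best = max(range(len(EMOTIONS)), key=lambda i: scores[i])
        return EMOTIONS[best]

    def caption(self, features: Sequence[float]) -> tuple:
        # The predicted emotion from stage 1 conditions the generator in stage 2.
        emotion = self.predict_emotion(features)
        return emotion, self.caption_with_emotion(features, emotion)


# Toy stubs so the sketch runs without trained weights.
def toy_emotion_scores(features: Sequence[float]) -> list:
    # Pretend the first few features directly score each emotion.
    return list(features[: len(EMOTIONS)])


def toy_captioner(features: Sequence[float], emotion: str) -> str:
    return f"a painting that evokes {emotion}"


pipeline = TwoStageCaptioner(toy_emotion_scores, toy_captioner)
emotion, caption = pipeline.caption([0.1, 0.7, 0.15, 0.05])
print(emotion, "|", caption)
```

In a real system both callables would wrap trained networks. The point of the sketch is only that the second and third models differ in this extra conditioning input, which the caption generator receives alongside the image.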

Sample Output

Figure: A sample output for a painting
Figure: A sample output for a real-life image

Here we can see the difference in outputs. The first model performs quite accurately on the real-life image, while the second and third models work quite well on the artwork.

Link to more information - Arty PDF

References

[1] https://cocodataset.org/#home
[2] https://www.kaggle.com/adityajn105/flickr8k
[3] http://tems.umn.edu/pdf/Panofsky_iconology2.pdf
[4] Cetinic, E., 2021. Iconographic image captioning for artworks. arXiv preprint arXiv:2102.03942.
[5] https://github.com/optas/artemis

@article{achlioptas2021artemis,
  title={ArtEmis: Affective Language for Visual Art},
  author={Achlioptas, Panos and Ovsjanikov, Maks and Haydarov, Kilichbek and Elhoseiny, Mohamed and Guibas, Leonidas},
  journal={CoRR},
  volume={abs/2101.07396},
  year={2021}
}
