doubledaibo / 2dcaption_eccv2018

Rethinking the Form of Latent States in Image Captioning

Overview

Summary

  • We found empirically that representing latent states as 2D maps works better than representing them as 1D vectors, both quantitatively and qualitatively, because 2D maps preserve spatial locality in the latent states.

  • Quantitatively, with similar numbers of parameters, RNN-2DS (i.e. 2D states without gate functions) already outperforms LSTM-1DS (i.e. 1D states with LSTM cells). (Green: RNN-2DS, Red: LSTM-1DS)

[Figure: Curve]

  • Qualitatively, spatial locality makes it possible to visually interpret and manipulate the decoding process.

    • Manipulation of the spatial grids

    [Figure: Manipulation]

    • Manipulation of the channels

    [Figure: Deactivation]

    • Interpretation of the internal dynamics

    [Figure: Dynamics]

    • Interpretation of the word-channel associations

    [Figure: Associations]
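
To make the points above concrete, here is a minimal pure-Python sketch of an ungated recurrent update on a 2D state, together with the two kinds of manipulation listed (on a spatial region and on a channel). It is not code from this repository, which is mostly Lua. The function names, the ReLU nonlinearity, the zero-padded convolution, and the use of zeroing as the manipulation are all assumptions made for illustration.

```python
# Illustrative sketch only: every name here is hypothetical, not the repo's API.
# A state is a nested list indexed as state[channel][row][col].


def conv2d_same(state, kernels):
    """Zero-padded 'same' cross-correlation.

    state:   [C_in][H][W]
    kernels: [C_out][C_in][k][k], with k odd
    returns: [C_out][H][W]
    """
    c_in, h, w = len(state), len(state[0]), len(state[0][0])
    k = len(kernels[0][0])
    r = k // 2
    out = []
    for kern in kernels:
        plane = [[0.0] * w for _ in range(h)]
        for c in range(c_in):
            for i in range(h):
                for j in range(w):
                    acc = 0.0
                    for di in range(k):
                        for dj in range(k):
                            y, x = i + di - r, j + dj - r
                            # Positions outside the map count as zero padding.
                            if 0 <= y < h and 0 <= x < w:
                                acc += kern[c][di][dj] * state[c][y][x]
                    plane[i][j] += acc
        out.append(plane)
    return out


def rnn_2ds_step(h_prev, x_t, k_h, k_x):
    """One ungated update: relu(conv(h_prev, k_h) + conv(x_t, k_x)).

    Assumption: the word input x_t is already shaped as a 2D map, and
    both convolutions produce the same number of output channels.
    """
    a = conv2d_same(h_prev, k_h)
    b = conv2d_same(x_t, k_x)
    return [
        [
            [max(0.0, a[c][i][j] + b[c][i][j]) for j in range(len(a[c][i]))]
            for i in range(len(a[c]))
        ]
        for c in range(len(a))
    ]


def mask_region(state, top, left, height, width):
    """Return a copy of the state with one spatial rectangle zeroed in every channel."""
    return [
        [
            [
                0.0 if (top <= i < top + height and left <= j < left + width) else v
                for j, v in enumerate(row)
            ]
            for i, row in enumerate(plane)
        ]
        for plane in state
    ]


def deactivate_channel(state, channel):
    """Return a copy of the state with one channel zeroed everywhere."""
    return [
        [[0.0 for _ in row] for row in plane] if c == channel
        else [list(row) for row in plane]
        for c, plane in enumerate(state)
    ]
```

Because each output cell depends only on a small neighbourhood of the previous state, zeroing a region or a channel has a localized, inspectable effect. A flat 1D state vector has no spatial axes to mask in this way.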

Citation

@inproceedings{dai2018rethinking,
  title={Rethinking the Form of Latent States in Image Captioning},
  author={Dai, Bo and Ye, Deming and Lin, Dahua},
  booktitle={ECCV},
  year={2018}
}

About

License: Other

Languages: Lua 93.6%, Python 6.4%