Vision and Language Group @ MIL's repositories
bottom-up-attention.pytorch
A PyTorch reimplementation of bottom-up-attention models
activitynet-qa
A VideoQA dataset based on videos from ActivityNet
mt-captioning
A PyTorch implementation of the paper "Multimodal Transformer with Multiview Visual Representation for Image Captioning"