Vision and Language Group@ MIL

Vision and Language Group@ MIL's repositories

Deep Modular Co-Attention Networks for Visual Question Answering

Language: Python, License: Apache-2.0, 455 5 38

A lightweight, scalable, and general framework for visual question answering research

Language: Python, License: Apache-2.0, 327 11 29

A PyTorch reimplementation of bottom-up-attention models

Language: Jupyter Notebook, License: Apache-2.0, 298 1 95

Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".

Language: Python, License: Apache-2.0, 277 2 43

A family of highly capable yet efficient large multimodal models

Language: Python, License: Apache-2.0, 190 5 8

A VideoQA dataset based on videos from ActivityNet

Language: Python, License: Apache-2.0, 72 2 6

ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration

Language: Python, License: Apache-2.0, 5608

License: Apache-2.0, 30 1 5

Deep Multimodal Neural Architecture Search

Language: Python, License: Apache-2.0, 28011

A PyTorch implementation of the paper "Multimodal Transformer with Multiview Visual Representation for Image Captioning"

Language: Python, License: Apache-2.0, 25 2 2

Language: Python, License: Apache-2.0, 901

Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.

Language: Python, License: Apache-2.0, 800

Language: Python, License: Apache-2.0, 500

Language: HTML, 000