Vision and Language Group@ MIL (MILVLG)

Hangzhou Dianzi University

Home Page: http://mil.hdu.edu.cn/

Vision and Language Group@ MIL's repositories

mcan-vqa

Deep Modular Co-Attention Networks for Visual Question Answering

Language: Python | License: Apache-2.0 | Stargazers: 431 | Issues: 6 | Issues: 38

openvqa

A lightweight, scalable, and general framework for visual question answering research

Language: Python | License: Apache-2.0 | Stargazers: 306 | Issues: 12 | Issues: 29

bottom-up-attention.pytorch

A PyTorch reimplementation of bottom-up-attention models

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 286 | Issues: 2 | Issues: 91

prophet

Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".

Language: Python | License: Apache-2.0 | Stargazers: 260 | Issues: 3 | Issues: 39

imp

A family of multimodal small language models

Language: Python | License: Apache-2.0 | Stargazers: 105 | Issues: 6 | Issues: 5

rosita

ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration

Language: Python | License: Apache-2.0 | Stargazers: 55 | Issues: 0 | Issues: 6

activitynet-qa

A VideoQA dataset based on videos from ActivityNet

Language: Python | License: Apache-2.0 | Stargazers: 51 | Issues: 3 | Issues: 5

mmnas

Deep Multimodal Neural Architecture Search

Language: Python | License: Apache-2.0 | Stargazers: 26 | Issues: 1 | Issues: 11

mt-captioning

A PyTorch implementation of the paper Multimodal Transformer with Multiview Visual Representation for Image Captioning

Language: Python | License: Apache-2.0 | Stargazers: 24 | Issues: 2 | Issues: 2

Language: Python | License: Apache-2.0 | Stargazers: 9 | Issues: 0 | Issues: 0

Language: Python | License: Apache-2.0 | Stargazers: 5 | Issues: 0 | Issues: 0

Language: HTML | Stargazers: 0 | Issues: 0 | Issues: 0