luogen1996's repositories

LaVIN

[NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"

RepAdapter

Official implementation of "Towards Efficient Visual Adaption via Structural Re-parameterization".

LLaVA-HR

LLaVA-HR: High-Resolution Large Language-Vision Assistant

Language: Python | License: Apache-2.0 | Stargazers: 176 | Issues: 3 | Issues: 15

MCN

[CVPR2020] Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation (oral)

Language: Python | License: MIT | Stargazers: 133 | Issues: 6 | Issues: 12
Language: Python | License: GPL-3.0 | Stargazers: 78 | Issues: 1 | Issues: 13

SimREC

A lightweight codebase for referring expression comprehension and segmentation

Language: Python | License: Apache-2.0 | Stargazers: 49 | Issues: 2 | Issues: 0

LWTransformer

Lightweight Transformer for Multi-modal Tasks

Language: Python | Stargazers: 15 | Issues: 2 | Issues: 0
Language: Python | License: MIT | Stargazers: 1 | Issues: 2 | Issues: 0

datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

detr

End-to-End Object Detection with Transformers

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Language: Python | License: MIT | Stargazers: 0 | Issues: 2 | Issues: 0

FGD

Focal and Global Knowledge Distillation for Detectors (CVPR 2022)

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 3 | Issues: 1

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0
Stargazers: 0 | Issues: 1 | Issues: 0
Stargazers: 0 | Issues: 2 | Issues: 0
Language: HTML | Stargazers: 0 | Issues: 2 | Issues: 0

MAttNet

MAttNet: Modular Attention Network for Referring Expression Comprehension

Language: Jupyter Notebook | License: MIT | Stargazers: 0 | Issues: 2 | Issues: 0

mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0

openvqa

A lightweight, scalable, and general framework for visual question answering research

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0
Language: Python | License: MIT | Stargazers: 0 | Issues: 2 | Issues: 0

segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

SeqTR

SeqTR: A Simple yet Universal Network for Visual Grounding

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

Swin-Transformer

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

TencentPretrain

Tencent Pre-training framework in PyTorch & Pre-trained Model Zoo

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

TRAR-VQA

This is the official PyTorch implementation of our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" for the VQA task.

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

VC-R-CNN

The official PyTorch implementation of the CVPR 2020 paper "Visual Commonsense R-CNN"

Language: Python | License: MIT | Stargazers: 0 | Issues: 2 | Issues: 0
Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0
Stargazers: 0 | Issues: 1 | Issues: 0