luogen1996

https://luogen1996.github.io/

luogen1996's repositories

LaVIN

[NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"

Language: Python | 489 6 41

RepAdapter

Official implementation of "Towards Efficient Visual Adaption via Structural Re-parameterization".

Language: Python | 184 15 14

LLaVA-HR

LLaVA-HR: High-Resolution Large Language-Vision Assistant

Language: Python | Apache-2.0 | 176 3 15

MCN

[CVPR 2020, oral] Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation

Language: Python | MIT | 133 6 12

OneTeacher

Language: Python | GPL-3.0 | 78 1 13

SimREC

A lightweight codebase for referring expression comprehension and segmentation

Language: Python | Apache-2.0 | 49 20

LWTransformer

Lightweight Transformer for Multi-modal Tasks

Language: Python | 15 20

MoIL

Language: Python | MIT | 1 20

datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...

Language: Python | Apache-2.0 | 010

detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Language: Python | Apache-2.0 | 010

detr

End-to-End Object Detection with Transformers

Language: Python | Apache-2.0 | 010

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Language: Python | MIT | 020

FGD

Focal and Global Knowledge Distillation for Detectors (CVPR 2022)

Language: Python | Apache-2.0 | 010

LaConvNet

Language: Python | Apache-2.0 | 03 1

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | Apache-2.0 | 000

lmbxmu

010

luogen1996

020

luogen1996.github.io

Language: HTML | 020

MAttNet

MAttNet: Modular Attention Network for Referring Expression Comprehension

Language: Jupyter Notebook | MIT | 020

mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

Language: Python | NOASSERTION | 010

openvqa

A lightweight, scalable, and general framework for visual question answering research

Language: Python | Apache-2.0 | 010

RealGIN-Keras

Language: Python | MIT | 020

segment-anything

The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook | Apache-2.0 | 000

SeqTR

SeqTR: A Simple yet Universal Network for Visual Grounding

Language: Python | 010

Swin-Transformer

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".

Language: Python | MIT | 010

TencentPretrain

Tencent Pre-training framework in PyTorch & Pre-trained Model Zoo

Language: Python | NOASSERTION | 000

TRAR-VQA

This is the official PyTorch implementation of our ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" for the VQA task.

Language: Python | MIT | 010

VC-R-CNN

The official PyTorch implementation of the CVPR 2020 paper "Visual Commonsense R-CNN"

Language: Python | MIT | 020

vision_transformer

Language: Jupyter Notebook | Apache-2.0 | 010

zhiweichen0012

010