Haotian-Zhang

Haotian Zhang's starred repositories

awesome-chatgpt-prompts

A curated collection of ChatGPT prompts for getting better results from ChatGPT.

Language: HTML | License: CC0-1.0 | 105927 13820

segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook | License: Apache-2.0 | 44956 299 647

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Language: Python | License: Apache-2.0 | 35200 345 1700

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | 17377 156 1347

Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Language: Jupyter Notebook | License: Apache-2.0 | 13881 114 368

Awesome-Multimodal-Large-Language-Models

Latest Advances on Multimodal Large Language Models

9933 232 98

GroundingDINO

Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Language: Python | License: Apache-2.0 | 5330 36 275

tpu

Reference models and tools for Cloud TPUs.

Language: Jupyter Notebook | License: Apache-2.0 | 5189 359 471

Segment-Everything-Everywhere-All-At-Once

[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"

Language: Python | License: Apache-2.0 | 4127 57 137

GPT-4-LLM

Instruction Tuning with GPT-4

Language: HTML | License: Apache-2.0 | 4044 45 33

open_flamingo

An open-source framework for training large multimodal models.

Language: Python | License: MIT | 3518 47 170

EVA

EVA Series: Visual Representation Fantasies from BAAI

Language: Python | License: MIT | 2029 31 150

mPLUG-Owl

mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model

Language: Python | License: MIT | 1988 26 204

GLIGEN

Open-Set Grounded Text-to-Image Generation

Language: Python | License: MIT | 1862 38 73

MM-REACT

Official repo for MM-REACT

Language: Python | License: MIT | 911 19 10

mmc4

MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.

Language: Python | License: MIT | 876 9 17

Co-DETR

[ICCV 2023] DETRs with Collaborative Hybrid Assignments Training

Language: Python | License: MIT | 858 9 139

pix2seq

Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)

Language: Jupyter Notebook | License: Apache-2.0 | 828 18 48

U-ViT

A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".

Language: Jupyter Notebook | License: MIT | 815 12 24

GFocal

Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection, NeurIPS 2020

Language: Python | License: Apache-2.0 | 567 13 40

copy-paste-aug

Copy-paste augmentation for segmentation and detection tasks

Language: Jupyter Notebook | License: MIT | 526 5 18

MIMDet

[ICCV 2023] You Only Look at One Partial Sequence

Language: Python | License: MIT | 329 10 27

VLDet

[ICLR 2023] PyTorch implementation of VLDet (https://arxiv.org/abs/2211.14843)

Language: Python | License: NOASSERTION | 172 5 17

gRefCOCO

A benchmark dataset for GRES and GREC [CVPR2023 Highlight]

Language: Python | 166 4 6

react

REACT (CVPR 2023, Highlight 2.5%)

Language: Python | License: MIT | 121 8 3

VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automatic pipeline starting from the Conceptual Captions Image-Captioning Dataset.

License: CC-BY-4.0 | 73 3 2

VaLM

VaLM: Visually-augmented Language Modeling. ICLR 2023.

Language: Python | 54 2 5