Haotian Zhang (Haotian-Zhang)

Company: Apple AI/ML

Location: Cupertino, CA

Home Page: haotian-zhang.github.io/

Twitter: @HaotianZhang4AI

Haotian Zhang's starred repositories

awesome-chatgpt-prompts

A curated collection of ChatGPT prompts for getting better results from ChatGPT.

Language: HTML | License: CC0-1.0 | Stargazers: 105927 | Issues: 1382 | Issues: 0

segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 44956 | Issues: 299 | Issues: 647

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Language: Python | License: Apache-2.0 | Stargazers: 35200 | Issues: 345 | Issues: 1700

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | Stargazers: 17377 | Issues: 156 | Issues: 1347

Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 13881 | Issues: 114 | Issues: 368

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

GroundingDINO

Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Language: Python | License: Apache-2.0 | Stargazers: 5330 | Issues: 36 | Issues: 275

tpu

Reference models and tools for Cloud TPUs.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 5189 | Issues: 359 | Issues: 471

Segment-Everything-Everywhere-All-At-Once

[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"

Language: Python | License: Apache-2.0 | Stargazers: 4127 | Issues: 57 | Issues: 137

GPT-4-LLM

Instruction Tuning with GPT-4

Language: HTML | License: Apache-2.0 | Stargazers: 4044 | Issues: 45 | Issues: 33

open_flamingo

An open-source framework for training large multimodal models.

Language: Python | License: MIT | Stargazers: 3518 | Issues: 47 | Issues: 170

Language: Python | License: Apache-2.0 | Stargazers: 2549 | Issues: 37 | Issues: 136

EVA

EVA Series: Visual Representation Fantasies from BAAI

Language: Python | License: MIT | Stargazers: 2029 | Issues: 31 | Issues: 150

mPLUG-Owl

mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model

Language: Python | License: MIT | Stargazers: 1988 | Issues: 26 | Issues: 204

GLIGEN

Open-Set Grounded Text-to-Image Generation

Language: Python | License: MIT | Stargazers: 1862 | Issues: 38 | Issues: 73

Language: Python | License: NOASSERTION | Stargazers: 1086 | Issues: 77 | Issues: 22

MM-REACT

Official repo for MM-REACT

Language: Python | License: MIT | Stargazers: 911 | Issues: 19 | Issues: 10

mmc4

MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.

Language: Python | License: MIT | Stargazers: 876 | Issues: 9 | Issues: 17

Co-DETR

[ICCV 2023] DETRs with Collaborative Hybrid Assignments Training

Language: Python | License: MIT | Stargazers: 858 | Issues: 9 | Issues: 139

pix2seq

Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 828 | Issues: 18 | Issues: 48

U-ViT

A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".

Language: Jupyter Notebook | License: MIT | Stargazers: 815 | Issues: 12 | Issues: 24

GFocal

Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection, NeurIPS 2020

Language: Python | License: Apache-2.0 | Stargazers: 567 | Issues: 13 | Issues: 40

copy-paste-aug

Copy-paste augmentation for segmentation and detection tasks

Language: Jupyter Notebook | License: MIT | Stargazers: 526 | Issues: 5 | Issues: 18

MIMDet

[ICCV 2023] You Only Look at One Partial Sequence

Language: Python | License: MIT | Stargazers: 329 | Issues: 10 | Issues: 27

VLDet

[ICLR 2023] PyTorch implementation of VLDet (https://arxiv.org/abs/2211.14843)

Language: Python | License: NOASSERTION | Stargazers: 172 | Issues: 5 | Issues: 17

gRefCOCO

A benchmark dataset for GRES and GREC [CVPR 2023 Highlight]

Language: Python | License: Apache-2.0 | Stargazers: 156 | Issues: 8 | Issues: 11

react

REACT (CVPR 2023, Highlight 2.5%)

Language: Python | License: MIT | Stargazers: 121 | Issues: 8 | Issues: 3

videoCC-data

VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automatic pipeline starting from the Conceptual Captions Image-Captioning Dataset.

VaLM

VaLM: Visually-augmented Language Modeling. ICLR 2023.