ZJHTerry18

Zhao Jiahe's starred repositories

MiniGPT-4

Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)

Language: Python · License: BSD-3-Clause · 25157 220 452

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python · License: Apache-2.0 · 18196 158 1400

LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook · License: BSD-3-Clause · 9244 96 627

LLaMA-Adapter

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Language: Python · License: GPL-3.0 · 5618 78 141

mmtracking

OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), Video Instance Segmentation (VIS) with a unified framework.

Language: Python · License: Apache-2.0 · 3448 47 459

LLaMA2-Accessory

An Open-source Toolkit for LLM Development

Language: Python · License: NOASSERTION · 2622 36 133

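A computer vision closed-loop learning platform where code can be run interactively online. Closed-loop learning: the Chinese-language e-book, source code, and reader community for "Computer Vision in Action: Algorithms and Applications" (continuously updated ...) 📘 Online e-book: https://charmve.github.io/computer-vision-in-action/ 👇 Project homepage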

Language: Jupyter Notebook · License: NOASSERTION · 2467 36 75

InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Language: Python · 2240 40 334

GLIP

Grounded Language-Image Pre-training

Language: Python · License: MIT · 2081 45 168

LISA

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Language: Python · License: Apache-2.0 · 1650 11 127

awesome-openai-vision-api-experiments

Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥

Language: Python · 1606 27 5

open-images-dataset

Open Images is a dataset of ~9 million images that have been annotated with image-level labels and bounding boxes spanning thousands of classes.

966 36 34

VisionLLM

VisionLLM Series

Language: Python · License: Apache-2.0 · 745 38 11

GPT4RoI

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

Language: Python · License: NOASSERTION · 479 8 44

fromage

🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".

Language: Jupyter Notebook · License: Apache-2.0 · 467 12 37

Cheetah

Language: Python · License: BSD-3-Clause · 337 18 16

MIC

MMICL, a state-of-the-art VLM with in-context learning ability, from ICL, PKU

Language: Python · 310 9 31

PCT

This is an official implementation of our CVPR 2023 paper "Human Pose as Compositional Tokens" (https://arxiv.org/pdf/2303.11638.pdf)

Language: Python · License: MIT · 293 6 40

LAMM

[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents

Language: Python · 285 8 42

unified-io-inference

Language: Jupyter Notebook · License: Apache-2.0 · 213 13 12

ContextDET

Contextual Object Detection with Multimodal Large Language Models

License: NOASSERTION · 172 13 5

APTM

The official code of "Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark"

Language: Python · License: MIT · 124 4 24

ISR_ICCV2023_Oral

The code for the ICCV 2023 Oral paper: Identity-Seeking Self-Supervised Representation Learning for Generalizable Person Re-identification

Language: Python · 75 5 7

UAL

The code for the ECCV 2022 paper: Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval

58 10

RegionBLIP

Language: Python · 56 1 6

DistributionNet

Language: Python · 42 4 5

PVIT

Repository for the paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models

Language: Python · 35 2 2

UniPT

Language: Python · License: Apache-2.0 · 26 4 8

pointingqa

Code for paper "Point and Ask: Incorporating Pointing into Visual Question Answering"

Language: Python · 18 3 2

Pix2SeqV2-Pytorch

Simple implementation of Pix2seqV2 (multi-task)

Language: Python · 1600