jpWang

South China University of Technology

Guangzhou, China

Organizations

SCUT-DLVCLab

Jiapeng Wang's starred repositories

ICL-D3IE

Code for the ICCV 2023 paper "ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction".

Language: Python · 4900

UReader

Language: Python · License: Apache-2.0 · 10200

Monkey

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Language: Python · License: MIT · 158000

LLM-in-Vision

Recent LLM-based CV and related works. Welcome to comment/contribute!

Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Language: Python · License: MIT · 352100

groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Language: Python · 68600

NExT-GPT

Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

Language: Python · License: BSD-3-Clause · 309000

open_clip

An open source implementation of CLIP.

Language: Python · License: NOASSERTION · 927100

SEED

Official implementation of SEED-LLaMA (ICLR 2024).

Language: Python · License: NOASSERTION · 52800

PickScore

Language: Python · License: MIT · 38600

improved-aesthetic-predictor

CLIP+MLP Aesthetic Score Predictor

Language: Python · License: Apache-2.0 · 80100

LLMSurvey

The official GitHub page for the survey paper "A Survey of Large Language Models".

Language: Python · 964300

stable-diffusion-webui

Stable Diffusion web UI

Language: Python · License: AGPL-3.0 · 13639900

trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

Language: Python · License: MIT · 440100

nltk_data

NLTK Data

Language: Python · 139100

nltk

NLTK Source

Language: Python · License: Apache-2.0 · 1327300

Union14M

[ICCV 2023] Code base for Revisiting Scene Text Recognition: A Data Perspective

Language: Python · License: MIT · 15300

diffusiondb

A large-scale text-to-image prompt gallery dataset based on Stable Diffusion

Language: Python · License: MIT · 115200

instruct-pix2pix

Language: Python · License: NOASSERTION · 612800

MagicBrush

[NeurIPS'23] "MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing".

Language: Python · License: NOASSERTION · 28600

gill

🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models".

Language: Jupyter Notebook · License: Apache-2.0 · 40300

qlora

QLoRA: Efficient Finetuning of Quantized LLMs

Language: Jupyter Notebook · License: MIT · 973300

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

recognize-anything

Open-source and strong foundation image recognition models.

Language: Jupyter Notebook · License: Apache-2.0 · 261100

M6Doc

9100

ImageBind

ImageBind: One Embedding Space to Bind Them All

Language: Python · License: NOASSERTION · 809700

mPLUG-Owl

mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model

Language: Python · License: MIT · 203000

FindTheChatGPTer

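ChatGPT's explosive popularity marked a key step on the road to AGI. This project collects open-source alternatives to ChatGPT, including large text models and large multimodal models, as a convenient reference for everyone.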

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python · License: Apache-2.0 · 1827700

MiniGPT-4

Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)

Language: Python · License: BSD-3-Clause · 2517300