WANG XIN (KingMV)

Company: Beijing Jiaotong University

Location: China

WANG XIN's starred repositories

llama3

The official Meta Llama 3 GitHub site

Language: Python | License: NOASSERTION | Stargazers: 26211 | Issues: 217 | Issues: 237

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

Language: Python | License: MIT | Stargazers: 5543 | Issues: 50 | Issues: 543

NExT-GPT

Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

Language: Python | License: BSD-3-Clause | Stargazers: 3212 | Issues: 58 | Issues: 96

Monkey

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Language: Python | License: MIT | Stargazers: 1775 | Issues: 22 | Issues: 129

LLaVA-Med

Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.

Language: Python | License: NOASSERTION | Stargazers: 1458 | Issues: 27 | Issues: 84

Multimodal-AND-Large-Language-Models

A paper list on multimodal and large language models, used only to record papers I read from the daily arXiv listings for personal reference.

awesome-text-to-image-studies

A collection of awesome text-to-image generation studies.

Language: TeX | License: MIT | Stargazers: 336 | Issues: 13 | Issues: 0

awesome-video-generation

A collection of awesome video generation studies.

Language: TeX | License: MIT | Stargazers: 266 | Issues: 11 | Issues: 1

Awesome_Multimodel_LLM

Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundational models, and more. Stay updated with the latest advancements.

Efficient-Multimodal-LLMs-Survey

Efficient Multimodal Large Language Models: A Survey

brain-inspired-replay

A brain-inspired version of generative replay for continual learning with deep neural networks (e.g., class-incremental learning on CIFAR-100; PyTorch code).

Language: Python | License: MIT | Stargazers: 226 | Issues: 7 | Issues: 12

UnIVAL

[TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 223 | Issues: 5 | Issues: 9

Efficient_Foundation_Model_Survey

Survey Paper List - Efficient LLM and Foundation Models

PixelLM

PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. PixelLM was accepted to CVPR 2024.

Language: Python | License: Apache-2.0 | Stargazers: 175 | Issues: 4 | Issues: 24

Awesome-Object-Pose-Estimation

Project Page for Paper "Deep Learning-Based Object Pose Estimation: A Comprehensive Survey"

Count-Anything

This method uses Segment Anything and CLIP to ground and count any object that matches a custom text prompt, without requiring any point or box annotation.

ShareGPT4V

[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions

Language: Python | License: MIT | Stargazers: 91 | Issues: 3 | Issues: 27

CLTR

[ECCV 2022] An End-to-End Transformer Model for Crowd Localization

Language: Python | License: MIT | Stargazers: 87 | Issues: 3 | Issues: 31

PseCo

(CVPR 2024) Point, Segment and Count: A Generalized Framework for Object Counting

Language: Jupyter Notebook | Stargazers: 76 | Issues: 3 | Issues: 16
Language: Python | Stargazers: 31 | Issues: 0 | Issues: 0

SelTDA

[CVPR 23] Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!

MSCANet

Source code for the crowd counting network MSCANet, published at ICSP 2020. The paper is titled "Multi-Scale Context Aggregation Network with Attention-Guided for Crowd Counting".

Language: Python | License: GPL-3.0 | Stargazers: 3 | Issues: 4 | Issues: 2

CCLIS

Contrastive Continual Learning with Importance Sampling and Prototype-Instance Relation Distillation

DPD

Code for "Dynamic Proxy Domain Generalizes the Crowd Localization by Better Binary Segmentation"

Language: Python | Stargazers: 2 | Issues: 0 | Issues: 0

KingMV.github.io

GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes

Language: JavaScript | License: MIT | Stargazers: 1 | Issues: 0 | Issues: 0