KingMV

Awesome_Multimodel is a curated GitHub repository that collects resources for Multimodal Large Language Models (MLLMs). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundational models, and more. Stay updated with the latest advancements.

248 8 1

Efficient-Multimodal-LLMs-Survey

Efficient Multimodal Large Language Models: A Survey

Apache-2.0 | 230 7 4

brain-inspired-replay

A brain-inspired version of generative replay for continual learning with deep neural networks (e.g., class-incremental learning on CIFAR-100; PyTorch code).

Language: Python | MIT | 226 7 12

UnIVAL

[TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks.

Language: Jupyter Notebook | Apache-2.0 | 223 5 9

InternVideo2

MIT | 199 25 3

Efficient_Foundation_Model_Survey

Survey Paper List - Efficient LLM and Foundation Models

192 5 2

PixelLM

PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. PixelLM was accepted to CVPR 2024.

Language: Python | Apache-2.0 | 175 4 24

Awesome-Object-Pose-Estimation

Project Page for Paper "Deep Learning-Based Object Pose Estimation: A Comprehensive Survey"

141 40

Count-Anything

This method uses Segment Anything and CLIP to ground and count any object that matches a custom text prompt, without requiring any point or box annotation.

Language: Python | 128 3 5

ShareGPT4V

[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions

Language: Python | 116 2 10

crowddiff

Language: Python | MIT | 91 3 27

CLTR

[ECCV 2022] An End-to-End Transformer Model for Crowd Localization

Language: Python | MIT | 87 3 31

PseCo

(CVPR 2024) Point, Segment and Count: A Generalized Framework for Object Counting

Language: Jupyter Notebook | 76 3 16

SAM-E

Language: Python | 3100

SelTDA

[CVPR 23] Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!

Language: Python | 12 3 1

MSCANet

Source code for MSCANet, a crowd counting network published at ICSP 2020. The paper is titled "Multi-Scale Context Aggregation Network with Attention-Guided for Crowd Counting".

Language: Python | GPL-3.0 | 3 4 2

CCLIS

Contrastive Continual Learning with Importance Sampling and Prototype-Instance Relation Distillation

Language: Python | 2 2 1

DPD

Code for "Dynamic Proxy Domain Generalizes the Crowd Localization by Better Binary Segmentation"

Language: Python | 200

KingMV.github.io

GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes

Language: JavaScript | MIT | 100

KingMV

WANG XIN's starred repositories

llama3

ShiArthur03

InternVL

NExT-GPT

low_cost_robot

Monkey

LLaVA-Med

Multimodal-AND-Large-Language-Models

Unichat-llama3-Chinese

awesome-text-to-image-studies

awesome-video-generation

Awesome_Multimodel_LLM