BradyFU

Chaoyou Fu's starred repositories

EAGLE

EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Language: Python, License: Apache-2.0, Stars: 507

MME-RealWorld

✨✨ MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

Language: Python, Stars: 68

VITA

✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM

Language: Python, License: NOASSERTION, Stars: 804

RWKU

RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024

Language: Python, Stars: 52

SliME

✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

Language: Python, License: Apache-2.0, Stars: 132

Awesome-Open-Vocabulary-Detection-and-Segmentation

Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

Stars: 97

Libra

Simple PyTorch implementation of "Libra: Building Decoupled Vision System on Large Language Models" (accepted by ICML 2024)

Language: Python, License: Apache-2.0, Stars: 41

Video-MME

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Stars: 374

VEGA

Language: Python, Stars: 31

conv-llava

Language: Python, License: Apache-2.0, Stars: 100

cantor

Language: HTML, Stars: 64

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Language: Python, License: Apache-2.0, Stars: 3189

VMamba

VMamba: Visual State Space Models; code is based on Mamba

Language: Python, License: MIT, Stars: 2067

APE

[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception

Language: Python, License: Apache-2.0, Stars: 478

LLaMA-VID

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)

Language: Python, License: Apache-2.0, Stars: 693

4DGaussians

[CVPR 2024] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

Language: Jupyter Notebook, License: NOASSERTION, Stars: 2113

GaussianDreamer

GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models (CVPR 2024)

Language: Python, License: Apache-2.0, Stars: 650

Lion

Lion: Kindling Vision Intelligence within Large Language Models

Stars: 52

FF3D

Stars: 33

CNeRF

PyTorch implementation of the AAAI 2023 Oral paper "Semantic 3D-aware Portrait Synthesis and Manipulation Based on Compositional Neural Radiance Field"

Language: Python, License: NOASSERTION, Stars: 39

MUST-GAN

PyTorch implementation of the CVPR 2021 paper "MUST-GAN: Multi-level Statistics Transfer for Self-driven Person Image Generation"

Language: Python, Stars: 75

MUTR

[AAAI 2024] Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

Language: Python, License: MIT, Stars: 63

PanoVOS

[ECCV 2024] PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation

License: BSD-3-Clause, Stars: 17

Woodpecker

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.

Language: Python, Stars: 601

LongLoRA

Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)

Language: Python, License: Apache-2.0, Stars: 2608

MQ-Det

Official PyTorch implementation of "Multi-modal Queried Object Detection in the Wild" (accepted by NeurIPS 2023)

Language: Python, License: Apache-2.0, Stars: 258

vision-process-webui

💡💡💡 Awesome computer vision apps in Gradio

Language: Python, License: Apache-2.0, Stars: 41

Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

Stars: 11987

SeqTR

SeqTR: A Simple yet Universal Network for Visual Grounding

Language: Python, Stars: 128

TiNeuVox

TiNeuVox: Fast Dynamic Radiance Fields with Time-Aware Neural Voxels (SIGGRAPH Asia 2022)

Language: Python, License: Apache-2.0, Stars: 323