Beast code in Giters

Artanic30's starred repositories

MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Language:PythonApache-2.0183900

anole

Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation

Language:Python39800

ALLaVA

Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model

Language:PythonApache-2.021900

DivideMix

Code for paper: DivideMix: Learning with Noisy Labels as Semi-supervised Learning

Language:PythonMIT52300

MacCap

AAAI 2024 Accepted Paper Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training

Language:Python700

chameleon

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Language:PythonNOASSERTION148300

LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Language:PythonMIT100200

NaviLLM

[CVPR 2024] The code for paper 'Towards Learning a Generalist Model for Embodied Navigation'

Language:PythonMIT9300

Lumina-T2X

Lumina-T2X is a unified framework for Text to Any Modality Generation

Language:PythonMIT187300

DiT-3D

🔥🔥🔥Official Codebase of "DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation"

Language:PythonApache-2.019500

InstructDet

Language:PythonNOASSERTION2500

Retrieval-Augmented-Visual-Question-Answering

This is the official repository for Retrieval Augmented Visual Question Answering

Language:PythonGPL-3.012600

Long-CLIP

[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"

Language:Python47100

ManipVQA

[IROS24 Oral]ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models

Language:Python4300

VL-CheckList

Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations.

Language:Python12200

all-seeing

[ICLR 2024] This is the official implementation of the paper "The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World"

Language:Python41800

SiT

Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"

Language:PythonMIT55000

mamba

Mamba SSM architecture

Language:PythonApache-2.01160800

aliyunpan

阿里云盘命令行客户端，支持JavaScript插件，支持同步备份功能。

Language:GoApache-2.0388600

groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Language:Python66900