AndreJJXu's starred repositories

Vim

[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Language: Python | Apache-2.0 | 265300

Diff-Foley

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Language: Python | Apache-2.0 | 13600

align_sd

Better Aligning Text-to-Image Models with Human Preference. ICCV 2023

Language: Python | Apache-2.0 | 25800

chameleon

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Language: Python | NOASSERTION | 158900

VGen

Official repo for VGen: a holistic video generation ecosystem for video generation building on diffusion models

Language: Python | 279800

RevIN

RevIN: Reversible Instance Normalization For Accurate Time-series Forecasting Against Distribution Shift

Language: Python | MIT | 23200

AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.

MIT | 40400

audio2photoreal

Code and dataset for photorealistic Codec Avatars driven from audio

Language: Python | NOASSERTION | 262900

T2I-CompBench

[NeurIPS 2023] T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

Language: Python | MIT | 17400

pykan

Kolmogorov Arnold Networks

Language: Jupyter Notebook | MIT | 1390800

cobra

Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference

Language: Python | MIT | 22600

LSLD

Language: Python | 1100

DG-SCT

NeurIPS'2023 official implementation code

Language: Python | 5300

LAVISH

Vision Transformers are Parameter-Efficient Audio-Visual Learners

Language: Python | 8000

audio-dataset

Audio Dataset for training CLAP and other models

Language: Python | 60600

CLAP

Contrastive Language-Audio Pretraining

Language: Python | CC0-1.0 | 126600

RPG-DiffusionMaster

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)

Language: Jupyter Notebook | 161200

Panda-70M

[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Language: Python | 45700

ViTamin

[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"

Language: Python | Apache-2.0 | 15600

SceneWiz3D

[CVPR 2024] SceneWiz3D: Towards Text-guided 3D Scene Composition

9100

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Language: Python | Apache-2.0 | 310800

ControlNet-v1-1-nightly

Nightly release of ControlNet 1.1

Language: Python | 454600

ControlNet

Let us control diffusion models!

Language: Python | Apache-2.0 | 2918400

dreambooth

CC-BY-4.0 | 79800

blended-latent-diffusion

Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]

Language: Jupyter Notebook | MIT | 54400

SyncDiffusion

Official implementation of SyncDiffusion.

Language: Jupyter Notebook | MIT | 14200

MultiDiffusion

Official PyTorch Implementation for "MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation" presenting "MultiDiffusion" (ICML 2023)

Language: Jupyter Notebook | 95000

ModalBiasAVSR

Official implementation of the CVPR 2024 paper: A Study of Dropout-Induced Modality Bias on Robustness to Missing Video.

800

clotho-dataset

Python code for handling the Clotho dataset.

Language: Python | NOASSERTION | 7400
