Hannieliao's starred repositories
Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).
stable-audio-tools
Generative models for conditional audio generation
hello-algo
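Hello Algo: a data structures and algorithms tutorial with animated illustrations and one-click runnable code. Supports Python, Java, C++, C, C#, JS, Go, Swift, Rust, Ruby, Kotlin, TS, and Dart. The Simplified and Traditional Chinese editions are updated in sync; English version ongoing.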
LLMs-from-scratch
Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
youtube-8m-videos-downloader
Download videos from the YouTube-8M dataset for testing
audiosetdl
Scripts for downloading AudioSet
Fast-Audioset-Download
Download AudioSet data quickly with youtube-dl, ffmpeg, and Python multiprocessing
CVPR-2024-Speech_Audio_Music-Papers
A curated collection of papers related to speech, audio, and music in CVPR 2024.
MLQuestions
Machine Learning and Computer Vision Engineer - Technical Interview Questions
Seeing-and-Hearing
[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
lightning-hydra-template
PyTorch Lightning + Hydra. A very user-friendly template for ML experimentation. ⚡🔥⚡
awesome-mlss
🤖 Machine Learning Summer School deadlines
Awesome-Video-Diffusion-Models
[Arxiv] A Survey on Video Diffusion Models
Diff-Foley
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
audioldm_eval
This toolbox aims to unify audio generation model evaluation for easier comparison.
mfa-models
Collection of pretrained models for the Montreal Forced Aligner
audio-dataset
Audio Dataset for training CLAP and other models
ImageSelect
Code for the paper "If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection"
AudioLDM-training-finetuning
AudioLDM training, finetuning, evaluation and inference.
Make-An-Audio
PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
CoMoSpeech
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
ImageReward
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
audiocaps-download
This package aims at simplifying the download of the AudioCaps dataset.