Hannieliao's starred repositories

self-llm

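"A Guide to Using Open-Source Large Models": quickly deploy open-source large models in a Linux environment; a deployment tutorial better suited to ** beginners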

Language: Jupyter Notebook | License: Apache-2.0 | 732800

piano-a2s

End-to-end real-world polyphonic piano audio-to-score transcription with hierarchical decoding (IJCAI 2024)

Language: Python | License: Apache-2.0 | 1700

Baton

Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback"

Language: Python | 1200

Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

Language: HTML | 24300

stable-audio-tools

Generative models for conditional audio generation

Language: Python | License: MIT | 243100

hello-algo

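"Hello Algo": a data structures and algorithms tutorial with animated illustrations and one-click runnable code. Supports Python, Java, C++, C, C#, JS, Go, Swift, Rust, Ruby, Kotlin, TS, and Dart code. The Simplified and Traditional Chinese editions are updated in sync; English version ongoing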

Language: Java | License: NOASSERTION | 9322400

LLMs-from-scratch

Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step

Language: Jupyter Notebook | License: NOASSERTION | 2488300

youtube-8m-videos-downloader

Download videos from YouTube-8M dataset for testing

Language: Python | 600

audiosetdl

Scripts for downloading AudioSet

Language: Jupyter Notebook | 6400

Fast-Audioset-Download

Download AudioSet data very quickly with youtube-dl, ffmpeg, and Python multiprocessing

Language: Python | License: BSD-3-Clause | 2600

CVPR-2024-Speech_Audio_Music-Papers

A curated collection of papers related to speech, audio, and music in CVPR 2024.

600

MLQuestions

Machine Learning and Computer Vision Engineer - Technical Interview Questions

280800

REPARO

The official implementation of the work "REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment".

3600

Seeing-and-Hearing

[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

Language: Python | License: NOASSERTION | 10600

lightning-hydra-template

PyTorch Lightning + Hydra. A very user-friendly template for ML experimentation. ⚡🔥⚡

Language: Python | 398600

awesome-mlss

🤖 Machine Learning Summer School deadlines

Language: HTML | License: MIT | 263400

Awesome-Video-Diffusion-Models

[Arxiv] A Survey on Video Diffusion Models

162300

Diff-Foley

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Language: Python | License: Apache-2.0 | 14100

audioldm_eval

This toolbox aims to unify audio generation model evaluation for easier comparison.

Language: Python | License: MIT | 28300

mfa-models

Collection of pretrained models for the Montreal Forced Aligner

Language: Python | License: CC-BY-4.0 | 10500

audio-dataset

Audio Dataset for training CLAP and other models

Language: Python | 61000

ImageSelect

Code for the paper "If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection"

Language: Python | License: MIT | 2700

d3po

[CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"

Language: Python | License: MIT | 14900

AudioLDM-training-finetuning

AudioLDM training, finetuning, evaluation and inference.

Language: Python | License: MIT | 18100

Make-An-Audio

PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model

Language: Python | License: MIT | 73000

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | License: MIT | 438600
