aaronchen's repositories
3D-Speaker
A repository for single- and multi-modal speaker verification, speaker recognition, and speaker diarization.
awesome-chatgpt-dataset
Unlock the Power of LLMs: Explore These Datasets to Train Your Own ChatGPT!
Awesome-Diffusion-Personalization
A collection of resources on personalization with diffusion models.
backgroundremover
Background Remover lets you remove backgrounds from images and videos using AI, with a simple command-line interface. It is free and open source.
cav-mae
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
chatglm_finetuning
ChatGLM-6B fine-tuning and Alpaca fine-tuning
CMGAN
Conformer-based Metric GAN for speech enhancement
dinov2
PyTorch code and models for the DINOv2 self-supervised learning method.
GenerativeDiffusionPrior
Generative Diffusion Prior for Unified Image Restoration and Enhancement (CVPR 2023)
Hitomi-Downloader
🍰 Desktop utility to download images/videos/music/text from various websites, and more.
ImageBind
ImageBind: One Embedding Space to Bind Them All
ImageReward
ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Inter-SubNet
The official PyTorch implementation of "Inter-SubNet: Speech Enhancement with Subband Interaction", accepted by ICASSP 2023.
langchain-ChatGLM
langchain-ChatGLM: local-knowledge-based ChatGLM with LangChain | ChatGLM question answering over a local knowledge base
loopy
A data framework for music information retrieval focusing on electronic music.
Mug-Diffusion
High-quality and controllable charting AI for rhythm games, modified from Stable Diffusion
NS2VC
Unofficial implementation of NaturalSpeech2 for Voice Conversion
Personalize-SAM
Personalize Segment Anything Model (SAM) with 1 shot in 10 seconds
sinc
Official PyTorch implementation of the paper "SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation"
so-vits-svc-1
SoftVC VITS Singing Voice Conversion
so-vits-svc-fork
so-vits-svc fork with real-time support, an improved interface, and more features.
StableSR
Exploiting Diffusion Prior for Real-World Image Super-Resolution
SVT_SpeechBrain
Deep Audio-Visual Singing Voice Transcription based on Self-Supervised Learning Models
tango
Code and model for the paper "Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model"
tunesformer
TunesFormer: Forming Tunes with Control Codes
vid2avatar
Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition (CVPR 2023)
video2midi
Convert YouTube Synthesia videos to MIDI
Waveformer
An efficient architecture for real-time target sound extraction.