chenchy

aaronchen's repositories

3D-Speaker

A repository for single- and multi-modal speaker verification, speaker recognition, and speaker diarization.

License: Apache-2.0

awesome-chatgpt-dataset

Unlock the Power of LLMs: Explore These Datasets to Train Your Own ChatGPT!

License: GPL-3.0

Awesome-Diffusion-Personalization

A collection of resources on personalization with diffusion models.

License: MIT

backgroundremover

Background Remover lets you remove backgrounds from images and videos using AI, through a simple command-line interface that is free and open source.

Language: Python · License: MIT

cav-mae

Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".

License: BSD-2-Clause

chatglm_finetuning

ChatGLM-6B fine-tuning and Alpaca fine-tuning

CMGAN

Conformer-based Metric GAN for speech enhancement

License: MIT

dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.

License: NOASSERTION

facetts

Language: Python · License: Apache-2.0

GenerativeDiffusionPrior

Generative Diffusion Prior for Unified Image Restoration and Enhancement (CVPR2023)

Hitomi-Downloader

Desktop utility to download images, videos, music, and text from various websites, and more.

ImageBind

ImageBind: One Embedding Space to Bind Them All

Language: Python · License: NOASSERTION

ImageReward

ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation

License: Apache-2.0

Inter-SubNet

The official PyTorch implementation of "Inter-SubNet: Speech Enhancement with Subband Interaction", accepted by ICASSP 2023.

Language: Python · License: Apache-2.0

langchain-ChatGLM

langchain-ChatGLM, local-knowledge-based ChatGLM with LangChain | ChatGLM question answering based on a local knowledge base

Language: Vue · License: Apache-2.0

loopy

A data framework for music information retrieval focusing on electronic music.

License: GPL-3.0

Mug-Diffusion

High-quality and controllable charting AI for rhythm games, modified from Stable Diffusion

License: MIT

NS2VC

Unofficial implementation of NaturalSpeech2 for Voice Conversion

Language: Python

Personalize-SAM

Personalize Segment Anything Model (SAM) with 1 shot in 10 seconds

License: MIT

sinc

Official PyTorch implementation of the paper "SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation"

so-vits-svc-1

SoftVC VITS Singing Voice Conversion

License: BSD-3-Clause

so-vits-svc-fork

so-vits-svc fork with realtime support, improved interface and more features.

Language: Python · License: NOASSERTION

StableSR

Exploiting Diffusion Prior for Real-World Image Super-Resolution

Language: Python · License: NOASSERTION

SVT_SpeechBrain

Deep Audio-Visual Singing Voice Transcription based on Self-Supervised Learning Models

License: Apache-2.0

symbolic-music-discrete-diffusion

Language: Python · License: MIT

tango

Code and model for the paper "Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model"

License: NOASSERTION

tunesformer

TunesFormer: Forming Tunes with Control Codes

License: MIT

vid2avatar

Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition (CVPR2023)

License: NOASSERTION

video2midi

Convert YouTube Synthesia videos to MIDI

License: GPL-3.0

Waveformer

An efficient architecture for real-time target sound extraction.

License: MIT