auzxb's repositories
InternVideo
Video Foundation Models & Data for Multimodal Understanding
snac
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
EnCLAP
Official Implementation of EnCLAP
video-bgm-generation
Video Background Music Generation with Controllable Music Transformer (ACM MM 2021 Best Paper Award)
facechain
FaceChain is a deep-learning toolchain for generating your Digital-Twin.
soundstorm-pytorch
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
SoundStorm-pytorch-1
Google's SoundStorm: Efficient Parallel Audio Generation
actionformer_release
Code release for ActionFormer (ECCV 2022)
EfficientAT
This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training and extraction of audio embeddings.
Text-to-sound-Synthesis
The source code of our paper "Diffsound: discrete diffusion model for text-to-sound generation"
CLAP
Contrastive Language-Audio Pretraining
SpecVQGAN
Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)
encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
Awesome-Diffusion-Models
A collection of resources and papers on Diffusion Models
BigVGAN-1
Official PyTorch implementation of BigVGAN (ICLR 2023)
AudioCaption
Audio captioning recipe
lama-cleaner
Image inpainting tool powered by SOTA AI models. Remove any unwanted objects, defects, or people from your pictures, or erase and replace (powered by Stable Diffusion) anything in your pictures.
wechat-chatgpt
Use ChatGPT On Wechat via wechaty
pop2piano
Official repo of the paper "Pop2Piano: Pop Audio-based Piano Cover Generation"
motion-diffusion-model
The official PyTorch implementation of the paper "Human Motion Diffusion Model"
MotionDiffuse
MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
OPARL
This is the repository of the paper "Online Game Level Generation from Music" in CoG 2022
BigVGAN
Unofficial PyTorch implementation of BigVGAN: A Universal Neural Vocoder with Large-Scale Training
SER-datasets
A collection of datasets for the purpose of emotion recognition/detection in speech.
icassp2022-vocal-transcription
Code for ICASSP2022 paper "Pseudo-Label Transfer from Frame-Level to Note-Level in a Teacher-Student Framework for Singing Transcription from Polyphonic Music"
chatbot-list
Industry talks and introductions on the applications, architectures, and algorithms of intelligent customer service and chatbots