auzxb

This repository aims to provide efficient CNNs for audio tagging. We provide AudioSet pre-trained models ready for downstream training and extraction of audio embeddings.

Language: Python | License: MIT

EnCLAP

Official Implementation of EnCLAP

License: MIT

encodec

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Language: Python | License: NOASSERTION

facechain

FaceChain is a deep-learning toolchain for generating your Digital-Twin.

License: Apache-2.0

HFGI3D

icassp2022-vocal-transcription

Code for ICASSP2022 paper "Pseudo-Label Transfer from Frame-Level to Note-Level in a Teacher-Student Framework for Singing Transcription from Polyphonic Music"

InternVideo

Video Foundation Models & Data for Multimodal Understanding

License: Apache-2.0

lama-cleaner

Image inpainting tool powered by SOTA AI models. Remove any unwanted objects, defects, or people from your pictures, or erase and replace (powered by Stable Diffusion) anything in your pictures.

License: Apache-2.0

motion-diffusion-model

The official PyTorch implementation of the paper "Human Motion Diffusion Model"

License: MIT

MotionDiffuse

MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model

nngen

License: Apache-2.0

OPARL

This is the repository for the paper "Online Game Level Generation from Music" (CoG 2022).

pop2piano

Official Repo of the paper "Pop2Piano : Pop Audio-based Piano Cover Generation"

SER-datasets

A collection of datasets for the purpose of emotion recognition/detection in speech.

License: MIT

snac

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate.

License: MIT

soundstorm-pytorch

Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch

License: MIT

SoundStorm-pytorch-1

Google's SoundStorm: Efficient Parallel Audio Generation

License: MIT

SpecVQGAN

Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)

Language: Jupyter Notebook | License: MIT

Text-to-sound-Synthesis

The source code for our paper "Diffsound: discrete diffusion model for text-to-sound generation"

Language: Python

video-bgm-generation

Video Background Music Generation with Controllable Music Transformer (ACM MM 2021 Best Paper Award)

License: MIT

wechat-chatgpt

Use ChatGPT on WeChat via wechaty
