zhongshijun's repositories
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
AudioSep
Official implementation of "Separate Anything You Describe"
auto-assess-rhythm-imitation
Code for automatic assessment of rhythmic pattern imitations
CodeTalker
[CVPR 2023] CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
CoMoSVC
CoMoSVC: One-Step Consistency Model Based Singing Voice Conversion & Singing Voice Clone
ComputeLibrary
The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
conformer
PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)
crepe
CREPE: A Convolutional REpresentation for Pitch Estimation -- pre-trained model (ICASSP 2018)
CRUSE
Towards Efficient Models for Real-Time Deep Noise Suppression
DALL-E
PyTorch package for the discrete VAE used for DALL·E.
deepvqe
An unofficial implementation of DeepVQE proposed by Microsoft Corp.
DiffPitcher
Diffusion-based singing voice pitch correction
Diffusion-Models-Papers-Survey-Taxonomy
Diffusion model papers, survey, and taxonomy
e2e_dnn_ad_control_for_lin_aec
End-To-End Deep Learning-based Adaptation Control for Linear Acoustic Echo Cancellation
easyeffects
Limiter, compressor, convolver, equalizer and auto volume and many other plugins for PipeWire applications
EAT_code
Official code for ICCV 2023 paper: "Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation".
gtcrn
An official implementation of GTCRN, an ultra-lite speech enhancement model.
hello-world
My first repository.
ml-spatial-librispeech
A large synthetic dataset of spatial audio with multiple labels
motion-diffusion-model
The official PyTorch implementation of the paper "Human Motion Diffusion Model"
Motion-X
Official implementation of the paper "Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset"
NeuralSVB
Learning the Beauty in Songs: Neural Singing Voice Beautifier; ACL 2022 (Main conference); Official code
RUI_SE
The official repo of "A Refining Underlying Information Framework for Speech Enhancement"
sgmse
Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
so-vits-svc
SoftVC VITS Singing Voice Conversion
video-retalking
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
Wav2Lip
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020.
webrtcperf
WebRTC performance and quality evaluation tool.