xjia520's repositories

FFmpeg

Mirror of git://source.ffmpeg.org/ffmpeg.git

Language: C · License: NOASSERTION · Stargazers: 1 · Issues: 1

Add_noise_and_rir_to_speech

The purpose of this codebase is to add noise from the MUSAN dataset to a clean speech signal at a specified signal-to-noise ratio, and to generate far-field speech data using room impulse responses from the BUT Speech@FIT Reverb Database.

Language: Python · License: MIT · Stargazers: 0 · Issues: 0
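The noise-mixing step described above can be sketched as follows. This is a minimal illustration with NumPy; the `mix_at_snr` helper and its behavior are assumptions for this sketch, not the repository's actual API:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR in dB, then add it
    to `speech`. Hypothetical helper illustrating the SNR-mixing idea."""
    # Tile or trim the noise to match the speech length.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain g chosen so that 10*log10(speech_power / (g^2 * noise_power)) == snr_db.
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise
```

Reverberation (the RIR part) would then be a convolution of the clean speech with a measured room impulse response before mixing, which this sketch omits.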

AEC-Challenge

AEC Challenge

License: MIT · Stargazers: 0 · Issues: 1

audacity

Audio Editor

Language: C · License: NOASSERTION · Stargazers: 0 · Issues: 1

audioFlux

A library for audio and music analysis, feature extraction.

Language: C · License: MIT · Stargazers: 0 · Issues: 0

bark

🔊 Text-prompted Generative Audio Model

Language: Python · License: NOASSERTION · Stargazers: 0 · Issues: 0

Basic_CNNs_TensorFlow2

A TensorFlow 2 implementation of several basic CNNs (MobileNetV1/V2/V3, EfficientNet, ResNeXt, InceptionV4, InceptionResNetV1/V2, SENet, SqueezeNet, DenseNet, ShuffleNetV2, ResNet).

License: MIT · Stargazers: 0 · Issues: 0

book

Deep Learning 101 with PaddlePaddle (an introductory tutorial for the PaddlePaddle deep learning framework)

Language: HTML · Stargazers: 0 · Issues: 1

DeepSpeech

DeepSpeech is an open-source speech-to-text engine that can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.

License: MPL-2.0 · Stargazers: 0 · Issues: 0

detectron2

Detectron2 is FAIR's next-generation research platform for object detection and segmentation.

License: Apache-2.0 · Stargazers: 0 · Issues: 0

kaldi

This is the official location of the Kaldi project.

Language: Shell · License: NOASSERTION · Stargazers: 0 · Issues: 1

magenta

Magenta: Music and Art Generation with Machine Intelligence

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 1

MockingBird

🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real time

License: NOASSERTION · Stargazers: 0 · Issues: 0

node-addon-examples

Node.js C++ addon examples from http://nodejs.org/docs/latest/api/addons.html

Stargazers: 0 · Issues: 0

pytorch-StarGAN-VC

A full reproduction of the StarGAN-VC paper, with stable training and better audio quality.

Language: Python · Stargazers: 0 · Issues: 0

Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Language: Python · License: NOASSERTION · Stargazers: 0 · Issues: 1

so-vits-svc

SoftVC VITS Singing Voice Conversion

License: AGPL-3.0 · Stargazers: 0 · Issues: 0

source_separation

Deep-learning-based speech source separation using PyTorch

License: Apache-2.0 · Stargazers: 0 · Issues: 0

Speech-enhancement

Deep learning for audio denoising

Language: Python · License: MIT · Stargazers: 0 · Issues: 1

Speech-enhancement-1

Deep-neural-network-based speech enhancement toolkit

Language: MATLAB · License: GPL-2.0 · Stargazers: 0 · Issues: 1

Speech_Signal_Processing_and_Classification

Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification: developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject.

The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system contribution (e.g., the vocal tract) and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope; the perceptual linear prediction coefficients (PLPs) can be derived similarly. These traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4].

The pattern recognition step will be based on Gaussian mixture model classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as deep neural networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].

Language: Python · Stargazers: 0 · Issues: 1
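The front end described above starts from short-term frames and LPC analysis. A rough NumPy sketch of those two steps follows; the helper names and default frame sizes are assumptions for illustration, not the repository's code, and the LPC solver uses the plain autocorrelation (Yule-Walker) method:

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping, Hamming-windowed short-term frames
    (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

def lpc(frame, order):
    """Linear prediction coefficients via the autocorrelation method:
    solve the Yule-Walker equations R a = r for the predictor a."""
    full = np.correlate(frame, frame, mode="full")
    r = full[len(frame) - 1 : len(frame) + order]   # r[0] .. r[order]
    # Toeplitz autocorrelation matrix R[i, j] = r[|i - j|].
    R = r[np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])]
    return np.linalg.solve(R, r[1:])
```

In practice one would use Levinson-Durbin recursion instead of a direct solve, and convert the LPCs to cepstral coefficients before classification.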

StarGAN-Voice-Conversion

A full TensorFlow implementation of the paper "StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks" (https://arxiv.org/abs/1806.02169)

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

tensorflow

An Open Source Machine Learning Framework for Everyone

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

UGATIT

Official TensorFlow implementation of U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation (ICLR 2020)

License: MIT · Stargazers: 0 · Issues: 0

Voice-based-gender-recognition

Voice-based gender recognition using mel-frequency cepstral coefficients (MFCCs) and Gaussian mixture models (GMMs)

Language: Python · License: MIT · Stargazers: 0 · Issues: 0
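The MFCC + GMM approach above boils down to scoring a speaker's feature frames under a per-gender model and picking the higher likelihood. A minimal sketch with one diagonal Gaussian per class (a degenerate single-component GMM; the helper names are hypothetical, not the repository's API):

```python
import numpy as np

def gaussian_loglik(frames, mean, var):
    """Per-frame log-likelihood under a diagonal Gaussian
    (a single-component, diagonal-covariance GMM)."""
    d = frames.shape[1]
    diff = frames - mean
    return -0.5 * (np.sum(diff ** 2 / var, axis=1)
                   + np.sum(np.log(var)) + d * np.log(2 * np.pi))

def classify_gender(frames, male_model, female_model):
    """Pick the class whose model gives the higher average frame log-likelihood.
    Each model is a (mean, var) pair estimated from that class's training frames."""
    ll_m = gaussian_loglik(frames, *male_model).mean()
    ll_f = gaussian_loglik(frames, *female_model).mean()
    return "male" if ll_m > ll_f else "female"
```

A real system would fit multi-component GMMs with EM (e.g., scikit-learn's `GaussianMixture`) on MFCC frames rather than a single Gaussian per class.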

voice-changer

Realtime Voice Changer

Language: Python · License: NOASSERTION · Stargazers: 0 · Issues: 0

Zero_Shot_Audio_Source_Separation

The official code repo for "Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data", in AAAI 2022

License: MIT · Stargazers: 0 · Issues: 0