xjia520's repositories

FFmpeg

Mirror of git://source.ffmpeg.org/ffmpeg.git

Language: C · License: NOASSERTION · Stargazers: 1 · Issues: 1

Add_noise_and_rir_to_speech

The purpose of this codebase is to add noise from the MUSAN dataset to a clean speech signal at a specified signal-to-noise ratio, and to generate far-field speech data using room impulse responses from the BUT Speech@FIT Reverb Database.

Language: Python · License: MIT · Stargazers: 0 · Issues: 0
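The noise-mixing step described above can be sketched as follows. This is a minimal illustration with NumPy; the `mix_at_snr` helper and its behavior are assumptions for this sketch, not the repository's actual API:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR in dB, then add it
    to `speech`. Hypothetical helper illustrating the SNR-mixing idea."""
    # Tile or trim the noise to match the speech length.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain g chosen so that 10*log10(speech_power / (g^2 * noise_power)) == snr_db.
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise
```

Reverberation (the RIR part) would then be a convolution of the clean speech with a measured room impulse response before mixing, which this sketch omits.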

AEC-Challenge

AEC Challenge

License: MIT · Stargazers: 0 · Issues: 1

audacity

Audio Editor

Language: C · License: NOASSERTION · Stargazers: 0 · Issues: 1

audioFlux

A library for audio and music analysis, feature extraction.

Language: C · License: MIT · Stargazers: 0 · Issues: 0

bark

🔊 Text-prompted Generative Audio Model

Language: Python · License: NOASSERTION · Stargazers: 0 · Issues: 0

Basic_CNNs_TensorFlow2

A TensorFlow 2 implementation of several basic CNNs (MobileNetV1/V2/V3, EfficientNet, ResNeXt, InceptionV4, InceptionResNetV1/V2, SENet, SqueezeNet, DenseNet, ShuffleNetV2, ResNet).

License: MIT · Stargazers: 0 · Issues: 0

book

Deep Learning 101 with PaddlePaddle (an introductory tutorial for the PaddlePaddle deep learning framework)

Language: HTML · Stargazers: 0 · Issues: 1

DeepSpeech

DeepSpeech is an open-source speech-to-text engine that can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.

License: MPL-2.0 · Stargazers: 0 · Issues: 0

detectron2

Detectron2 is FAIR's next-generation research platform for object detection and segmentation.

License: Apache-2.0 · Stargazers: 0 · Issues: 0

kaldi

This is the official location of the Kaldi project.

Language: Shell · License: NOASSERTION · Stargazers: 0 · Issues: 1

magenta

Magenta: Music and Art Generation with Machine Intelligence

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 1

MockingBird

🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real time

License: NOASSERTION · Stargazers: 0 · Issues: 0

node-addon-examples

Node.js C++ addon examples from http://nodejs.org/docs/latest/api/addons.html

Stargazers: 0 · Issues: 0

pytorch-StarGAN-VC

A full reproduction of the StarGAN-VC paper, with stable training and better audio quality.

Language: Python · Stargazers: 0 · Issues: 0

Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Language: Python · License: NOASSERTION · Stargazers: 0 · Issues: 1

so-vits-svc

SoftVC VITS Singing Voice Conversion

License: AGPL-3.0 · Stargazers: 0 · Issues: 0

source_separation

Deep-learning-based speech source separation using PyTorch

License: Apache-2.0 · Stargazers: 0 · Issues: 0

Speech-enhancement

Deep learning for audio denoising

Language: Python · License: MIT · Stargazers: 0 · Issues: 1

Speech-enhancement-1

Deep-neural-network-based speech enhancement toolkit

Language: MATLAB · License: GPL-2.0 · Stargazers: 0 · Issues: 1

Speech_Signal_Processing_and_Classification

Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification: developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject.

The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system contribution (e.g., the vocal tract) and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope; the perceptual linear prediction coefficients (PLPs) can be derived similarly. These traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4].

The pattern recognition step will be based on Gaussian mixture model classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as deep neural networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].

Language: Python · Stargazers: 0 · Issues: 1
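The front end described above starts from short-term frames and LPC analysis. A rough NumPy sketch of those two steps follows; the helper names and default frame sizes are assumptions for illustration, not the repository's code, and the LPC solver uses the plain autocorrelation (Yule-Walker) method:

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping, Hamming-windowed short-term frames
    (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

def lpc(frame, order):
    """Linear prediction coefficients via the autocorrelation method:
    solve the Yule-Walker equations R a = r for the predictor a."""
    full = np.correlate(frame, frame, mode="full")
    r = full[len(frame) - 1 : len(frame) + order]   # r[0] .. r[order]
    # Toeplitz autocorrelation matrix R[i, j] = r[|i - j|].
    R = r[np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])]
    return np.linalg.solve(R, r[1:])
```

In practice one would use Levinson-Durbin recursion instead of a direct solve, and convert the LPCs to cepstral coefficients before classification.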

StarGAN-Voice-Conversion

A full TensorFlow implementation of the paper "StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks" (https://arxiv.org/abs/1806.02169)

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

tensorflow

An Open Source Machine Learning Framework for Everyone

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

UGATIT

Official TensorFlow implementation of U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation (ICLR 2020)

License: MIT · Stargazers: 0 · Issues: 0

Voice-based-gender-recognition

Voice-based gender recognition using mel-frequency cepstral coefficients (MFCCs) and Gaussian mixture models (GMMs)

Language: Python · License: MIT · Stargazers: 0 · Issues: 0
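The MFCC + GMM approach above boils down to scoring a speaker's feature frames under a per-gender model and picking the higher likelihood. A minimal sketch with one diagonal Gaussian per class (a degenerate single-component GMM; the helper names are hypothetical, not the repository's API):

```python
import numpy as np

def gaussian_loglik(frames, mean, var):
    """Per-frame log-likelihood under a diagonal Gaussian
    (a single-component, diagonal-covariance GMM)."""
    d = frames.shape[1]
    diff = frames - mean
    return -0.5 * (np.sum(diff ** 2 / var, axis=1)
                   + np.sum(np.log(var)) + d * np.log(2 * np.pi))

def classify_gender(frames, male_model, female_model):
    """Pick the class whose model gives the higher average frame log-likelihood.
    Each model is a (mean, var) pair estimated from that class's training frames."""
    ll_m = gaussian_loglik(frames, *male_model).mean()
    ll_f = gaussian_loglik(frames, *female_model).mean()
    return "male" if ll_m > ll_f else "female"
```

A real system would fit multi-component GMMs with EM (e.g., scikit-learn's `GaussianMixture`) on MFCC frames rather than a single Gaussian per class.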

voice-changer

Realtime Voice Changer

Language: Python · License: NOASSERTION · Stargazers: 0 · Issues: 0

Zero_Shot_Audio_Source_Separation

The official code repo for "Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data", in AAAI 2022

License: MIT · Stargazers: 0 · Issues: 0