Vlad Bataev's starred repositories
DeepLearningExamples
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
TensorFlowTTS
TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German; easy to adapt to other languages)
awesome-git
A curated list of amazingly awesome Git tools, resources and shiny things
libcudacxx
[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl
denoiser
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it can remove various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques, applied directly to the raw waveform, which further improve model performance and generalization.
TurboTransformers
A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
ffmpeg-normalize
Audio Normalization for Python/ffmpeg
svoice
We provide a PyTorch implementation of the paper Voice Separation with an Unknown Number of Multiple Speakers, in which we present a new method for separating a mixed audio sequence where multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while keeping the speaker in each output channel fixed. A different model is trained for every number of possible speakers, and the model with the largest number of speakers is employed to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.
speech-synthesis-paper
List of speech synthesis papers.
setuptools-rust
Setuptools plugin for Rust support
Thorsten-Voice
Thorsten-Voice: A free-to-use, offline-capable, high-quality German TTS voice should be available for every project without any licensing struggles.
Prosodylab-Aligner
Python interface for forced audio alignment using HTK and SoX
tacotron2-vae
Implementation of "Learning Latent Representations for Style Control and Transfer in End-to-end Speech Synthesis"
vcc20_baseline_cyclevae
Voice Conversion Challenge 2020 CycleVAE baseline system
emotiontts_open_db
An open-source conversational speech synthesis platform that can express a robot's emotions and personality
Intelligibility-MetricGAN
Implementation for paper "iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning"
soxbindings
Python bindings for SoX, aiming to replicate a subset of the command line sox utility.
catboost-go
CatBoost Go wrapper