vladbataev

Vlad Bataev's starred repositories

DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Language: Jupyter Notebook | 13160 300 833

cleanlab

The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

Language: Python | License: AGPL-3.0 | 9334 89 362

demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Language: Python | License: MIT | 8085 152 539

gpustat

📊 A simple command-line utility for querying and monitoring GPU status

Language: Python | License: MIT | 3994 45 121

TensorFlowTTS

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German; easy to adapt to other languages)

Language: Python | License: Apache-2.0 | 3792 78 684

kenlm

KenLM: Faster and Smaller Language Model Queries

Language: C++ | License: NOASSERTION | 2473 70 368

awesome-git

A curated list of amazingly awesome Git tools, resources and shiny things

License: CC0-1.0 | 2409 52 10

libcudacxx

[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl

Language: C++ | License: NOASSERTION | 2294 68 94

hifi-gan

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Language: Python | License: MIT | 1875 32 162

denoiser

Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and generalization.

Language: Python | License: NOASSERTION | 1632 38 149

TurboTransformers

A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT2, decoders, etc.) on CPU and GPU.

Language: C++ | License: NOASSERTION | 1464 41 118

ffmpeg-normalize

Audio Normalization for Python/ffmpeg

Language: Python | License: MIT | 1232 28 208

svoice

We provide a PyTorch implementation of the paper Voice Separation with an Unknown Number of Multiple Speakers, in which we present a new method for separating a mixed audio sequence where multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while keeping the speaker in each output channel fixed. A different model is trained for every number of possible speakers, and the model with the largest number of speakers is employed to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.

Language: Python | License: NOASSERTION | 1210 25 89