taalua's repositories
SSL_Anti-spoofing
This repository includes the code to reproduce our paper "Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation".
voicesmith
[WIP] VoiceSmith makes training text to speech models easy.
Catch-A-Waveform
Official PyTorch implementation of the paper: "Catch-A-Waveform: Learning to Generate Audio from a Single Short Example" (NeurIPS 2021)
FastSpeech2
Multi-Speaker PyTorch FastSpeech2: Fast and High-Quality End-to-End Text to Speech :fist:
jax-variational-diffwave
Jax/Flax implementation of Variational-DiffWave.
taalua
Config files for my GitHub profile.
g2p
g2p: English Grapheme To Phoneme Conversion
Neural-HMM
Neural HMMs are all you need (for high-quality attention-free TTS)
normalizing-flows
PyTorch implementation of normalizing flow models
clpcnet
Pitch-shifting, time-stretching, and vocoding of speech with Controllable LPCNet (CLPCNet)
few-shot-transformer-tts
Byte-based multilingual transformer TTS for low-resource/few-shot language adaptation.
mir-svc
Unsupervised WaveNet-based Singing Voice Conversion Using Pitch Augmentation and Two-phase Approach
ParallelWaveGAN
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with PyTorch
flowEQ
β-VAE for intelligent control of a five band parametric EQ
FG-transformer-TTS
Official implementation for the paper Fine-grained style control in transformer-based text-to-speech synthesis.
WaveGrad
Implementation of Google Brain's WaveGrad vocoder (paper: https://arxiv.org/pdf/2009.00713.pdf). First implementation on GitHub.
editts
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech
MTL-Speaker-Embeddings
Code for the paper: "Leveraging speaker attribute information using multi task learning for speaker verification and diarization" presented at Interspeech 2021
inaSpeechSegmenter
CNN-based audio segmentation toolkit. Detects speech, music, and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.
tt-vae-gan
Timbre transfer with variational autoencoding and cycle-consistent adversarial networks. Able to transfer the timbre of an audio source to that of another.
stereoEEG2speech
Code for a seq2seq architecture with Bahdanau attention designed to map stereotactic EEG data from human brains to spectrograms, using PyTorch Lightning.
ssqueezepy
Synchrosqueezing, wavelet transforms, and time-frequency analysis in Python
flow_synthesizer
Universal audio synthesizer control learning with normalizing flows
wavencoder
WavEncoder is a Python library for encoding audio signals, transforms for audio augmentation, and training audio classification models with PyTorch backend.
msaf
Music Structure Analysis Framework
MaskCycleGAN-VC
Implementation of Kaneko et al.'s MaskCycleGAN-VC model for non-parallel voice conversion.
mixture-of-experts
PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538
AudioStyleNet
This repository contains the code for my master's thesis on Emotion-Aware Facial Animation