entn-at

Ewald Enzinger's repositories

faster-whisper

Faster Whisper transcription with CTranslate2

Language: Python · MIT · 100

lyra

A Very Low-Bitrate Codec for Speech Compression

Language: C++ · Apache-2.0 · 1 20

AudioDec

An Open-source Streaming High-fidelity Neural Audio Codec

Language: Python · NOASSERTION · 000

audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch

Language: Python · MIT · 000

Auto_Tuning_Zeroshot_TTS_and_VC

Official implementation of "Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis", Interspeech 2023

Language: Python · MIT · 000

BetaVAE_VC

Implementation for paper "Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using β-VAE"

Language: Python · MIT · 000

D-TDNN

PyTorch implementation of Densely Connected Time Delay Neural Network

000

divide_lm

000

DPHuBERT

INTERSPEECH 2023: "DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models"

Language: Python · MIT · 000

encodec

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

MIT · 000

fad_pytorch

Frechet Audio Distance evaluation in PyTorch

Language: Python · MIT · 000

fstalign

An efficient OpenFST-based tool for calculating WER and aligning two transcript sequences.

Apache-2.0 · 000

lhotse

Language: Python · Apache-2.0 · 010

oobleck

open soundstream-ish VAE codecs for downstream neural audio synthesis

MIT · 000

pyctcdecode

A fast and lightweight Python-based CTC beam search decoder for speech recognition.

Language: Python · Apache-2.0 · 010

riva-asrlib-decoder

Standalone implementation of the CUDA-accelerated WFST Decoder available in Riva

Language: Python · 000

SAT

Streaming Audio Transformers for online audio tagging

Language: Python · GPL-3.0 · 000

sequence_align

Efficient implementations of Needleman-Wunsch and other sequence alignment algorithms written in Rust with Python bindings via PyO3.

Apache-2.0 · 000

SnakeGAN

Please visit https://thuhcsi.github.io/SnakeGAN/

CC0-1.0 · 000

so-vits-svc-fork

so-vits-svc fork with realtime support, improved interface and more features.

Language: Python · NOASSERTION · 000

soundstorm-pytorch

Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch

Language: Python · MIT · 000

SV_eval_protocols_for_SD

Speaker verification evaluation protocols simulating speaker diarisation

MIT · 000

tango

Code and model for the paper "Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model"

Language: Python · NOASSERTION · 000

TTS-Cube

End-to-end speech synthesis with recurrent neural networks

Language: Python · Apache-2.0 · 030

vocal-tract-grad

000

Waveformer

An efficient architecture for real-time target sound extraction.

Language: Python · MIT · 000

whisper-finetuning

[WIP] Scripts for fine-tuning Whisper

Language: Python · MIT · 000

whisper-jax

Language: Jupyter Notebook · Apache-2.0 · 000

whisper-punctuator

Zero-shot Punctuation Insertion using Whisper

Language: Python · MIT · 000

zm-text-tts

Learning to Speak from Text for Low-Resource TTS

Language: Python · Apache-2.0 · 000