MlWoo

Beijing

MlWoo's starred repositories

ASR-Benchmarks

An effort to track benchmarking results over widely-used datasets for ASR.

4300

wer_are_we

An attempt to track the state of the art and recent results (bibliography) on speech recognition.

DALLE2-pytorch

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch

Language: Python | License: MIT | 1102500

versatile_audio_super_resolution

Versatile audio super resolution (any -> 48kHz) with AudioSR.

Language: Python | License: MIT | 105800

speechbox

Language: Python | License: Apache-2.0 | 34100

tuning_playbook

A playbook for systematically maximizing the performance of deep learning models.

License: NOASSERTION | 2619600

pyAudioAnalysis

Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications

Language: Python | License: Apache-2.0 | 579000

deep-vector-quantization

VQVAEs, GumbelSoftmaxes and friends

Language: Jupyter Notebook | License: MIT | 51600

descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Language: Python | License: MIT | 108700

fairseq2

FAIR Sequence Modeling Toolkit 2

Language: Python | License: MIT | 66000

vall-e

An unofficial PyTorch implementation of the audio LM VALL-E

Language: Python | License: MIT | 292800

FlexFlow

FlexFlow Serve: Low-Latency, High-Performance LLM Serving

Language: C++ | License: Apache-2.0 | 162500

Awesome-LLM-System-Papers

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Language: Jupyter Notebook | License: NOASSERTION | 1069000

RAM-multiprocess-dataloader

Demystify RAM Usage in Multi-Process Data Loaders

Language: Python | License: Apache-2.0 | 16900

lhotse

Tools for handling speech data in machine learning projects.

Language: Python | License: Apache-2.0 | 92100

Meta-voicebox

Implementation of Meta-Voicebox : The first generative AI model for speech to generalize across tasks with state-of-the-art performance.

License: MIT | 54800

conditional-flow-matching

TorchCFM: a Conditional Flow Matching library

Language: Python | License: MIT | 99800

Pretrained-Language-Model

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

Language: Python | 300600

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language: Python | License: Apache-2.0 | 2483300

DL-Art-School

TorToiSe fine-tuning with DLAS

Language: Python | License: AGPL-3.0 | 20800

tortoise-tts-fast

Fast TorToiSe inference (5x or your money back!)

Language: Jupyter Notebook | License: AGPL-3.0 | 76800

d2c

PyTorch implementation of D2C: Diffusion-Decoding Models for Few-shot Conditional Generation.

Language: Python | License: MIT | 12000

Comprehensive-Transformer-TTS

A non-autoregressive Transformer-based text-to-speech system supporting a family of SOTA transformers with supervised and unsupervised duration modeling. This project grows with the research community, aiming to achieve the ultimate TTS.

Language: Python | License: MIT | 31900

bark

🔊 Text-Prompted Generative Audio Model

Language: Jupyter Notebook | License: MIT | 3505400

audio-ai-timeline

A timeline of the latest AI models for audio generation, starting in 2023!

declarativedtw

Reference implementation of DecDTW in PyTorch (ICLR 2023)

Language: Jupyter Notebook | License: MIT | 1900

naturalspeech2-pytorch

Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch

Language: Python | License: MIT | 125200

Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Language: Python | License: NOASSERTION | 180700

stable-diffusion-webui

Stable Diffusion web UI

Language: Python | License: AGPL-3.0 | 13853000