MlWoo

Beijing

MlWoo's starred repositories

llama

Inference code for LLaMA models

Language: Python | License: NOASSERTION | 50895 499 872

bark

🔊 Text-Prompted Generative Audio Model

Language: Jupyter Notebook | License: MIT | 35054 321 430

tuning_playbook

A playbook for systematically maximizing the performance of deep learning models.

License: NOASSERTION | 26195 283 40

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language: Python | License: Apache-2.0 | 24829 194 3950

DALLE2-pytorch

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch

Language: Python | License: MIT | 11025 120 210

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Language: Jupyter Notebook | License: NOASSERTION | 10689 140 343

pyAudioAnalysis

Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications

Language: Python | License: Apache-2.0 | 5790 210 308

Pretrained-Language-Model

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

Language: Python | 3005 57 201

vall-e

An unofficial PyTorch implementation of the audio LM VALL-E

Language: Python | License: MIT | 2927 89 97

audio-ai-timeline

A timeline of the latest AI models for audio generation, starting in 2023!

wer_are_we

Attempt at tracking the state of the art and recent results (bibliography) on speech recognition.

FlexFlow

FlexFlow Serve: Low-Latency, High-Performance LLM Serving

Language: C++ | License: Apache-2.0 | 1625 33 632

descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Language: Python | License: MIT | 1086 26 72

versatile_audio_super_resolution

Versatile audio super resolution (any -> 48kHz) with AudioSR.

Language: Python | License: MIT | 1058 24 53

conditional-flow-matching

TorchCFM: a Conditional Flow Matching library

Language: Python | License: MIT | 997 14 47

lhotse

Tools for handling speech data in machine learning projects.

Language: Python | License: Apache-2.0 | 921 44 407

tortoise-tts-fast

Fast TorToiSe inference (5x or your money back!)

Language: Jupyter Notebook | License: AGPL-3.0 | 768 27 125

fairseq2

FAIR Sequence Modeling Toolkit 2

Language: Python | License: MIT | 660 18 98

Meta-voicebox

Implementation of Meta-Voicebox: the first generative AI model for speech to generalize across tasks with state-of-the-art performance.

deep-vector-quantization

VQVAEs, GumbelSoftmaxes and friends

Language: Jupyter Notebook | License: MIT | 516 14 7

Awesome-LLM-System-Papers

SpeechTokenizer

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Language: Python | License: Apache-2.0 | 400 15 11

speechbox

Language: Python | License: Apache-2.0 | 341 16 25

Comprehensive-Transformer-TTS

A Non-Autoregressive Transformer based Text-to-Speech, supporting a family of SOTA transformers with supervised and unsupervised duration modelings. This project grows with the research community, aiming to achieve the ultimate TTS

Language: Python | License: MIT | 319 13 20

DL-Art-School

TorToiSe fine-tuning with DLAS

Language: Python | License: AGPL-3.0 | 208 15 62

RAM-multiprocess-dataloader

Demystify RAM Usage in Multi-Process Data Loaders

Language: Python | License: Apache-2.0 | 169 8 9

USLM

Unified Speech Language Model for the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)

Language: Python | 124 8 4

d2c

PyTorch implementation of D2C: Diffusion-Decoding Models for Few-shot Conditional Generation.

Language: Python | License: MIT | 120 4 6

ASR-Benchmarks

An effort to track benchmarking results over widely-used datasets for ASR.

DistSup

Representation learning for NLP @ JSALT19

Language: Python | License: Apache-2.0 | 34 7 1