MlWoo

Location: Beijing

MlWoo's starred repositories

ASR-Benchmarks

An effort to track benchmarking results over widely-used datasets for ASR.

Stargazers: 43 | Issues: 0

wer_are_we

An attempt at tracking the state of the art and recent results (bibliography) on speech recognition.

Stargazers: 1864 | Issues: 0

DALLE2-pytorch

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch

Language: Python | License: MIT | Stargazers: 11025 | Issues: 0

versatile_audio_super_resolution

Versatile audio super resolution (any -> 48kHz) with AudioSR.

Language: Python | License: MIT | Stargazers: 1058 | Issues: 0

Language: Python | License: Apache-2.0 | Stargazers: 341 | Issues: 0

tuning_playbook

A playbook for systematically maximizing the performance of deep learning models.

License: NOASSERTION | Stargazers: 26196 | Issues: 0

pyAudioAnalysis

Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications

Language: Python | License: Apache-2.0 | Stargazers: 5790 | Issues: 0

deep-vector-quantization

VQVAEs, GumbelSoftmaxes and friends

Language: Jupyter Notebook | License: MIT | Stargazers: 516 | Issues: 0

descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Language: Python | License: MIT | Stargazers: 1087 | Issues: 0

fairseq2

FAIR Sequence Modeling Toolkit 2

Language: Python | License: MIT | Stargazers: 660 | Issues: 0

vall-e

An unofficial PyTorch implementation of the audio LM VALL-E

Language: Python | License: MIT | Stargazers: 2928 | Issues: 0

FlexFlow

FlexFlow Serve: Low-Latency, High-Performance LLM Serving

Language: C++ | License: Apache-2.0 | Stargazers: 1625 | Issues: 0

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 10690 | Issues: 0

RAM-multiprocess-dataloader

Demystify RAM Usage in Multi-Process Data Loaders

Language: Python | License: Apache-2.0 | Stargazers: 169 | Issues: 0

lhotse

Tools for handling speech data in machine learning projects.

Language: Python | License: Apache-2.0 | Stargazers: 921 | Issues: 0

Meta-voicebox

Implementation of Meta-Voicebox: the first generative AI model for speech to generalize across tasks with state-of-the-art performance.

License: MIT | Stargazers: 548 | Issues: 0

conditional-flow-matching

TorchCFM: a Conditional Flow Matching library

Language: Python | License: MIT | Stargazers: 998 | Issues: 0

Pretrained-Language-Model

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

Language: Python | Stargazers: 3006 | Issues: 0

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language: Python | License: Apache-2.0 | Stargazers: 24833 | Issues: 0

DL-Art-School

TorToiSe fine-tuning with DLAS

Language: Python | License: AGPL-3.0 | Stargazers: 208 | Issues: 0

tortoise-tts-fast

Fast TorToiSe inference (5x or your money back!)

Language: Jupyter Notebook | License: AGPL-3.0 | Stargazers: 768 | Issues: 0

d2c

PyTorch implementation of D2C: Diffusion-Decoding Models for Few-shot Conditional Generation.

Language: Python | License: MIT | Stargazers: 120 | Issues: 0

Comprehensive-Transformer-TTS

A non-autoregressive Transformer-based text-to-speech system, supporting a family of SOTA transformers with supervised and unsupervised duration modeling. This project grows with the research community, aiming to achieve the ultimate TTS.

Language: Python | License: MIT | Stargazers: 319 | Issues: 0

bark

🔊 Text-Prompted Generative Audio Model

Language: Jupyter Notebook | License: MIT | Stargazers: 35054 | Issues: 0

audio-ai-timeline

A timeline of the latest AI models for audio generation, starting in 2023!

Stargazers: 1874 | Issues: 0

declarativedtw

Reference implementation of DecDTW in PyTorch (ICLR 2023)

Language: Jupyter Notebook | License: MIT | Stargazers: 19 | Issues: 0

naturalspeech2-pytorch

Implementation of Natural Speech 2, a zero-shot speech and singing synthesizer, in PyTorch

Language: Python | License: MIT | Stargazers: 1252 | Issues: 0

Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Language: Python | License: NOASSERTION | Stargazers: 1807 | Issues: 0

stable-diffusion-webui

Stable Diffusion web UI

Language: Python | License: AGPL-3.0 | Stargazers: 138530 | Issues: 0