Zhikang Niu's repositories
encodec-pytorch
Unofficial implementation of the paper "High Fidelity Neural Audio Compression"
AI-research-tools
:hammer: Useful research tools for AI
pre-train-dockerfile
An introduction to setting up a speech Docker environment and debugging with VSCode
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Awesome-VQVAE
📚 A collection of resources and papers on Vector Quantized Variational Autoencoder (VQ-VAE) and its application
CMG
The official implementation of Achieving Cross Modal Generalization with Multimodal Unified Representation (NeurIPS '23)
descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
descript-audio-vae
VAE-GAN modified from Descript Audio Codec, replacing the RVQ with a VAE
diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
llama
Inference code for LLaMA models
llama-recipes
Examples and recipes for Llama 2 model
M2UGen
This is the official repository for M2UGen
ollama
Get up and running with Llama 3, Mistral, Gemma, and other large language models.
Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
parler-tts
Inference and training library for high-quality TTS models.
SLAM-LLM
Speech, Language, Audio, Music Processing with Large Language Model
snac
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
tango
Code and model for the paper "Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model"
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
VAR
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction"
VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild