ZhikangNiu

Zhikang Niu's repositories

encodec-pytorch

Unofficial PyTorch implementation of "High Fidelity Neural Audio Compression" (EnCodec)

Language: Python | License: MIT | 116 4 19

pre-train-dockerfile

An introduction to setting up a Docker environment for speech work and debugging it with VSCode

Language: Dockerfile | 200

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | License: MIT | 000

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

License: MIT | 000

Awesome-VQVAE

📚 A collection of resources and papers on Vector Quantized Variational Autoencoder (VQ-VAE) and its applications

License: MIT | 000

CMG

The official implementation of Achieving Cross Modal Generalization with Multimodal Unified Representation (NeurIPS '23)

000

dataspeech

Language: Python | License: MIT | 000

descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Language: Python | License: MIT | 000

descript-audio-vae

VAE-GAN adapted from Descript Audio Codec, replacing the RVQ with a VAE

License: MIT | 000

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language: Python | License: Apache-2.0 | 000

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Language: Python | License: MIT | 000

hilcodec

000

icefall

License: Apache-2.0 | 000

llama

Inference code for LLaMA models

Language: Python | License: NOASSERTION | 000

llama-recipes

Examples and recipes for Llama 2 model

Language: Jupyter Notebook | License: NOASSERTION | 000

M2UGen

This is the official repository for M2UGen

Language: Jupyter Notebook | License: MIT | 000

minimal-nlp

Language: Python | 010

ollama

Get up and running with Llama 3, Mistral, Gemma, and other large language models.

Language: Go | License: MIT | 000

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.

Language: Python | License: MIT | 000

parler-tts

Inference and training library for high-quality TTS models.

Language: Python | License: Apache-2.0 | 000

SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Language: Python | 000

snac

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Language: Python | License: MIT | 000

tango

Code and models for the paper "Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model"

Language: Python | License: NOASSERTION | 000

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | 000

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction"

Language: Python | License: MIT | 000

VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Language: Jupyter Notebook | License: NOASSERTION | 000

ZhikangNiu.github.io

Language: HTML | 000


AI-research-tools


download_dataset_scripts
