Maoshuiyang

symao's repositories

AcademiCodec

AcademiCodec: An Open Source Audio Codec Model for Academic Research

Language: Python

AnimateDiff

Official implementation of AnimateDiff.

Language: Python · License: Apache-2.0

annotated_deep_learning_paper_implementations

🧑‍🏫 60 implementations/tutorials of deep learning papers with side-by-side notes 📝, including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠

Language: Jupyter Notebook · License: MIT

AudioLDM

AudioLDM: Generate speech, sound effects, music and beyond, with text.

Language: Python · License: NOASSERTION

Awesome-Video-Diffusion

A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.

bark

🔊 Text-Prompted Generative Audio Model

Language: Jupyter Notebook · License: MIT

bark-voice-cloning-HuBERT-quantizer

The code for the bark-voicecloning model, covering training and inference.

Language: Python · License: MIT

Bert-VITS2

vits2 backbone with multilingual-bert

Language: Python · License: AGPL-3.0

Chinese-LLaMA-Alpaca

Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment

License: Apache-2.0

conditional-flow-matching

Conditional Flow Matching: Simulation-Free Dynamic Optimal Transport

License: MIT

EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Language: Python · License: Apache-2.0

it3103

it3103 code repo for students

jukebox

Code for the paper "Jukebox: A Generative Model for Music"

License: NOASSERTION

kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.

Language: Shell · License: NOASSERTION

Large-Audio-Models

Keeps track of big models in the audio domain, including speech, singing, music, etc.

Latte

Latte: Latent Diffusion Transformer for Video Generation.

License: Apache-2.0

leetcode-master

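"Code Thoughts" (代码随想录) LeetCode practice guide: a recommended order for working through 200 classic problems, 600,000 characters of detailed illustrated explanations, video walkthroughs of the difficult points, more than 50 mind maps, and versions in C++, Java, Python, Go, JavaScript, and other languages, so that studying algorithms is no longer confusing! 🔥🔥 Take a look and you will wish you had found it sooner! 🚀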

Lumina-T2X

Lumina-T2X is a unified framework for text-to-any-modality generation.

License: MIT

Matcha-TTS

🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching

License: MIT

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

License: Apache-2.0

open-tts-tracker

pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

License: MIT

SECap

seed-tts-eval

StableTTS

Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3

License: MIT

StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

License: MIT

taming-transformers

Taming Transformers for High-Resolution Image Synthesis

License: MIT

TensorRT

NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.

License: Apache-2.0

voicebox-pytorch

Implementation of Voicebox, a new SOTA text-to-speech network from MetaAI, in PyTorch

License: MIT

wespeaker

Research and Production Oriented Speaker Recognition Toolkit

License: Apache-2.0