symao (Maoshuiyang)

Company: The Chinese University of Hong Kong

Location: Hong Kong

symao's repositories

AcademiCodec

AcademiCodec: An Open Source Audio Codec Model for Academic Research

Language: Python | Stargazers: 0 | Issues: 0

AnimateDiff

Official implementation of AnimateDiff.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

annotated_deep_learning_paper_implementations

🧑‍🏫 60 implementations/tutorials of deep learning papers with side-by-side notes 📝, including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠

Language: Jupyter Notebook | License: MIT | Stargazers: 0 | Issues: 0

AudioLDM

AudioLDM: Generate speech, sound effects, music and beyond, with text.

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0

Awesome-Video-Diffusion

A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.

Stargazers: 0 | Issues: 0

bark

🔊 Text-Prompted Generative Audio Model

Language: Jupyter Notebook | License: MIT | Stargazers: 0 | Issues: 0

bark-voice-cloning-HuBERT-quantizer

Training and inference code for the bark-voicecloning model.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

Bert-VITS2

vits2 backbone with multilingual-bert

Language: Python | License: AGPL-3.0 | Stargazers: 0 | Issues: 0

Chinese-LLaMA-Alpaca

Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment

License: Apache-2.0 | Stargazers: 0 | Issues: 0

conditional-flow-matching

Conditional Flow Matching: Simulation-Free Dynamic Optimal Transport

License: MIT | Stargazers: 0 | Issues: 0

EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

it3103

it3103 code repo for students

Stargazers: 0 | Issues: 0

jukebox

Code for the paper "Jukebox: A Generative Model for Music"

License: NOASSERTION | Stargazers: 0 | Issues: 0

kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.

Language: Shell | License: NOASSERTION | Stargazers: 0 | Issues: 0

Large-Audio-Models

Keep track of big models in the audio domain, including speech, singing, music, etc.

Stargazers: 0 | Issues: 0

Latte

Latte: Latent Diffusion Transformer for Video Generation.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

leetcode-master

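"Code Thoughts" (代码随想录) LeetCode practice guide: a recommended order for working through 200 classic problems, 600,000 characters of detailed illustrated explanations, video walkthroughs of the difficult points, and more than 50 mind maps, with solutions in C++, Java, Python, Go, JavaScript, and other languages. No more feeling lost when learning algorithms! 🔥🔥 Take a look; you will wish you had found it sooner! 🚀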

Stargazers: 0 | Issues: 0

Lumina-T2X

Lumina-T2X is a unified framework for Text to Any Modality Generation

License: MIT | Stargazers: 0 | Issues: 0

Matcha-TTS

🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching

License: MIT | Stargazers: 0 | Issues: 0

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

License: Apache-2.0 | Stargazers: 0 | Issues: 0

pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

License: MIT | Stargazers: 0 | Issues: 0

StableTTS

Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3

License: MIT | Stargazers: 0 | Issues: 0

StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

License: MIT | Stargazers: 0 | Issues: 0

taming-transformers

Taming Transformers for High-Resolution Image Synthesis

License: MIT | Stargazers: 0 | Issues: 0

TensorRT

NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

voicebox-pytorch

Implementation of Voicebox, a new SOTA text-to-speech network from Meta AI, in PyTorch

License: MIT | Stargazers: 0 | Issues: 0

wespeaker

Research and Production Oriented Speaker Recognition Toolkit

License: Apache-2.0 | Stargazers: 0 | Issues: 0