ColinSnow's starred repositories

LivePortrait

Bring portraits to life!

Language:PythonLicense:NOASSERTIONStargazers:12011Issues:0Issues:0

Kolors

Kolors Team

Language:PythonLicense:Apache-2.0Stargazers:3633Issues:0Issues:0

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Language:PythonLicense:MITStargazers:30213Issues:0Issues:0

MARS5-TTS

MARS5 speech model (TTS) from CAMB.AI

Language:Jupyter NotebookLicense:AGPL-3.0Stargazers:2470Issues:0Issues:0

champ

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

Language:PythonLicense:MITStargazers:4179Issues:0Issues:0

MoneyPrinterTurbo

利用AI大模型,一键生成高清短视频 Generate short videos with one click using AI LLM.

Language:PythonLicense:MITStargazers:16289Issues:0Issues:0

tts-generation-webui

TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS)

Language:TypeScriptLicense:MITStargazers:1674Issues:0Issues:0

ZMM-TTS

ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations

Language:CLicense:BSD-3-ClauseStargazers:115Issues:0Issues:0

CogVLM

a state-of-the-art-level open visual language model | 多模态预训练模型

Language:PythonLicense:Apache-2.0Stargazers:5909Issues:0Issues:0

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language:PythonLicense:MITStargazers:20685Issues:0Issues:0

VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Language:Jupyter NotebookLicense:NOASSERTIONStargazers:7506Issues:0Issues:0

VALL-E-X

An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io/vallex/

Language:PythonLicense:MITStargazers:7575Issues:0Issues:0

vall-e

PyTorch implementation of VALL-E(Zero-Shot Text-To-Speech), Reproduced Demo https://lifeiteng.github.io/valle/index.html

Language:PythonLicense:Apache-2.0Stargazers:2009Issues:0Issues:0

vall-e

An unofficial PyTorch implementation of the audio LM VALL-E

Language:PythonLicense:MITStargazers:2939Issues:0Issues:0

paper-reading

深度学习经典、新论文逐段精读

License:Apache-2.0Stargazers:5Issues:0Issues:0

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language:PythonLicense:MITStargazers:4484Issues:0Issues:0

Bark-Voice-Cloning

Bark Voice Cloning and Voice Cloning for Chinese Speech

Language:Jupyter NotebookLicense:MITStargazers:2736Issues:0Issues:0

MeloTTS

High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.

Language:PythonLicense:MITStargazers:4463Issues:0Issues:0

GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Language:PythonLicense:MITStargazers:33170Issues:0Issues:0

emotion2vec

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Language:PythonStargazers:586Issues:0Issues:0

metavoice-src

Foundational model for human-like, expressive TTS

Language:PythonLicense:Apache-2.0Stargazers:3759Issues:0Issues:0

StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Language:PythonLicense:MITStargazers:4771Issues:0Issues:0
Language:PythonLicense:MITStargazers:8Issues:0Issues:0

pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Language:Jupyter NotebookLicense:MITStargazers:5980Issues:0Issues:0

SubFix

SubFix: Efficient Web-Based Audio Subtitle Editing and Multilingual Automatic Annotation Tool.

Language:PythonLicense:Apache-2.0Stargazers:187Issues:0Issues:0

TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Language:PythonLicense:MPL-2.0Stargazers:33827Issues:0Issues:0

Bert-VITS2

vits2 backbone with multilingual-bert

Language:PythonLicense:AGPL-3.0Stargazers:7834Issues:0Issues:0

AnyText

Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>

Language:PythonLicense:Apache-2.0Stargazers:4242Issues:0Issues:0

EasyBertVits2

文章から感情豊かな音声を生成する Bert-VITS2 を簡単に使えます。

Language:BatchfileLicense:MITStargazers:136Issues:0Issues:0

fish-speech

Brand new TTS solution

Language:PythonLicense:NOASSERTIONStargazers:12629Issues:0Issues:0