ColinSnow's starred repositories

LivePortrait

Bring portraits to life!

Language: Python | License: NOASSERTION | Stars: 12011

Kolors

Kolors Team

Language: Python | License: Apache-2.0 | Stars: 3633

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Language: Python | License: MIT | Stars: 30213

MARS5-TTS

MARS5 speech model (TTS) from CAMB.AI

Language: Jupyter Notebook | License: AGPL-3.0 | Stars: 2470

champ

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

Language: Python | License: MIT | Stars: 4179

MoneyPrinterTurbo

Generate high-definition short videos with one click using large AI models (LLMs).

Language: Python | License: MIT | Stars: 16289

tts-generation-webui

TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS)

Language: TypeScript | License: MIT | Stars: 1674

ZMM-TTS

ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations

Language: C | License: BSD-3-Clause | Stars: 115

CogVLM

A state-of-the-art open visual language model | multimodal pre-trained model

Language: Python | License: Apache-2.0 | Stars: 5909

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python | License: MIT | Stars: 20685

VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Language: Jupyter Notebook | License: NOASSERTION | Stars: 7506

VALL-E-X

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/

Language: Python | License: MIT | Stars: 7575

vall-e

PyTorch implementation of VALL-E (zero-shot text-to-speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Language: Python | License: Apache-2.0 | Stars: 2009

vall-e

An unofficial PyTorch implementation of the audio LM VALL-E

Language: Python | License: MIT | Stars: 2939

paper-reading

Paragraph-by-paragraph close reading of classic and new deep learning papers

License: Apache-2.0 | Stars: 5

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | License: MIT | Stars: 4484

Bark-Voice-Cloning

Bark Voice Cloning and Voice Cloning for Chinese Speech

Language: Jupyter Notebook | License: MIT | Stars: 2736

MeloTTS

High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.

Language: Python | License: MIT | Stars: 4463

GPT-SoVITS

Just 1 minute of voice data is enough to train a good TTS model (few-shot voice cloning).

Language: Python | License: MIT | Stars: 33170

emotion2vec

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Language: Python | Stars: 586

metavoice-src

Foundational model for human-like, expressive TTS

Language: Python | License: Apache-2.0 | Stars: 3759

StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Language: Python | License: MIT | Stars: 4771

vits_korean_multispeaker

Language: Python | License: MIT | Stars: 8

pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Language: Jupyter Notebook | License: MIT | Stars: 5980

SubFix

SubFix: Efficient Web-Based Audio Subtitle Editing and Multilingual Automatic Annotation Tool.

Language: Python | License: Apache-2.0 | Stars: 187

TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Language: Python | License: MPL-2.0 | Stars: 33827

Bert-VITS2

vits2 backbone with multilingual-bert

Language: Python | License: AGPL-3.0 | Stars: 7834

AnyText

Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>

Language: Python | License: Apache-2.0 | Stars: 4242

EasyBertVits2

Makes it easy to use Bert-VITS2, which generates emotionally expressive speech from text.

Language: Batchfile | License: MIT | Stars: 136

fish-speech

Brand new TTS solution

Language: Python | License: NOASSERTION | Stars: 12629