我的AI世界 (jin1258804025)

我的AI世界's repositories

3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

Bert-VITS2

vits2 backbone with multilingual-bert

License: AGPL-3.0 | Stargazers: 0 | Issues: 0
License: NOASSERTION | Stargazers: 0 | Issues: 0

EasyBertVits2

Makes it easy to use Bert-VITS2, which generates emotionally expressive speech from text.

Language: Batchfile | License: MIT | Stargazers: 0 | Issues: 0

espeak-phonemizer

Uses ctypes and libespeak-ng to transform text into IPA phonemes

License: GPL-3.0 | Stargazers: 0 | Issues: 0

fish-speech

Brand new TTS solution

Language: Python | License: BSD-3-Clause | Stargazers: 0 | Issues: 0

FunASR

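A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models. A speech recognition toolkit with a rich set of high-performing open-source pretrained models; it supports speech recognition, voice activity detection, text post-processing, and more, and can be deployed as a service.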

License: NOASSERTION | Stargazers: 0 | Issues: 0

GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Language: Python | License: MIT | Stargazers: 0 | Issues: 0
License: MIT | Stargazers: 0 | Issues: 0

HunyuanDiT

Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

License: NOASSERTION | Stargazers: 0 | Issues: 0

leedl-tutorial

Hung-yi Lee's Deep Learning Tutorial. PDF download: https://github.com/datawhalechina/leedl-tutorial/releases

License: NOASSERTION | Stargazers: 0 | Issues: 0

MARS5-TTS

MARS5 speech model (TTS) from CAMB.AI

License: AGPL-3.0 | Stargazers: 0 | Issues: 0

MassTTS

a TTS demo for training new characters.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

megatts2

Unofficial implementation of Megatts2

License: MIT | Stargazers: 0 | Issues: 0
License: Apache-2.0 | Stargazers: 0 | Issues: 0

PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

parler-tts

Inference and training library for high-quality TTS models.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

sherpa-onnx

Speech-to-text and text-to-speech using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 0

spear-tts-pytorch

Implementation of Spear-TTS - multi-speaker text-to-speech attention network, in Pytorch

License: MIT | Stargazers: 0 | Issues: 0

StyleTTS

Official Implementation of StyleTTS

License: MIT | Stargazers: 0 | Issues: 0

StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

License: MIT | Stargazers: 0 | Issues: 0

tensorflow-wavenet

A TensorFlow implementation of DeepMind's WaveNet paper

License: MIT | Stargazers: 0 | Issues: 0

VALL-E-X

An open source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io

License: MIT | Stargazers: 0 | Issues: 0

vall-e_

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

License: Apache-2.0 | Stargazers: 0 | Issues: 0

VALOR

Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

License: MIT | Stargazers: 0 | Issues: 0

VITS-fast-fine-tuning

This repo is a pipeline for VITS fine-tuning, supporting fast speaker adaptation TTS and many-to-many voice conversion

License: Apache-2.0 | Stargazers: 0 | Issues: 0

vits2

VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design

Language: Jupyter Notebook | License: MIT | Stargazers: 0 | Issues: 0

vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

License: MIT | Stargazers: 0 | Issues: 0

wetts

Production First and Production Ready End-to-End Text-to-Speech Toolkit

License: Apache-2.0 | Stargazers: 0 | Issues: 0