EricFuma

Fu Guanyu's starred repositories

Promptify

Prompt Engineering | Prompt Versioning | Use GPT or other prompt based models to get structured output. Join our discord for Prompt-Engineering, LLMs and other latest research

Language: Jupyter Notebook | Apache-2.0 | 321200

KAN-TTS

KAN-TTS is a speech-synthesis training framework. Please try the demos we have posted at https://modelscope.cn/models?page=1&tasks=text-to-speech

Language: Python | MIT | 48600

NeuFA

Neural network-based forced alignment with bidirectional attention mechanism

Language: Python | 7000

SiFiGAN

Official implementation of the source-filter HiFiGAN vocoder

Language: Python | MIT | 23400

jukebox

Code for the paper "Jukebox: A Generative Model for Music"

Language: Python | NOASSERTION | 778200

grpc

The C based gRPC (C++, Python, Ruby, Objective-C, PHP, C#)

Language: C++ | Apache-2.0 | 4172500

audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch

Language: Python | MIT | 239900

mmengine

OpenMMLab Foundational Library for Training Deep Learning Models

Language: Python | Apache-2.0 | 115900

VITS-BigVGAN-SpanPSP-Chinese

A PyTorch-based VITS-BigVGAN Chinese TTS model, with an added prosody prediction model.

Language: Python | 19300

DiffGAN-TTS

PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

Language: Python | MIT | 31300

MB-iSTFT-VITS

Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform

Language: Python | Apache-2.0 | 41700

encodec

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Language: Python | MIT | 345200

kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.

Language: Shell | NOASSERTION | 1418900

audiomentations

A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.

Language: Python | MIT | 182800

TTS-frontend

TTS-frontend with BERT and CRF/LSTM (for Tacotron)

Language: Python | 4900

PaddleHub

Awesome pre-trained models toolkit based on PaddlePaddle. (400+ models including Image, Text, Audio, Video and Cross-Modal with Easy Inference & Serving) [Security hardening in progress; interaction is suspended, please be patient]

Language: Python | Apache-2.0 | 1269200

ltp

Language Technology Platform

Language: Python | 494100

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python | MIT | 1970900

WeTextProcessing

Text Normalization & Inverse Text Normalization

Language: Python | Apache-2.0 | 45600

gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

Language: Python | Apache-2.0 | 3260800

paper2gui

Convert AI papers to GUI. Make it easy and convenient for everyone to use cutting-edge artificial intelligence technology.

Language: Jupyter Notebook | MIT | 1020000

ParallelWaveGAN

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with PyTorch

Language: Jupyter Notebook | MIT | 154700

chinese_text_normalization

Chinese text normalization for speech processing

Language: Python | MIT | 62100

tldraw

SDK for creating whiteboards and canvas experiences on the web.

Language: TypeScript | NOASSERTION | 3541300

speech_dataset

The dataset of Speech Recognition

Apache-2.0 | 38300

chinese_speech_pretrain

Chinese speech pretrained models

Language: Shell | 101400

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

Language: Python | Apache-2.0 | 1099000

forced-alignment-tools

A collection of links and notes on forced alignment tools

Language: Python | NOASSERTION | 86800