xuexidi's repositories

FastSpeech2-1

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

License: MIT

glow-tts

A Generative Flow for Text-to-Speech via Monotonic Alignment Search

License: MIT

ERISHA

ERISHA is a multilingual, multi-speaker expressive speech synthesis framework. It can transfer expressivity to a speaker's voice even when no expressive speech corpus is available for that speaker.

License: NOASSERTION

CMIN_moment_retrieval

Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos

Resemblyzer

A python package to analyze and compare voices with deep learning

License: Apache-2.0

Hadamard-Matrix-for-hashing

CVPR2020: Central Similarity Quantization/Hashing for Efficient Image and Video Retrieval

License: MIT

athena

An open-source implementation of a sequence-to-sequence-based speech processing engine

License: Apache-2.0

deepvoice3_pytorch

PyTorch implementation of convolutional neural network-based text-to-speech synthesis models

License: NOASSERTION

Multilingual_Text_to_Speech

An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.

License: MIT

text_enhancement

Text generation based on LaserTagger, for text paraphrasing and data augmentation

zhrtvc

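Chinese real-time voice cloning (VC) and Chinese text-to-speech (TTS). An easy-to-use Chinese voice cloning and speech synthesis system, comprising a speech encoder, a synthesizer, a vocoder, and a visualization module.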

zhvoice

Chinese voice corpus with clearer, more natural speech, combining 8 open-source datasets: 3,200 speakers, 900 hours of speech, and 13 million characters of text.

tacotronv2_wavernn_chinese

Chinese speech synthesis with TacotronV2 + WaveRNN (TensorFlow + PyTorch)

mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data

License: BSD-3-Clause

Tacotron_VAE

Multi-Speaker Tacotron2 with VAE

Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

License: NOASSERTION

autovc

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

License: MIT

style-token_tacotron2

Style tokens with Tacotron 2

License: MIT

TransformerTTS

🤖💬 Transformer TTS: Implementation of a non-autoregressive Transformer based neural network for text to speech.

License: NOASSERTION

SpeechSplit

Unsupervised Speech Decomposition Via Triple Information Bottleneck

License: MIT

melgan

MelGAN vocoder (compatible with NVIDIA/tacotron2)

License: BSD-3-Clause

ParallelWaveGAN

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN) with PyTorch

License: MIT

melgan-neurips

GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis

License: MIT

melgan-1

MelGAN implementation with Multi-Band and Full-Band support...

License: BSD-3-Clause

Awesome-EmbodiedAI

A curated list of embodied AI works, still under construction. It currently covers simulators, tasks, and datasets.

License: MIT

DurIAN

Implementation of the paper "Duration Informed Attention Network for Multimodal Synthesis" (https://arxiv.org/pdf/1909.01700.pdf).

License: BSD-3-Clause

video-keyframe-detector

A simple Python tool that extracts keyframes from a video file by detecting peaks in the frame-to-frame difference.

License: GPL-3.0

ac-ppo

Actor-Critic and OpenAI's clipped PPO in the Gym CartPole-v0 and Pendulum-v0 environments

DurIAN-1

Implementation of "DurIAN: Duration Informed Attention Network For Multimodal Synthesis".

WaveRNN

WaveRNN Vocoder + TTS

License: MIT