Maoshuiyang

symao's repositories

tensorrtx

Implementation of popular deep learning networks with the TensorRT network definition API

License: MIT

WenetSpeechSpeakerCluster

Leaderboard

SpeechIO Leaderboard: a large, robust, comprehensive benchmarking platform for Automatic Speech Recognition.

DL-Demos

Demos for deep learning

vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

License: MIT

make-a-video-pytorch

Implementation of Make-A-Video, the new state-of-the-art text-to-video generator from Meta AI, in PyTorch

License: MIT

vits-piper

A fast, local neural text to speech system

License: MIT

SpecVQGAN

Source code for "Taming Visually Guided Sound Generation" (oral presentation at BMVC 2021)

License: MIT

SadTalker-Video-Lip-Sync

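This project builds on SadTalkers to implement Wav2lip-style lip synthesis for video. Lip movements are generated by driving a video file with speech, and a configurable enhancement method for the face region improves the synthesized lip (face) area, making the generated lips clearer. The DAIN deep-learning frame-interpolation algorithm then fills in frames of the generated video, smoothing lip-motion transitions between frames so that the synthesized lips look more fluid, realistic, and natural.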

CharsiuG2P

Multilingual G2P in 100 languages

License: MIT

gruut

A tokenizer, text cleaner, and phonemizer for many human languages.

License: MIT

DL-Art-School

TorToiSe fine-tuning with DLAS

License: AGPL-3.0

naturalspeech

A fully working PyTorch implementation of NaturalSpeech (Tan et al., 2022)

Diffsound

The source code of our paper "Diffsound: discrete diffusion model for text-to-sound generation"

vits-cantonese

Cantonese text-to-speech with a VITS implementation

License: MIT

TranSpeech

PyTorch Implementation of TranSpeech (ICLR'23): Textless NAR Speech-to-Speech Translation with Bilateral Perturbation

License: MIT

phonemizer

Simple text-to-phones converter for multiple languages

License: GPL-3.0

Awesome-Diffusion-Models

A collection of resources and papers on Diffusion Models

License: MIT

lyra

A Very Low-Bitrate Codec for Speech Compression

License: Apache-2.0

WMSeg-upgrade

Implementation of "Improving Chinese Word Segmentation with Wordhood Memory Networks" (ACL 2020).

License: MIT

cmake-demo

Source code for the tutorial "CMake入门实战" (Hands-on Introduction to CMake)

open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

License: MIT

chinese_speech_pretrain

Chinese speech pretrained models

multi_quantization

mmdetection-to-tensorrt

Convert MMDetection models to TensorRT, with support for FP16, INT8, batch input, dynamic shapes, and more.

License: Apache-2.0

LibtorchTutorials

A code repository for a PyTorch C++ (LibTorch) tutorial.

License: Apache-2.0

Pytorch-Memory-Utils

PyTorch memory tracking code

hifi-gan

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

License: MIT

TFGAN

TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis

License: Apache-2.0

regnet

Official PyTorch implementation of the TIP paper "Generating Visually Aligned Sound from Videos" and the corresponding Visually Aligned Sound (VAS) dataset.
