hyzhan

Guangzhou

hyzhan's repositories

auraloss

Collection of audio-focused loss functions in PyTorch

Language: Python | Apache-2.0 | 0 0 0

code01

Language: Python | 0 0 0

CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Language: Python | Apache-2.0 | 0 0 0

denoiser

Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model working on the raw waveform that runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it can remove various kinds of background noise, both stationary and non-stationary, as well as room reverb. Additionally, we suggest a set of data augmentation techniques, applied directly to the raw waveform, which further improve the model's performance and generalization.

Language: Python | NOASSERTION | 0 0 0

g2pM

A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset

Apache-2.0 | 0 0 0

gcn

Implementation of Graph Convolutional Networks in TensorFlow

Language: Python | MIT | 0 0 0

GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

MIT | 0 0 0

GPT2-Chinese

Chinese version of GPT2 training code, using BERT tokenizer.

MIT | 0 0 0

grafx

GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch

0 0 0

hyzhan.github.io

GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes

Language: JavaScript | MIT | 0 0 0

ICASSP2022

Language: HTML | 0 0 0

Interspeech2021

Interspeech2021

Language: HTML | 0 2 0

lightconv_pt

lightconv_layer fairseq

0 0 0

Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi

Language: Python | MIT | 0 0 0

NAC-TTS

Language: HTML | 0 0 0

NVIDIA_SGEMM_PRACTICE

Step-by-step optimization of CUDA SGEMM

0 0 0

phonological-features

Materials accompanying the paper "Phonological features for 0-shot multilingual speech synthesis"

Language: Python | 0 1 0

PyTorch-BigGraph

Software used for generating embeddings from large-scale graph-structured data.

Language: Python | NOASSERTION | 0 0 0

Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Language: Python | NOASSERTION | 0 0 0

spleeter

Deezer source separation library including pretrained models.

Language: Python | MIT | 0 1 0

StyleDubber

[ACL 2024] This is the PyTorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"

MIT | 0 0 0

TTS_TFLite

This repository is a collection of TTS Models in TFLite

Apache-2.0 | 0 0 0

ubisoft-laforge-daft-exprt

Language: Python | Apache-2.0 | 0 1 0

vall-e

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)

Language: Python | Apache-2.0 | 0 0 0

vits_chinese

Best TTS based on BERT and VITS with some Natural Speech Features Of Microsoft

0 0 0

voice-filter

An unofficial PyTorch implementation of Google's VoiceFilter

Language: Python | 0 1 0

voice_conversion

Language: Python | 0 0 0

w2v2-how-to

How to use our public wav2vec2 dimensional emotion model

MIT | 0 0 0

waveglow

A Flow-based Generative Network for Speech Synthesis

Language: Python | BSD-3-Clause | 0 0 0

WaveRNN-Pytorch

Fatcord's Alternative WaveRNN (Faster training)

MIT | 0 0 0