Charlottecuc / malaya-speech

Speech Toolkit for bahasa Malaysia, https://malaya-speech.readthedocs.io/

Home Page:https://malaya-speech.readthedocs.io/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

logo

Pypi version Python3 version MIT License total stats download stats / month discord


Malaya-Speech is a Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow.

Documentation

Proper documentation is available at https://malaya-speech.readthedocs.io/

Installing from the PyPI

CPU version :

$ pip install malaya-speech

GPU version :

$ pip install malaya-speech[gpu]

Only Python 3.6.0 and above and Tensorflow 1.15.0 and above are supported.

We recommend to use virtualenv for development. All examples tested on Tensorflow version 1.15.4, 2.4.1 and 2.5.

Features

  • Age Detection, detect age in speech using Finetuned Speaker Vector.
  • Speaker Diarization, diarizing speakers using Pretrained Speaker Vector.
  • Emotion Detection, detect emotions in speech using Finetuned Speaker Vector.
  • Gender Detection, detect genders in speech using Finetuned Speaker Vector.
  • Language Detection, detect hyperlocal languages in speech using Finetuned Speaker Vector.
  • Multispeaker Separation, Multispeaker separation using FastSep on 8k Wav.
  • Noise Reduction, reduce multilevel noises using STFT UNET.
  • Speaker Change, detect changing speakers using Finetuned Speaker Vector.
  • Speaker overlap, detect overlap speakers using Finetuned Speaker Vector.
  • Speaker Vector, calculate similarity between speakers using Pretrained Speaker Vector.
  • Speech Enhancement, enhance voice activities using Waveform UNET.
  • SpeechSplit Conversion, detailed speaking style conversion by disentangling speech into content, timbre, rhythm and pitch using PyWorld and PySPTK.
  • Speech-to-Text, End-to-End Speech to Text for Malay and Mixed (Malay and Singlish) using RNN-Transducer and Wav2Vec2 CTC.
  • Super Resolution, Super Resolution 4x for Waveform.
  • Text-to-Speech, Text to Speech for Malay and Singlish using Tacotron2, FastSpeech2 and FastPitch.
  • Vocoder, convert Mel to Waveform using MelGAN, Multiband MelGAN and Universal MelGAN Vocoder.
  • Voice Activity Detection, detect voice activities using Finetuned Speaker Vector.
  • Voice Conversion, Many-to-One, One-to-Many, Many-to-Many, and Zero-shot Voice Conversion.
  • Hybrid 8-bit Quantization, provide hybrid 8-bit quantization for all models to reduce inference time up to 2x and model size up to 4x.

Pretrained Models

Malaya-Speech also released pretrained models, simply check at malaya-speech/pretrained-model

References

If you use our software for research, please cite:

@misc{Malaya, Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow,
  author = {Husein, Zolkepli},
  title = {Malaya-Speech},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huseinzol05/malaya-speech}}
}

Acknowledgement

Thanks to KeyReply for sponsoring private cloud to train Malaya-Speech models, without it, this library will collapse entirely.

logo

About

Speech Toolkit for bahasa Malaysia, https://malaya-speech.readthedocs.io/

https://malaya-speech.readthedocs.io/

License:MIT License


Languages

Language:Jupyter Notebook 94.5%Language:C++ 3.2%Language:Python 1.3%Language:HTML 0.5%Language:Makefile 0.2%Language:Shell 0.1%Language:Cython 0.1%Language:CMake 0.0%Language:M4 0.0%Language:C 0.0%Language:SWIG 0.0%