Orca0917 / TransformerTTS

PyTorch implementation of Transformer-TTS, which can be executed on Google Colab

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

TransformerTTS

Overview

This repository is a PyTorch implementation of a neural network-based speech synthesis model, Transformer-TTS, which uses the Transformer Network. It is based on the code by ChoiHkk's Transformer-TTS and has been trained on the LJSpeech dataset. You can run the notebook in a Google Colab environment.


Model architecture

Transformer architecture


Dataset

The dataset used is the English speech dataset LJSpeech. In the Jupyter notebook, the dataset is utilized without a separate download by using the torchaudio package. The data preprocessing was implemented by referring to the tacotron audio preprocessing by Kyubong Park.


Result

The training was conducted with a batch size of 16 on a total of 13,100 voice datasets for 10 epochs. The result is expressed as a gif showing the predicted mel spectrogram and the ground truth mel spectrogram every 100 steps.

training result


Dependency

torch                            2.3.0+cu121
torchaudio                       2.3.0+cu121
librosa                          0.10.2.post1
numpy                            1.25.2
scipy                            1.11.4
python                           3.10.12

References

About

PyTorch implementation of Transformer-TTS, which can be executed on Google Colab


Languages

Language:Python 58.1%Language:Jupyter Notebook 41.9%