Description

A Python neural network made with TensorFlow that converts one person's voice into another. The network is trained on audio files of person A's voice that person B needs to replicate to the best degree.

Setup

Prerequisites

Python 3.7 or greater (64-bit)
Python package requirements
NVIDIA Graphics card
Required Drivers and Development Tools

1. Package Requirements

To install the required python libraries, simply execute the following command in the repository working directory.

pip install -r requirements.txt

2. (Highly Recommended) Graphics Card Utilization

A) Required Drivers and Development Tools

Latest NVIDIA GPU Drivers
CUDA Toolkit (v11.0 Update 1)
cuDNN SDK 8.0.4 for CUDA Toolkit v11.0

B) cuDNN

Create a folder named tools under C:\. Drag the cuda folder from the cuDNN zip into the tools folder.

C) Appending %PATH%

Add the following paths to the PATH system environment variable:

C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.0\\bin
C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.0\\extras\\CUPTI\\lib64
C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.0\\include
C:\\tools\\cuda\\bin

3. (Optional) Configuration

In the config.ini file, there are attributes to be changed for configuring the model.

Section	Parameter	Description
MISC	modelName	The name of the model to be created or loaded.
	verbose	If set to True, additional information will be printed while running, along with additional files for easy debugging.
Structure	sliceSize	The number of time samples to be used in the input. (if sliceSize exceeds the length of an audio clip, the audio clip will be omitted from the training data)
	hiddenLayers	A list of integers defining the size of each hidden layer. (should be formatted like so: a,b,c,d or a)
Advanced	learningRate	The learning rate for the Adam optimizer. Tensorflow Documentation
	lossFunc	The loss function. Tensorflow Documentation
	batchSize	The batch size. Tensorflow Documentation

4. Data

Create folders

Voice2Voice.py -l

Running load_data for the first time creates the training and use folders.

Populate folders

Supplying data

Only wav files are supported.
Training files must be in order, corresponding with the files in the other folder.

Training/Input

Place person A's audio recordings into the folder.

Training/Output

Place person B's audio recordings in an order corresponding to person A's audio recordings. (Recommended naming example: "helloJohn.wav")

Use Folder

Place files to be used by model to perform voice conversion. (Suggest using training files as preliminary testing of model)

Convert

Voice2Voice.py -l

Running load_data for the second time converts the contents of inputs, outputs, and use into processable files.

5. Train Model

Voice2Voice.py -t

If let Run for 10,000 epochs or <Ctrl + C> is pressed, the model will be saved DO NOT CLOSE TERMINAL UNTIL MODEL SAVED.

6. Prediction Time!!!

Voice2Voice.py -p

Use the model assigned in config.ini to convert voices from use folder and place them in output.

Usage

Voice2Voice.py [-l | --load_data] [-f | --flush_data] [-t | --train] [-p | --predict]

Argument	Description
[-l \| load_data]	Load audio files in `training` and `use` folders to be usable by the model. This will also create the `training` and `use` folders if they are not present.
[-f \| flush_data]	Delete all converted data.
[-t \| --train]	Create a new model to be trained, or continue training an existing model (dependent on the `modelName` attribute in `config.ini`). Exit training and save model by interrupting the process <Ctrl + C>.
[-p \| --predict]	Load model specified by `modelName` in `config.ini` and predict audio output given audio files in `use` folder.

AdinAck / Voice2Voice