gsass1 / MerkelNet

A lip-to-speech model trained on the Merkel Podcast Corpus

MerkelNet: Training a Self-Supervised Lip-to-Speech Model in German

Repository for my DL4CV project at THM. The project has only been tested with Python 3.10.12.

You can use this link to download my pretrained checkpoint.

Installation

Create a new virtual environment and install the dependencies:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Dataset Preprocessing

First, clone the Merkel Podcast Corpus repository and run the "download_video.py" script. This may take a while.

Once the download has finished, run the following command to preprocess the data for training:

python ./make_dataset.py --workers N /path/to/corpus

where N is the number of worker threads to use. By default, the preprocessed files are saved to the data directory.
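The --workers option follows the common pattern of spreading per-file work across a pool of N workers. The sketch below is not taken from make_dataset.py. It assumes a hypothetical preprocess_file function (a stand-in that only computes an output path; the .npz extension is also an assumption) and a hypothetical preprocess_corpus helper that fans the corpus files out over N threads and targets the data directory.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def preprocess_file(video_path: Path, out_dir: Path) -> Path:
    """Hypothetical per-file step (assumption). A real version would presumably
    extract video frames and audio features; this stand-in only builds the
    output path and writes nothing to disk."""
    return out_dir / (video_path.stem + ".npz")


def preprocess_corpus(video_paths, out_dir="data", workers=4):
    """Run preprocess_file over every input path using `workers` threads."""
    out = Path(out_dir)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with video_paths
        return list(pool.map(lambda p: preprocess_file(Path(p), out), video_paths))


if __name__ == "__main__":
    results = preprocess_corpus(["clips/a.mp4", "clips/b.mp4"], workers=2)
    print([r.name for r in results])  # ['a.npz', 'b.npz']
```

Threads work well when each step mostly waits on I/O or external decoders. For CPU-heavy pure-Python work, swapping in ProcessPoolExecutor would be the usual alternative.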

Training the model

The following example command trains the model with Weights & Biases (wandb) logging enabled and preloads the entire dataset into memory:

python ./train.py --enable-logging --preload
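Conceptually, --preload trades memory for disk I/O: every sample is read once up front instead of on each access. Here is a minimal sketch of that trade-off. It assumes a hypothetical LipSpeechDataset class and a caller-supplied load_sample function, neither of which is the repository's own code. The class follows the __len__/__getitem__ protocol that PyTorch map-style datasets (torch.utils.data.Dataset) use, so it could be wrapped by a DataLoader.

```python
from typing import Callable, List, Optional


class LipSpeechDataset:
    """Hypothetical map-style dataset illustrating eager vs. lazy loading."""

    def __init__(self, paths: List[str], load_sample: Callable[[str], object],
                 preload: bool = False):
        self.paths = paths
        self.load_sample = load_sample
        # With preload=True every sample is read once here and kept in RAM.
        self.cache: Optional[list] = [load_sample(p) for p in paths] if preload else None

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx: int):
        if self.cache is not None:
            return self.cache[idx]                # served from memory
        return self.load_sample(self.paths[idx])  # loaded again on every access


if __name__ == "__main__":
    ds = LipSpeechDataset(["x.npz", "y.npz"], load_sample=str.upper, preload=True)
    print(len(ds), ds[1])  # 2 Y.NPZ
```

Preloading pays off when the whole dataset fits in RAM and samples are revisited every epoch. Otherwise lazy loading keeps memory use flat.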

Evaluation

This command evaluates the model on a random subset of the dataset (controlled by --size) and computes the mean STOI and ESTOI scores.

python ./eval.py --checkpoint /path/to/checkpoint.pth --size 0.1
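The evaluation step amounts to sampling a subset and averaging per-utterance scores. The sketch below shows one way to structure that and is not a copy of eval.py. It treats --size as a fraction of the dataset, which the 0.1 in the example suggests but which is an assumption. The hypothetical helpers sample_subset and mean_metric take the metric as a parameter, so an intelligibility function such as pystoi's stoi(clean, estimate, fs, extended=False) (or extended=True for ESTOI) could be plugged in. The toy metric in the usage lines is purely illustrative.

```python
import random
from statistics import mean
from typing import Callable, Sequence, Tuple


def sample_subset(items: Sequence, size: float, seed: int = 0) -> list:
    """Pick round(size * len(items)) items at random (at least one)."""
    if not 0 < size <= 1:
        raise ValueError("size must be in (0, 1]")
    k = max(1, round(size * len(items)))
    return random.Random(seed).sample(list(items), k)


def mean_metric(pairs: Sequence[Tuple[object, object]],
                metric: Callable[[object, object], float]) -> float:
    """Average metric(reference, estimate) over all (reference, estimate) pairs."""
    return mean(metric(ref, est) for ref, est in pairs)


if __name__ == "__main__":
    data = [(i, i) for i in range(100)]  # toy (reference, estimate) pairs
    subset = sample_subset(data, 0.1)
    # Toy metric: 1.0 when the estimate matches the reference exactly.
    print(len(subset), mean_metric(subset, lambda r, e: float(r == e)))  # 10 1.0
```

A fixed seed makes the random subset reproducible, so scores from different checkpoints are compared on the same utterances.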

Demo

The following command starts the Gradio demo:

python ./demo.py --checkpoint /path/to/checkpoint.pth
