
MuSE

A PyTorch implementation of MuSE: Multi-modal target speaker extraction with visual cues.

Project Structure

/data/voxceleb2-800: Scripts to preprocess the VoxCeleb2 dataset.

/pretrain_networks: The visual front-end network

/src: The training scripts

Pre-trained Weights

Download the pre-trained weights for the Visual Frontend and place them in the ./pretrain_networks folder. The following command saves the file as visual_frontend.pt in the current directory, so run it from inside that folder:

wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=1k0Zk90ASft89-xAEUbu5CmZWih_u_lRN' -O visual_frontend.pt
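
For setups where wget is not available, the same download step can be sketched in Python using only the standard library. This is a minimal sketch, not part of the repository: the helper names weights_path and fetch_weights are hypothetical, and the sketch assumes the URL above returns the weights file directly. Unlike the wget command, it does not disable certificate checking.

```python
from pathlib import Path
import urllib.request

# URL copied from the wget command in this README.
WEIGHTS_URL = (
    "https://docs.google.com/uc?export=download"
    "&id=1k0Zk90ASft89-xAEUbu5CmZWih_u_lRN"
)


def weights_path(repo_root="."):
    """Return the location this README asks for (hypothetical helper)."""
    return Path(repo_root) / "pretrain_networks" / "visual_frontend.pt"


def fetch_weights(repo_root=".", url=WEIGHTS_URL):
    """Download the weights unless the file already exists (hypothetical helper).

    Assumption: the URL serves the file itself. If the host returns an
    HTML page instead, the saved file will not be a valid checkpoint.
    """
    dest = weights_path(repo_root)
    if dest.exists():
        # Skip the network call when the file is already in place.
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(dest))
    return dest


# Example usage, run from the repository root:
#     path = fetch_weights(".")
#     print("weights stored at", path)
```

Skipping the download when the file already exists keeps repeated runs from fetching the weights again.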

References

The pre-trained weights of the Visual Frontend were obtained from the GitHub repository for Deep Audio-Visual Speech Recognition (Afouras T. and Chung J.).
The model is adapted from the Conv-TasNet GitHub repository.
