The VoxCeleb Speaker Recognition Challenge 2020 (VoxSRC-20)

Implementations of several loss functions and models for training a speaker recognition model on the VoxCeleb dataset.

Data Preparation

Extract the data and create a CSV-formatted file with one speaker ID and one wav path per line, as follows:

ID0 /full/path/to/wav
ID1 /full/path/to/wav
...
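
As a minimal sketch of how such a file could be generated, the script below walks a directory of extracted audio and writes one line per wav file. It assumes the usual VoxCeleb layout, where the first directory below the root is the speaker ID, and it assumes a single space as the separator, as in the example above. The function name `build_file_list` is hypothetical and not part of this repo.

```python
import os


def build_file_list(wav_root, out_path):
    """Write one 'SPEAKER_ID /full/path/to/wav' line per .wav file found."""
    wav_root = os.path.abspath(wav_root)
    rows = []
    for dirpath, _dirnames, filenames in os.walk(wav_root):
        for name in filenames:
            if not name.lower().endswith(".wav"):
                continue
            full_path = os.path.join(dirpath, name)
            # Assumption: the first directory below wav_root names the speaker.
            parts = os.path.relpath(full_path, wav_root).split(os.sep)
            if len(parts) < 2:
                continue  # a wav directly in the root has no speaker directory
            rows.append((parts[0], full_path))
    rows.sort()
    with open(out_path, "w") as f:
        for speaker_id, path in rows:
            # Assumption: a single space separates the two columns.
            f.write(f"{speaker_id} {path}\n")
    return len(rows)
```

Calling `build_file_list("/path/to/extracted/wav", "/path/to/csv")` returns the number of lines written. Check the data loading code if the trainer expects a different separator or integer speaker IDs.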

Train With Your Custom Setup

The code is modular, so you can combine any trunk model with any pooling layer, then train the network with the criterion of your choice:

python3.8 trainer.py --csv-path /path/to/csv --trunk-net resnet --lr 0.003 --batch-size 64 --polling-net tap --criterion cosface --m 0.1 --s 20 --criterion-lr 0.001

See opts.py for the full list of options.

Trunk Models

resnet34 (fewer strides)
resnet34se
TDS

Pooling Layers

tap
sap
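
For readers unfamiliar with the abbreviations: tap and sap most likely stand for temporal average pooling and self-attentive pooling, the two common ways of collapsing frame-level features into one utterance-level vector. The dependency-free sketch below illustrates both under that assumption. It is not the code in this repo, and the parameters `W`, `b`, and `u` are hypothetical stand-ins for the weights a real layer would learn.

```python
import math


def tap(frames):
    """Temporal average pooling: mean of the frame vectors over time."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def sap(frames, W, b, u):
    """Self-attentive pooling: attention-weighted mean of the frame vectors.

    Each frame h is scored as u . tanh(W h + b); the scores are turned into
    weights with a softmax over time.
    """
    scores = []
    for h in frames:
        hidden = [
            math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
            for row, b_i in zip(W, b)
        ]
        scores.append(sum(x * y for x, y in zip(hidden, u)))
    # Numerically stable softmax over the time axis.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    return [sum(w * h[d] for w, h in zip(weights, frames)) for d in range(dim)]
```

When the attention vector `u` is all zeros, every frame gets the same weight and `sap` reduces to `tap`.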

Criterions

cosface
psge2e*
prototypical

*psge2e (pseudo GE2E loss): unlike the original GE2E loss, it learns the speaker representations.
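
The --m and --s options in the training command above presumably correspond to the margin and scale of the CosFace loss. As a rough illustration of what those two numbers do, here is a dependency-free sketch of the loss for a single example: the margin `m` is subtracted from the target-class cosine, all cosines are multiplied by the scale `s`, and the result goes through a softmax cross-entropy. This is not the implementation in this repo, and the function names are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosface_loss(embedding, class_weights, label, m=0.1, s=20.0):
    """CosFace loss for one embedding and one weight vector per speaker."""
    e = l2_normalize(embedding)
    cosines = [
        sum(a * b for a, b in zip(e, l2_normalize(w))) for w in class_weights
    ]
    # Subtract the margin from the target-class cosine only, then scale.
    logits = [
        s * (c - m) if j == label else s * c for j, c in enumerate(cosines)
    ]
    # Cross-entropy via a numerically stable log-sum-exp.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(z - max_logit) for z in logits)
    )
    return log_sum_exp - logits[label]
```

A larger margin makes the same embedding incur a higher loss, which pushes the model to separate speakers by a wider angular gap.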

This repo is under heavy construction, and the lists above will grow.
