nicolas-dufour / self-supervised-low-res-speech

This project transfers the self-supervised Wav2Vec2 representation to low-resource languages

Self-Supervised Low-Resource Speech Retrieval

Our project for the "Algorithms for speech and natural language processing (MVA 2021)" class

This project is based on the wav2vec 2.0 model, which has been pretrained on a massive unsupervised dataset. We then take this pretrained model and perform transfer learning on languages that have little labeled data. We study the importance of the pretraining language, the impact of the size of the transfer dataset, and the impact of language similarity between pretraining and fine-tuning.
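For illustration, here is a minimal fine-tuning sketch using the Hugging Face `transformers` implementation of wav2vec 2.0. The checkpoint name, placeholder audio, and transcription are assumptions made for the example, not the exact configuration used in the notebooks.

```python
# Minimal sketch: fine-tune a pretrained wav2vec 2.0 checkpoint with a CTC head
# on a (low-resource) target language. Checkpoint and data are placeholders.
import numpy as np
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

checkpoint = "facebook/wav2vec2-base-960h"  # any pretrained wav2vec 2.0 checkpoint
processor = Wav2Vec2Processor.from_pretrained(checkpoint)
model = Wav2Vec2ForCTC.from_pretrained(checkpoint)

# Freeze the convolutional feature encoder so only the transformer layers and
# the CTC head are fine-tuned on the target language.
model.freeze_feature_extractor()

# Placeholder sample: 1 second of 16 kHz audio with its transcription.
speech = np.zeros(16000, dtype=np.float32)
transcription = "HELLO WORLD"

inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
labels = processor.tokenizer(transcription, return_tensors="pt").input_ids

outputs = model(inputs.input_values, labels=labels)
outputs.loss.backward()  # one gradient step of CTC fine-tuning
```

In practice the same loop would run over a labeled dataset in the target language, with a vocabulary built for that language; the size of this dataset and the pretraining language are the factors varied in our experiments.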

The following shows the Wav2Vec2 architecture

Languages

- Jupyter Notebook: 98.2%
- Python: 1.8%