Eliascc5 / English_proficiency_prediction_NLP

Basically the idea of this project is to predict the someone's English proficiency based on a text input.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

English proficiency prediction NLP

Description :

IAS module project at ENIB (SP9 - 2021)

Basically the idea of the project is to predict the someone's English proficiency based on a text input.

We used the The NICT JLE Corpus available here : https://alaginrc.nict.go.jp/nict_jle/index_E.html

The source of the corpus data is the transcripts of the audio-recorded speech samples of 1,281 participants (1.2 million words, 300 hours in total) of English oral proficiency interview test. Each participant got a SST (Standard Speaking Test) score between 1 (low proficiency) and 9 (high proficiency) based on this test.

Tasks :

  • Pre-process the dataset: extract the participant transcript (all <B><B/> tags). Inside participant transcript, you can remove all other tags and extract only English words.
  • Process the dataset: extract features with the Bag of Word (BoW) technique
  • Train a classifier to predict the SST score
  • Compute the accuracy of your system (the number of participant classified correctly) and plot the confusion matrix.
  • Try to improve your system (for example you can try to use GloVe instead of BoW).

Supervisor :

Olivier Augereau

Authors :

CORREA, Elias

GASSIBE, Franco

About

Basically the idea of this project is to predict the someone's English proficiency based on a text input.


Languages

Language:Jupyter Notebook 100.0%