rupak-118 / Question-Type-Classification

Solving QC problem from UIUC's CogComp dataset

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Question-Type-Classification

Problem statement :

Question-type classification into Coarse and fine labels (6 coarse classes, 50 fine classes)

Dataset :

The training and test datasets can be downloaded from here : http://cogcomp.cs.illinois.edu/Data/QA/QC/

Approach and methods/experiments :

  • Text pre-processing (exploration, cleaning, stemming, lemmatization, stopword removal etc.)
  • Word embeddings (CountVectors, TF-IDF, word2vec, GloVe)
  • Linear ML models (Logistic Regression, SVM)
  • Non-linear and Tree models (Gaussian Naive-Bayes, Multinomial NB, Bernoulli NB, LightGBM, XGBoost)

Future steps/experiments :

  • Ensembles
  • RNNs/LSTMs
  • Implementing state-of-the-art papers in the QC domain
  • Better feature extraction (e.g. using only first few words/keywords from each question, or, deriving word2vec embeddings weighted by tf-idf values etc.)
  • Manual inspection of examples which the model is getting wrong to derive insights and improve precision/recall

About

Solving QC problem from UIUC's CogComp dataset

License:MIT License


Languages

Language:Jupyter Notebook 100.0%