Aminoid / quora-question-pairs

Kaggle Competition - Quora Question Pairs

Quora Question Pairs

Kaggle competition

Files

main.py -- Code to implement all the models

Datasets

The datasets are available as train.csv.zip and test.csv.zip at: https://www.kaggle.com/c/quora-question-pairs/data
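As a rough illustration of reading the training data, the sketch below streams question pairs straight from the zipped CSV using only the standard library. It assumes the archive contains a member named train.csv with columns question1, question2 and is_duplicate; the helper name load_question_pairs is hypothetical and is not taken from main.py.

```python
import csv
import io
import zipfile


def load_question_pairs(zip_path, member="train.csv"):
    """Yield (question1, question2, is_duplicate) tuples from a zipped CSV.

    Assumptions: the member name and the column names below match the
    Kaggle files; adjust them if the archive is laid out differently.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            # Wrap the binary stream so csv can read it as UTF-8 text.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.DictReader(text):
                yield row["question1"], row["question2"], int(row["is_duplicate"])


# Example usage (requires the downloaded file to be present):
# for q1, q2, label in load_question_pairs("train.csv.zip"):
#     ...
```

Because the function is a generator, the CSV is read one row at a time instead of being loaded into memory all at once.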

Dependencies

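numpy, scikit-learn (imported as sklearn), pandas, nltk. The csv and re modules are also used, but they ship with the Python standard library and need no separate install.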

How to run

python main.py <jaccard|cosine|tfidf|logistic|naivebayes|randomforest|voting>

Individual Classifiers

  • Jaccard Similarity
  • Cosine Similarity
  • Pearson Coefficient
  • TF-IDF based Cosine Similarity
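The first three measures above can be sketched on plain bag-of-words counts as follows. This is a minimal illustration, not the code in main.py: the regex tokenizer and all function names are assumptions, and the TF-IDF variant (which reweights the counts using corpus statistics) is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (an assumed, minimal tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(q1, q2):
    """Size of the shared token set divided by the size of the combined token set."""
    a, b = set(tokenize(q1)), set(tokenize(q2))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def _count_vectors(q1, q2):
    """Word-count vectors for both questions over their shared vocabulary."""
    c1, c2 = Counter(tokenize(q1)), Counter(tokenize(q2))
    vocab = sorted(set(c1) | set(c2))
    return [c1[w] for w in vocab], [c2[w] for w in vocab]


def cosine_similarity(q1, q2):
    """Cosine of the angle between the two word-count vectors."""
    x, y = _count_vectors(q1, q2)
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def pearson_coefficient(q1, q2):
    """Pearson correlation of the two word-count vectors (0.0 if either is constant)."""
    x, y = _count_vectors(q1, q2)
    n = len(x)
    if n == 0:
        return 0.0
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y) if std_x and std_y else 0.0
```

To turn any of these scores into a duplicate/not-duplicate decision, one option is to compare the score against a threshold tuned on the training set.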

Ensemble Classifiers

  • Logistic Regression
  • Naive Bayes Model
  • Random Forest Model
  • Probabilistic Voting Ensemble
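One plausible reading of "Probabilistic Voting Ensemble" is scikit-learn's soft voting, which averages the class probabilities predicted by each base model. The sketch below assumes that reading and uses synthetic similarity-style features in place of the real data; the feature construction, the choice of GaussianNB and every hyperparameter are illustrative assumptions rather than the settings used in main.py.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Synthetic stand-in for per-pair features (e.g. three similarity scores):
# duplicate pairs (label 1) get higher scores on average. Purely illustrative.
n_pairs = 400
y = rng.integers(0, 2, size=n_pairs)
X = np.clip(0.3 + 0.4 * y[:, None] + rng.normal(0.0, 0.15, size=(n_pairs, 3)), 0.0, 1.0)

X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

# voting="soft" averages predict_proba outputs across the base models.
voting = VotingClassifier(
    estimators=[
        ("logistic", LogisticRegression(max_iter=1000)),
        ("naivebayes", GaussianNB()),
        ("randomforest", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    voting="soft",
)
voting.fit(X_train, y_train)

proba = voting.predict_proba(X_test)  # shape (n_test, 2), rows sum to 1
loss = log_loss(y_test, proba)
print(f"held-out log loss: {loss:.4f}")
```

Fitting the ensemble trains every base model, so its training time is roughly the sum of theirs; VotingClassifier accepts an n_jobs argument to fit them in parallel.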

Note: The voting ensemble takes a very long time to train.

Results

Log loss of 0.40167 with the Probabilistic Voting Ensemble (still being improved).
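For reference, binary log loss averages the negative log-likelihood of the true labels under the predicted duplicate probabilities; lower is better, and always predicting 0.5 scores about 0.693. The sketch below is a minimal standard-library version; the function name and the clipping constant eps are assumptions chosen for illustration.

```python
import math


def binary_log_loss(y_true, y_pred, eps=1e-15):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all question pairs.

    y_true holds 0/1 labels; y_pred holds predicted probabilities of label 1.
    Probabilities are clipped to [eps, 1 - eps] so that log(0) never occurs.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)
```

Because a confident wrong prediction is penalised heavily, well-calibrated probabilities matter more for this metric than raw accuracy.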
