kaggle-fake-news

The classifier uses a basic text-processing pipeline over the text column alone to predict whether an article is fake news:

Text cleaning: accent removal, lowercasing
Tokenization
Stopword removal
Lemmatization/Stemming
TFIDF vectorization
Experiments with tree-based classifiers such as Decision Trees and Gradient Boosted Trees
Achieved an F1 score of 87-91.5% on a 20% hold-out validation set, with the best F1 from Gradient Boosted Trees (NOTE: these scores come from a single train/validation split, not k-fold cross-validation)
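The preprocessing steps and the F1 metric above can be sketched with the Python standard library alone. This is an illustrative sketch, not the notebook's code: the stopword list is a tiny hypothetical one, `stem` is a toy suffix stripper standing in for a real stemmer such as Porter, and the TF-IDF weighting is written to mirror scikit-learn's `TfidfVectorizer` defaults (`smooth_idf=True`, `norm="l2"`). The tree classifiers are not shown.

```python
import math
import re
import unicodedata
from collections import Counter

# Hypothetical, tiny stopword list for illustration only; a real pipeline
# would use a full English list (e.g. NLTK's or scikit-learn's).
STOPWORDS = {"a", "an", "and", "are", "is", "was", "of", "to", "in", "on", "that", "the", "this", "it"}


def clean(text: str) -> str:
    """Remove accents (NFKD decomposition, drop combining marks), then lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_accents.lower()


def tokenize(text: str) -> list[str]:
    """Split into runs of ASCII letters/digits; punctuation is discarded."""
    return re.findall(r"[a-z0-9]+", text)


def stem(token: str) -> str:
    """Toy suffix stripper (assumption), NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Cleaning -> tokenization -> stopword removal -> stemming."""
    return [stem(t) for t in tokenize(clean(text)) if t not in STOPWORDS]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Raw term counts times smoothed idf = ln((1 + N) / (1 + df)) + 1, L2-normalized."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        weights = {t: c * idf[t] for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


def f1(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive (fake) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


docs = ["Café owners are reporting record profits", "Senators deny the reports"]
tokens = [preprocess(d) for d in docs]
print(tokens[0])  # → ['cafe', 'owner', 'report', 'record', 'profit']
vectors = tfidf(tokens)
print(vectors[1])
```

Because "report" appears in both documents it gets the lowest idf, so it carries less weight in each vector than terms unique to one document. In practice the notebook would more likely use library components (e.g. a full stemmer or lemmatizer and `TfidfVectorizer`) rather than hand-rolled ones.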

Next Steps:

About

Baseline TFIDF solution for the Kaggle Fake News competition: https://www.kaggle.com/c/fake-news/data
