I trained the model on the train.csv file downloaded from Kaggle:
https://www.kaggle.com/c/fake-news/data?select=train.csv
What did I learn?
- The NLTK library, which is used for pre-processing the data:
a. Tokenizing words & sentences
b. Stop words and how to remove them
c. Stemming & Lemmatizing
d. Sentiment Analysis
e. Looking up the meaning of a word using WordNet
- TF-IDF (Term Frequency-Inverse Document Frequency), which converts each document's words into a numeric vector
- Logistic Regression, which is used to classify each news article as 0 or 1
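
The three pieces above can be sketched together roughly as follows. This is a minimal illustration, not the exact training script: the toy sentences, their labels, the reading of 1 as "fake" and 0 as "real", and the helper name `preprocess` are all assumptions made for the example. To avoid NLTK corpus downloads, it tokenizes with a simple regular expression and borrows scikit-learn's built-in English stop word list instead of NLTK's.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.linear_model import LogisticRegression

stemmer = PorterStemmer()


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stop words, stem the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(stemmer.stem(t) for t in tokens if t not in ENGLISH_STOP_WORDS)


# Toy stand-in data invented for illustration (NOT rows from train.csv).
# Assumption for this example only: 1 = fake, 0 = real.
texts = [
    "Scientists publish peer reviewed study on vaccine safety",
    "City council approves budget for new public library",
    "Central bank keeps interest rates unchanged this quarter",
    "Shocking miracle cure that doctors do not want you to know",
    "Secret plot revealed celebrities are actually aliens",
    "You will not believe this one weird trick to get rich",
]
labels = [0, 0, 0, 1, 1, 1]

# Step 1: clean the raw text.
cleaned = [preprocess(t) for t in texts]

# Step 2: turn each cleaned document into a TF-IDF vector.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(cleaned)

# Step 3: fit a logistic regression classifier on the vectors.
clf = LogisticRegression()
clf.fit(X, labels)

# Classify a new headline using the same preprocessing and vocabulary.
new_texts = ["Miracle trick doctors hate"]
prediction = clf.predict(vectorizer.transform([preprocess(t) for t in new_texts]))
print(prediction)
```

New text must go through the same `preprocess` step and the already fitted `vectorizer` (via `transform`, not `fit_transform`), so that it is mapped onto the vocabulary learned during training.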
In the future, I plan to integrate this fake news classifier with my portfolio management database.