bhavyadubey / Online_News_Popularity

This work helps online news companies predict the popularity of news before publication. News popularity is often indicated by the number of reads, likes, or shares. For online news stakeholders, it is very valuable if the popularity of news articles can be accurately predicted before publication.

Online News Popularity Prediction

This work helps online news companies predict the popularity of news before publication. News popularity is often indicated by the number of reads, likes, or shares. For online news stakeholders, it is very valuable if the popularity of news articles can be accurately predicted before publication. It is therefore interesting and meaningful to use machine learning techniques to predict the popularity of online news articles. In our project, using a dataset of 39,643 news articles from the website Mashable, we attempt to find the simplest classification learning algorithm that accurately predicts, before publication, whether a news story will become popular.

List of Predictive Attributes of Dataset:

image

Each instance in the dataset has 61 attributes: 1 target attribute (number of shares), 2 non-predictive features (the URL of the article and the number of days between the article's publication and the dataset acquisition), and 58 predictive features.
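As a rough sketch of how these attributes might be separated, the snippet below drops the two non-predictive columns and derives a binary popularity label from the share count. The column names `url`, `timedelta`, and `shares`, and the choice of the median share count as the popular/unpopular threshold, are assumptions for illustration; this README does not state them. The small `demo` frame is made-up data, not part of the dataset.

```python
import pandas as pd


def split_attributes(df, threshold=None):
    """Return (X, y): predictive features and a binary popularity label.

    Assumptions (not stated in the README): the columns are named 'url',
    'timedelta' and 'shares', and an article counts as popular when its
    share count is at or above the threshold (the median by default).
    """
    # Header names may carry stray spaces; normalize them first.
    df = df.rename(columns=lambda c: c.strip())
    if threshold is None:
        threshold = df["shares"].median()
    y = (df["shares"] >= threshold).astype(int)
    # Drop the target and the two non-predictive attributes.
    X = df.drop(columns=["url", "timedelta", "shares"])
    return X, y


# Tiny made-up frame standing in for the real dataset.
demo = pd.DataFrame({
    "url": ["a", "b", "c", "d"],
    " timedelta": [731, 730, 729, 728],
    " n_tokens_title": [12, 9, 13, 10],
    " shares": [593, 711, 1500, 5000],
})
X, y = split_attributes(demo)
print(list(X.columns), y.tolist())
```

On the real data, `X` would be expected to hold the 58 predictive features, provided the column names match the assumptions above.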

Graphs and Visualizations

Popular/unpopular news over different days of a week

image
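A chart like this one could be built from per-day counts of popular and unpopular articles. The sketch below is one possible way to compute them, not necessarily how the notebook does it. It assumes 0/1 indicator columns named `weekday_is_monday` through `weekday_is_sunday`, a `shares` column, and a hypothetical share threshold passed in by the caller; the `demo` rows are made up.

```python
import pandas as pd

DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]


def popularity_by_weekday(df, threshold):
    """Count popular and unpopular articles for each day of the week.

    Assumes 0/1 indicator columns 'weekday_is_<day>' and a 'shares' column.
    """
    df = df.rename(columns=lambda c: c.strip())
    popular = df["shares"] >= threshold
    rows = []
    for day in DAYS:
        on_day = df[f"weekday_is_{day}"] == 1
        rows.append({
            "popular": int((on_day & popular).sum()),
            "unpopular": int((on_day & ~popular).sum()),
        })
    return pd.DataFrame(rows, index=DAYS)


# Made-up example: two Monday articles and two Saturday articles.
demo = pd.DataFrame({"shares": [500, 3000, 2000, 1800]})
for day in DAYS:
    demo[f"weekday_is_{day}"] = 0
demo.loc[[0, 1], "weekday_is_monday"] = 1
demo.loc[[2, 3], "weekday_is_saturday"] = 1

counts = popularity_by_weekday(demo, threshold=1400)  # threshold is hypothetical
print(counts)
```

If matplotlib is installed, `counts.plot(kind="bar")` would draw the counts as a grouped bar chart.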

Popular/unpopular news over different article category

image

Before implementing each algorithm, I randomly split the dataset, restricted to that algorithm's selected features, into a training set (90%) and a testing set (10%). Logistic regression, random forest (RF), and AdaBoost are implemented with the sklearn classes LogisticRegression(), RandomForestClassifier(), and AdaBoostClassifier(), respectively.
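The split and the three default classifiers can be sketched roughly as follows. This is a minimal illustration on synthetic data generated with `make_classification`, not the notebook's code. For brevity it uses the same features for every model, whereas the project selects features per algorithm, and the `random_state` values are assumptions added only for reproducibility.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the article features and popularity labels.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Random 90% training / 10% testing split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=0
)

# Default parameter settings, apart from random_state (an assumption).
models = {
    "logistic_regression": LogisticRegression(),
    "random_forest": RandomForestClassifier(random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
}

# Fit each classifier and record its accuracy on the held-out 10%.
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}

for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.3f}")
```

The accuracies printed here describe the synthetic data only and say nothing about the results on the Mashable dataset.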

Performance of three classifiers under default parameter settings:

image

Final Model Result and Accuracy Scores

The model was tested with a training/testing set ratio of 0.15.

image

After comparing the results of all three classifiers, I concluded that the random forest algorithm is the most accurate, with an accuracy of 67%.

Dataset link: https://archive.ics.uci.edu/ml/datasets/Online+News+

Languages

Language: Jupyter Notebook 100.0%