luciferreeves / RedditEngagementPrediction

Predict the engagement a post will likely receive, given the time of posting, the number of upvotes, the number of comments, and other factors.

EAS 4/587 -- Data Intensive Computing

  1. Motivation

    1. Reddit is a social media website where users can post links to articles, images, videos, etc., and other users can comment on them. Authors generally want their posts to drive as much engagement as possible.

    2. Unlike many other social media websites, Reddit lets users upvote and downvote posts. Because posts are publicly visible, factors such as the time of posting, the number of upvotes, and the number of comments play a large role in how much attention a post gets.

    3. Because many posts are made every minute, significant posts can get lost in the crowd. Posts made at one time of day may also go unseen by users who are active at a different time.

  2. Problem Statement

    1. Fetch the data from several programming-related subreddits (communities) using the Reddit Developer API. Because Reddit hosts a very large number of subreddits, we limit the scope of the project to these.

    2. Clean and preprocess the data, then analyze it to find relevant insights.

    3. Build a model to predict the engagement a post will likely receive, given the time of posting, the number of upvotes, the number of comments, and other factors.
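
The three steps above can be sketched in miniature as follows. This is only an illustration under stated assumptions, not the project's actual pipeline: the sample records are made up, the field names (`created_utc`, `score`, `num_comments`) mirror those commonly returned in Reddit API post listings, engagement is assumed to be the score plus the comment count, and the "model" is a simple hourly-mean baseline rather than whatever the notebooks use.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean

# Hypothetical sample records standing in for data fetched from the Reddit API.
SAMPLE_POSTS = [
    {"subreddit": "python", "created_utc": 1672567200, "score": 120, "num_comments": 30},
    {"subreddit": "python", "created_utc": 1672653600, "score": 40, "num_comments": 10},
    {"subreddit": "rust", "created_utc": 1672610400, "score": 8, "num_comments": 2},
    {"subreddit": "rust", "created_utc": 1672610400, "score": None, "num_comments": 5},
]

REQUIRED = ("created_utc", "score", "num_comments")


def clean(posts):
    """Keep only records whose required fields are present and numeric."""
    return [
        p for p in posts
        if all(isinstance(p.get(k), (int, float)) for k in REQUIRED)
    ]


def add_features(post):
    """Derive time-of-posting features and an assumed engagement target."""
    posted = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc)
    return {
        "hour": posted.hour,          # hour of day in UTC, 0-23
        "weekday": posted.weekday(),  # Monday = 0 ... Sunday = 6
        # Assumption: engagement = upvote score + comment count.
        "engagement": post["score"] + post["num_comments"],
    }


def fit_hourly_baseline(rows):
    """Return (mean engagement per posting hour, overall mean engagement)."""
    by_hour = defaultdict(list)
    for row in rows:
        by_hour[row["hour"]].append(row["engagement"])
    per_hour = {hour: mean(values) for hour, values in by_hour.items()}
    overall = mean(row["engagement"] for row in rows)
    return per_hour, overall


def predict(model, hour):
    """Predict engagement for a posting hour, falling back to the overall mean."""
    per_hour, overall = model
    return per_hour.get(hour, overall)


rows = [add_features(p) for p in clean(SAMPLE_POSTS)]
model = fit_hourly_baseline(rows)
print(predict(model, 10), predict(model, 3))
```

A baseline like this is mainly useful as a reference point: any richer model built on the same features should at least beat the per-hour average before it is worth the added complexity.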

A more detailed report is available in the report folder.

Languages

Jupyter Notebook 99.5%, Python 0.4%, TeX 0.1%