Ikuku-Success / We-Rate-Dogs-Data-Wrangling

Report documenting data wrangling efforts for the WeRateDogs tweet archive

We Rate Dogs - Data Wrangling

Introduction

Real-world data rarely comes clean. Here, the main goal is to wrangle WeRateDogs Twitter data to create interesting and trustworthy analyses and visualizations. Using Python and its libraries, I gathered data from a variety of sources and in a variety of formats, assessed its quality and tidiness, and then cleaned it; these steps together make up the data wrangling process.
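
As a minimal sketch of what the gather and assess steps can look like with requests and pandas (the file names and URL below are placeholders for illustration, not necessarily the ones used in this project):

```python
import requests
import pandas as pd

# Gather: read a local archive CSV and download an additional TSV file.
# File names and URL are placeholders.
archive = pd.read_csv('twitter_archive.csv')

url = 'https://example.com/image-predictions.tsv'  # placeholder URL
response = requests.get(url)
with open('image_predictions.tsv', 'wb') as f:
    f.write(response.content)
predictions = pd.read_csv('image_predictions.tsv', sep='\t')

# Assess: inspect structure, data types, and obvious quality issues.
archive.info()
print(archive.duplicated().sum())
```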

Alongside documenting my wrangling efforts, I have also showcased them through analyses and visualizations using Python and its libraries.

The dataset that I have wrangled (and analyzed and visualized) is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10. 11/10, 12/10, 13/10, etc. Why? Because "they're good dogs Brent." WeRateDogs has over 4 million followers and has received international media coverage.
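
Because the ratings follow this numerator/denominator pattern, they can be pulled out of the tweet text with a regular expression. Below is a minimal sketch assuming a pandas DataFrame with a text column; the sample tweets and column names are made up for illustration.

```python
import pandas as pd

# Toy data: a DataFrame with a 'text' column holding tweet text.
archive = pd.DataFrame({
    'text': ["This is Doug. He's a good boy. 13/10",
             "Meet Luna. She did a heckin zoom. 12/10 would pet"]
})

# Extract numerator and denominator from patterns like "13/10".
ratings = archive['text'].str.extract(r'(\d+(?:\.\d+)?)/(\d+)')
archive['rating_numerator'] = pd.to_numeric(ratings[0])
archive['rating_denominator'] = pd.to_numeric(ratings[1])

print(archive[['rating_numerator', 'rating_denominator']])
```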

Software needed:

You will need an installation of Python, plus the following libraries:

  1. pandas
  2. NumPy
  3. requests
  4. tweepy
  5. json (part of the Python standard library; used together with tweepy as sketched after this list)
  • A text editor, like VS Code or Atom.
  • A terminal application (Terminal on Mac and Linux or Cygwin on Windows).
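
For the tweepy and json pieces, gathering tweet data from the Twitter API typically looks something like the sketch below: authenticate, fetch each tweet's JSON, and write one JSON object per line to a text file. The credentials, tweet ID, and file name are placeholders.

```python
import json
import tweepy

# Placeholder credentials; real keys come from a Twitter developer account.
auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')
auth.set_access_token('ACCESS_TOKEN', 'ACCESS_SECRET')
api = tweepy.API(auth, wait_on_rate_limit=True)

tweet_ids = [1234567890]  # placeholder tweet ID(s)

# Write each tweet's JSON on its own line so the file can be read back
# one record at a time with json.loads.
with open('tweet_json.txt', 'w') as f:
    for tweet_id in tweet_ids:
        try:
            status = api.get_status(tweet_id, tweet_mode='extended')
            json.dump(status._json, f)
            f.write('\n')
        except Exception:
            # Skip tweets that have been deleted or are otherwise unavailable.
            # (The exception class name differs across tweepy versions.)
            pass
```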

Installation links for the software:

Summary:

The whole report can be summarized in the following two files, which are present in this repository:

  • For a brief overview of the data wrangling process, see wrangle_report.html
  • For visualizations and key insights, see act_report.pdf

References

  1. Reading and writing json to a file

  2. Unique Rating System of WeRateDogs

  3. Tidy Data Rules
