anantham / DS200-VisualizeData

Code for Module 4: Literature Review and Software Tooling. Plotting data from https://data.gov.in/

Home Page: https://anantham.github.io/DS200-VisualizeData/


DS 200 - Visualize Data

Included is the Python code for Module 4: Literature Review and Software Tooling.

Scatter Plot

A scatter plot is useful when we have paired numeric data, since it helps us check whether the two variables are correlated. I turned to the Railway Statistics Summary from 2002 to 2015. Many items were included in this data set, but I thought that overall gross earnings was a good proxy for both how many trains ran that year and how many people travelled. So the hypothesis was that if more people travelled and more trains ran, then there must be more accidents. This neglects the fact that we get better at preventing accidents with time. But if you look at the plot below, you will note that the points are close to the trend line until 2015.

scatterplot

We see deaths spike around 2005, when the railways had to scale quickly and both deaths and earnings grew rapidly. After that, deaths stabilized even as earnings continued to grow rapidly over the years.
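A trend line like the one mentioned above is commonly an ordinary least-squares fit. The sketch below shows one way to compute and draw such a line; it is not the repository's actual code. The function names, the choice of matplotlib, and the output file name are assumptions, and the sample numbers are placeholders rather than values from the Railway Statistics Summary.

```python
def fit_trend_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def plot_scatter_with_trend(xs, ys, out_path="scatterplot.png"):
    """Draw the points and the fitted line (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt  # imported lazily so the fit works without it

    slope, intercept = fit_trend_line(xs, ys)
    lo, hi = min(xs), max(xs)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.plot([lo, hi], [slope * lo + intercept, slope * hi + intercept])
    ax.set_xlabel("Gross earnings")
    ax.set_ylabel("Deaths")
    fig.savefig(out_path)
    plt.close(fig)


# Placeholder numbers for illustration only, NOT the real railway data.
earnings = [1.0, 2.0, 3.0, 4.0]
deaths = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_trend_line(earnings, deaths)
```

Points that sit far from the fitted line, as the later years do in the real plot, are the ones worth a second look.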

Box Plot

This type of plot is useful for seeing the shape of a distribution. Most of the data sets on https://data.gov.in/ seemed too small to really showcase the power of this plot. I finally decided to pick the Weekly retail price of Milk from 2003 to 2012 data set. After cleaning the data (removing NA values), the box plot is shown below.

boxplot

From this plot we can observe that over the 9-year period the milk price varied considerably, from as little as 7 Rs to as much as 40 Rs (min to max). Even with inflation of 15%, we can only explain prices up to about 25 Rs. So this plot tells the story of a growing middle class and rising consumption: the market determines the price. The median lies around the 20 Rs mark, and the first and third quartiles are quite close together at around 15 Rs and 24 Rs. The interquartile range (IQR) is therefore 9 Rs, which means the bulk of the prices was around 20 Rs, within about 5 Rs in either direction.
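The numbers in this paragraph can be checked with a few lines of standard-library Python. This is only a sketch under stated assumptions: the toy input is just the five values quoted above (7, 15, 20, 24, 40) plus two "NA" entries, not the real milk data set; the "NA" marker and the helper names are hypothetical; and the 15% inflation figure is read here as an annual rate compounded over 9 years starting from the 7 Rs minimum, which is one plausible interpretation.

```python
import statistics


def drop_missing(raw_values, missing_marker="NA"):
    """Convert strings to floats, skipping entries equal to the missing marker."""
    return [float(v) for v in raw_values if v.strip() != missing_marker]


def quartile_summary(values):
    """Five-number summary plus IQR, using linear interpolation between points."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,
    }


def compound(start, annual_rate, years):
    """Value of `start` after growing at `annual_rate` for `years` years."""
    return start * (1 + annual_rate) ** years


# Toy input built from the values quoted in the text, NOT the real data set.
raw_prices = ["7", "NA", "15", "20", "24", "NA", "40"]
prices = drop_missing(raw_prices)
summary = quartile_summary(prices)

# Assumed reading of the inflation claim: 15% a year for 9 years from 7 Rs.
projected = compound(summary["min"], 0.15, 9)
print(summary["iqr"], round(projected, 1))
```

Under that reading, 7 Rs grows to roughly 25 Rs, well short of the 40 Rs maximum, which is the gap the paragraph attributes to rising demand.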

Bar Plot

This type of plot can help us see how values change over time. I always read in the news about the fiscal deficit, how it affects the flow of foreign investment into our country, and how governments try to keep it under control, so I took the State-wise fiscal deficit data from 2007 to 2014. I plotted the data for my home state, Kerala, as shown below.

bargraph

This data shows us how the fiscal deficit peaked in 2012. This was just after the 2011 Kerala Legislative Assembly election, in which the Congress won the state. We see increased funding from the central government, which partly explains how the deficit got under control.
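As a rough illustration of how a year-by-year bar chart and its peak can be produced, here is a minimal sketch. It is not the repository's code: the function names, the use of matplotlib, and the output file name are assumptions, and the deficit values are made-up placeholders in arbitrary units, chosen only so that the peak falls in 2012 as described above.

```python
def peak_year(values_by_year):
    """Return the year whose value is largest."""
    return max(values_by_year, key=values_by_year.get)


def plot_bars(values_by_year, out_path="bargraph.png"):
    """Draw one bar per year in chronological order (assumes matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    years = sorted(values_by_year)
    fig, ax = plt.subplots()
    ax.bar([str(y) for y in years], [values_by_year[y] for y in years])
    ax.set_xlabel("Year")
    ax.set_ylabel("Fiscal deficit")
    fig.savefig(out_path)
    plt.close(fig)


# Placeholder values in arbitrary units, NOT the real Kerala figures.
deficit = {
    2007: 1.0, 2008: 1.2, 2009: 1.5, 2010: 1.7,
    2011: 2.0, 2012: 2.6, 2013: 2.2, 2014: 2.0,
}
print(peak_year(deficit))
```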
