RyMey / Exploration-and-Explanatory-Flight-Data

This project communicates data findings for a problem provided by Udacity. I completed it as part of the Udacity Data Analyst Nanodegree Program (https://www.udacity.com/course/data-analyst-nanodegree--nd002).

Exploration-and-Explanatory-Flight-Data

Project Overview

Flight performance can be measured by the number of on-time, delayed, cancelled, and diverted flights. This project analyzes the performance of flights in the US over three years, from 2006 through 2008. The data can be found at http://stat-computing.org/dataexpo/2009/the-data.html
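As a rough illustration of the four categories, the sketch below labels each flight using the `Cancelled`, `Diverted`, and `ArrDelay` columns of the Data Expo files. The 15-minute delay threshold, the `classify_flights` helper, and the tiny sample frame are assumptions made for illustration; the notebooks in this repository may define a delay differently.

```python
import pandas as pd

# Assumption: a flight counts as delayed when it arrives more than
# 15 minutes late. The project notebooks may use another rule.
DELAY_THRESHOLD_MIN = 15


def classify_flights(flights: pd.DataFrame) -> pd.Series:
    """Label each row as Ontime, Delay, Diverted, or Cancelled (hypothetical helper)."""
    status = pd.Series("Ontime", index=flights.index)
    # Missing ArrDelay values compare as False, so they stay "Ontime" here
    # and are overwritten below if the flight was diverted or cancelled.
    status[flights["ArrDelay"] > DELAY_THRESHOLD_MIN] = "Delay"
    status[flights["Diverted"] == 1] = "Diverted"
    status[flights["Cancelled"] == 1] = "Cancelled"
    return status


# Made-up rows standing in for the real CSV files.
sample = pd.DataFrame({
    "ArrDelay": [5.0, 42.0, None, None, -3.0],
    "Cancelled": [0, 0, 1, 0, 0],
    "Diverted": [0, 0, 0, 1, 0],
})
sample["Status"] = classify_flights(sample)

# Share of each category, in percent of all flights in the sample.
status_share = sample["Status"].value_counts(normalize=True) * 100
print(status_share.round(1))
```

With the real data, the same function could be applied to a frame loaded with `pd.read_csv` from one of the yearly files.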

Library

To run this project, you need to import these libraries:

  1. pandas
  2. numpy
  3. matplotlib.pyplot
  4. seaborn

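* To save the slide deck as HTML, use the template file output_toggle.tpl and run this command in a terminal (newer Jupyter installations invoke the same tool as jupyter nbconvert): ipython nbconvert slide_deck.ipynb --to slides --template output_toggle.tpl --post serve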

Parts of this project:

  1. Wrangling: https://github.com/RyMey/Exploration-and-Explanatory-Flight-Data/blob/master/wrangling.ipynb

  2. Exploratory: https://github.com/RyMey/Exploration-and-Explanatory-Flight-Data/blob/master/exploration.ipynb

  3. Explanatory: https://github.com/RyMey/Exploration-and-Explanatory-Flight-Data/blob/master/slide_deck.slides.html

Summary

Over the three years, the most common performance problem was delay, with about 3 million flights, followed by cancellation with about 400 thousand flights and diversion with about 50 thousand flights. Of 21,604,865 total flights, 13% were delayed, 1.94% were cancelled, and 0.23% were diverted. Among the flights with bad performance, delays account for 86.4%, cancellations for 12.1%, and diversions for 1.5%.

By origin airport, the most delayed origins are PUB (Pueblo), PIR (Pierre), and ADK (Adak Island), and the most on-time origins are HVN (New Haven), GLH (Greenville, MS), and MKC (Kansas City Downtown). The most cancelled origins are TEX (Telluride), ALO (Waterloo), and HKY (Hickory Regional); the least cancelled are PIR (Pierre), HVN (New Haven), and PUB (Pueblo). The most diverted origins are ADK (Adak Island), HLN (Helena, Montana), and SIT (Sitka, Alaska); the least diverted are SMX (Santa Maria), VLD (Valdosta), and BQK (Brunswick).

Departure and arrival delays are very highly correlated: when one is delayed, the other is most likely delayed too. A flight that departs late will most likely arrive late, although some flights that depart on time still arrive late.

The most common reason for cancellation in each year is the carrier. In some months, such as December and February, many flights were cancelled because of weather, so customers may want to take this into account when buying tickets for those months. Over the years, delays and cancellations decrease, while diversions increase.
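The cancellation-reason and delay-correlation findings can be reproduced along the following lines. This is a minimal sketch, not the code from the notebooks: the helper names and the sample rows are hypothetical, while the column names (`Month`, `Cancelled`, `CancellationCode`, `DepDelay`, `ArrDelay`) and the code meanings (A carrier, B weather, C NAS, D security) follow the Data Expo variable descriptions.

```python
import pandas as pd

# Cancellation codes as described for the Data Expo 2009 files.
CANCELLATION_REASONS = {"A": "Carrier", "B": "Weather", "C": "NAS", "D": "Security"}


def cancellations_by_month(flights: pd.DataFrame) -> pd.DataFrame:
    """Count cancelled flights per month and reason (hypothetical helper)."""
    cancelled = flights[flights["Cancelled"] == 1]
    reasons = cancelled["CancellationCode"].map(CANCELLATION_REASONS)
    return pd.crosstab(cancelled["Month"], reasons)


def delay_correlation(flights: pd.DataFrame) -> float:
    """Pearson correlation between departure and arrival delay (hypothetical helper)."""
    # Cancelled and diverted flights have no delay values, so drop them first.
    flown = flights.dropna(subset=["DepDelay", "ArrDelay"])
    return flown["DepDelay"].corr(flown["ArrDelay"])


# Made-up rows standing in for the real CSV files.
sample = pd.DataFrame({
    "Month": [2, 2, 12, 12, 6, 6],
    "Cancelled": [1, 1, 1, 0, 0, 0],
    "CancellationCode": ["B", "A", "B", None, None, None],
    "DepDelay": [None, None, None, 30.0, 0.0, 10.0],
    "ArrDelay": [None, None, None, 35.0, -2.0, 12.0],
})

reason_table = cancellations_by_month(sample)
dep_arr_corr = delay_correlation(sample)
print(reason_table)
print(round(dep_arr_corr, 3))
```

On the full data set, a heatmap of `reason_table` (for example with seaborn) would make the winter weather cancellations easy to spot.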

Languages

Jupyter Notebook 70.3%, HTML 29.7%