sfox1975 / Udacity-DAND-Project-5

Machine learning model for identifying persons-of-interest (POI) from Enron data

Overview

The purpose of this project was to apply several machine learning algorithms to identify persons-of-interest (POIs), namely company insiders possibly involved in financial fraud at the now-defunct Enron Corporation. (Side note: my first job after college was in a building across the street from the Enron building in downtown Houston, and I always wondered what anyone did over there.)

The project covers the full exploratory data analysis (EDA) process, and most of the analysis and code can be found in the poi_id.ipynb file.

The project was completed as part of the Udacity Data Analyst Nanodegree program.

Additional Documents

The repository contains the following files:

  1. 'Enron61702insiderpay.pdf' - this file was provided by Udacity and used for the purposes of data checking and cleaning

  2. 'final_project_dataset.pkl' - pickled form of the raw data, containing financial and email features on Enron employees

  3. 'DAND_P5_Fox.pdf' - final report for the project, addressing methodology, results, conclusions

  4. 'poi_id.ipynb' - Jupyter notebook containing the EDA and machine learning algorithm development / optimization for the project (this is the meat of the project)

  5. 'poi_id.html' - static HTML version of the Jupyter notebook listed above

  6. 'poi_id.py' - the final version of the machine learning algorithm developed and refined in the Jupyter notebook

  7. 'tester.py' and 'feature_format.py' - provided by Udacity and used for testing that the final algorithm (contained in poi_id.py) conforms to the required project specifications.

  8. 'References.txt' - a list of references for the project
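
For readers who want a quick look at the pickled data from item 2 before opening the notebook, the following is a minimal sketch and not code taken from poi_id.py. It assumes the pickle holds a dictionary keyed by person name, where each value is a dictionary of features that includes a boolean 'poi' flag. That layout and the helper names `load_dataset` and `count_pois` are assumptions made for illustration. The file may also have been pickled under an older Python version, in which case loading it could need extra handling.

```python
import pickle
from pathlib import Path


def load_dataset(path):
    """Load a pickled dataset from disk.

    Assumed layout (not confirmed by this README):
    {person_name: {feature_name: value, ...}, ...}
    Only unpickle files from a source you trust.
    """
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def count_pois(dataset):
    """Count records whose 'poi' flag is truthy.

    The 'poi' key name is an assumption for illustration.
    """
    return sum(1 for features in dataset.values() if features.get("poi"))
```

A possible use, run from the repository root, would be `data = load_dataset("final_project_dataset.pkl")` followed by `print(len(data), count_pois(data))` to see how many records exist and how many are flagged as POIs.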
