machine-learning python numpy scikit-learn pandas classification email-classifier enron-dataset jupyter-notebook

Enron Classification

💡 Project Background

Enron Corporation was an American energy, commodities, and services company based in Houston, Texas. At the end of 2001, it was revealed that Enron's reported financial condition was sustained by an institutionalized, systematic, and creatively planned accounting fraud. Special-purpose entities, created to mask significant liabilities, made Enron seem more profitable than it was and set off a dangerous spiral: each quarter, officers had to perform more financial deception to sustain the illusion of profit, which kept the stock price rising, while the company was actually losing money.

The Enron Corpus is a database of more than 500,000 emails generated by 158 employees of the Enron Corporation in the years leading up to the company's collapse in December 2001. The Federal Energy Regulatory Commission (FERC) obtained the corpus from Enron's email servers during its investigation of the collapse. A computer scientist later purchased a copy of the email database for $10,000 to use in research.

💬 Classification

Machine Learning Models Used

Logistic Regression
Support Vector Machine (Linear)
Support Vector Machine (RBF)
K-Nearest Neighbors
Decision Tree Classifier
Random Forest Classifier
Gradient Boosting
Multinomial Naïve Bayes
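
As a rough illustration of how the models listed above could be compared with scikit-learn, the sketch below trains each one on the same TF-IDF features and reports test accuracy. It is not the notebook's actual code: the toy emails, the `business`/`personal` labels, the `compare_models` helper, and all hyperparameters are assumptions chosen only to keep the example self-contained.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy corpus (NOT taken from the Enron data).
business = [
    "please review the attached quarterly budget report",
    "the gas trading contract needs legal approval today",
    "schedule a meeting to discuss the pipeline capacity",
    "forward the revised power purchase agreement to legal",
    "the credit desk flagged the counterparty exposure report",
    "attached is the draft contract for the energy trade",
    "budget meeting moved to monday, please review the report",
    "legal approval is pending for the trading agreement",
]
personal = [
    "are you coming to the family dinner on saturday",
    "happy birthday, hope you have a great weekend",
    "let us plan the ski trip for the holidays",
    "thanks for dinner, the kids had a great time",
    "do you want to watch the football game this weekend",
    "the family is planning a birthday party on saturday",
    "see you at the game, bring the kids",
    "holiday trip photos are great, thanks for sharing",
]
texts = business + personal
labels = ["business"] * len(business) + ["personal"] * len(personal)

# One entry per model in the list above; settings are illustrative defaults.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machine (Linear)": SVC(kernel="linear"),
    "Support Vector Machine (RBF)": SVC(kernel="rbf"),
    "K Nearest Neighbor": KNeighborsClassifier(n_neighbors=3),
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=0),
    "Random Forest Classifier": RandomForestClassifier(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Multinomial Naive Bayes": MultinomialNB(),
}


def compare_models(texts, labels):
    """Fit every model on the same split and return {model name: test accuracy}."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=0
    )
    scores = {}
    for name, model in models.items():
        # TF-IDF features are non-negative, which Multinomial Naive Bayes requires.
        pipeline = make_pipeline(TfidfVectorizer(), model)
        pipeline.fit(X_train, y_train)
        scores[name] = pipeline.score(X_test, y_test)
    return scores


scores = compare_models(texts, labels)
for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.2f}")
```

With a corpus this small the accuracies mean little; the point is only that all eight estimators share the same `fit`/`score` interface, so a single loop can evaluate them on identical features.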

Python Libraries Used

About

Exploratory analysis of the Enron dataset and classification using multiple algorithms

Languages

Jupyter Notebook 97.0%, Python 3.0%