RutujChheda / Enron_Emails_Dataset_Processed

This repository contains code for normalizing the Enron dataset.

Enron Dataset Normalization

This repository contains code for normalizing the Enron dataset. The Enron dataset is a collection of emails and other documents that were exchanged by employees of Enron Corporation, a major energy company that collapsed in 2001 due to accounting fraud. The dataset is a valuable resource for researchers who are studying corporate fraud and other financial crimes.

Files

The repository contains the following files:

Enron_Data_normalization.ipynb: A Jupyter notebook containing the code that normalizes the Enron dataset.
requirements.txt: The list of dependencies that must be installed to run the notebook.

Instructions

To run the code, first install the dependencies listed in requirements.txt (for example, with pip install -r requirements.txt).

Then, open the Jupyter notebook and run the cells one by one.

Dataset

The Enron dataset is not included in this repository. You can download the dataset from the following URL: https://www.cs.cmu.edu/~enron/

The code is written for a CSV version of the dataset. Because GitHub restricts large file uploads, the CSV is shared through Google Drive instead: https://drive.google.com/file/d/1VLY0Xqhkg25FGuTIvUKiAfcGeX1fczQa/view?usp=drive_link
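As a rough illustration of what working with a CSV version of the dataset can look like, here is a minimal sketch that splits raw emails into a few fields. It is not the notebook's actual code. It assumes the CSV stores one raw email per row in a column named `message`, which is a guess about the file layout, and the helper names `parse_message` and `iter_parsed` are hypothetical.

```python
import csv
from email import message_from_string
from email.utils import getaddresses


def parse_message(raw: str) -> dict:
    """Split one raw email (headers, blank line, body) into a few fields."""
    msg = message_from_string(raw)
    # getaddresses splits comma-separated recipient lists into (name, address) pairs.
    recipients = [addr.lower() for _, addr in getaddresses(msg.get_all("To", []))]
    payload = msg.get_payload()
    return {
        "message_id": msg.get("Message-ID", "").strip(),
        "date": msg.get("Date", "").strip(),
        "sender": msg.get("From", "").strip().lower(),
        "recipients": recipients,
        "subject": msg.get("Subject", "").strip(),
        # Multipart messages return a list of parts; this sketch keeps plain-text bodies only.
        "body": payload.strip() if isinstance(payload, str) else "",
    }


def iter_parsed(csv_file, message_column="message"):
    """Yield one parsed record per CSV row.

    `message_column` is an assumed column name; adjust it to the real header.
    """
    # Long emails may exceed the csv module's default field size limit.
    csv.field_size_limit(10**9)
    for row in csv.DictReader(csv_file):
        yield parse_message(row[message_column])
```

With the CSV downloaded, the records could be read with `open(path, newline="", encoding="utf-8")` passed to `iter_parsed`; the path and the encoding are likewise assumptions to check against the actual file.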
