kylechang523 / ECE180-Final-Project-Personal-Income-Analysis-and-Prediction

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Personal Income Analysis and Prediction

ECE180 Team2: Kaiyu Zhang / Wenyu Zhang / Fengcan Zhu / Linqi Pan / Yucheng Huang / Woen Lee / Andrew (Dung) Luong

PURPOSE

This project aims to predict whether a person’s income is above 50K or not, which is an important line to measure their living standards. Based on the adult data extracted from Census database, we want to design an optimized classifier, which will help employers consider the suitable salary for the new worker. Furthermore, we will analyze the influence of certain attributes and their correlations by data visualization. The outcome could be taken as reference for personal development.

DATA ACQUISITION AND ANALYSIS

We obtain the data from UC Irvine machine learning database, which including the samples’ different features and their annual income is above 50K or not.

Dataset: 4 million adults’ features and salary statistics in 1994 https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data

The project should include several parts: To address our purpose using the data, we first need to have access to website document for data with Urllib/Socket; After observing the data, we use Numpy and Pandas to extract features and do analysis; Matplotlib, NetworkX can help with the visualization. Basemap, Geoplotlib associate geometric data with networks when visualizing distribution of countries. Scikit-learn will provide machine learning algorithms that we might use in classifier.

Feature Responsibilities

Responsibility Feature
Jax Race, Education
Kaiyu Age, Working Year
Woen Nationality, Marriage Status
Yucheng Job Type, Gender

EXPECTED TIMELINE

Task Start/End Time Assigned Member
Data Acquisition/Assembly 2/9 - 2/11 Fengcan & Linqi & Yucheng
Data Learning: Classifier Training 2/12 - 2/16 Kaiyu & Wenyu & Woen
Data Analysis I: Prediction Accuracy 2/17 - 2/24 Kaiyu & Andrew & Woen
Data Analysis II: Impact Ranking 2/17 - 2/24 Andrew & Fengcan & Linqi
Data Analysis III: Attribute Correlation 2/12 - 2/24 Wenyu & Yucheng
Data Visualization: Plotting/Networking 2/24 - 3/4 All members
Conclusion and Report 3/1 - 3/6 All members

About


Languages

Language:Jupyter Notebook 99.8%Language:Python 0.2%