Salary Prediction

Introduction

This project aims to predict a person's salary from their characteristics. It uses the Adult dataset from the University of California, Irvine (UCI) Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Census+Income.

The goal is to predict whether a person earns more than 50 thousand dollars per year, based on their characteristics. The dataset contains 14 predictor variables and a binary target variable (income >50K or <=50K), so this is a classification problem rather than a regression one.
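In the raw UCI files, the label column holds one of two strings: >50K or <=50K. The adult.test file also appends a trailing period to these labels. This makes the label a natural 0/1 target for binary classification. Below is a minimal sketch of that mapping, assuming those label strings. The function name to_binary_label is hypothetical and not taken from the project.

```python
def to_binary_label(raw: str) -> int:
    """Map a raw Adult income label to 1 (>50K) or 0 (<=50K).

    Assumes labels such as " >50K", "<=50K" or ">50K." (the test file
    appends a period), possibly with surrounding whitespace.
    """
    label = raw.strip().rstrip(".")
    if label == ">50K":
        return 1
    if label == "<=50K":
        return 0
    raise ValueError(f"Unexpected income label: {raw!r}")


# Illustrative labels in the formats used by adult.data and adult.test.
labels = [" <=50K", " >50K", "<=50K.", ">50K."]
y = [to_binary_label(s) for s in labels]
print(y)
```

An unexpected label raises an error instead of silently falling into either class. This surfaces formatting surprises early.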

Dataset Description

The Adult dataset contains the following 14 predictor variables, listed with their possible values:

age: continuous.
workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
fnlwgt: continuous.
education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
education-num: continuous.
marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
sex: Female, Male.
capital-gain: continuous.
capital-loss: continuous.
hours-per-week: continuous.
native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holland-Netherlands.
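The listing above splits into six numerical ("continuous") features and eight categorical ones. The sketch below encodes that split and parses one raw record. It assumes the format of the UCI adult.data file:

- no header row
- comma-plus-space separators
- "?" marking missing values, which occur in the workclass, occupation and native-country columns

The column name income for the unnamed label column and the helper parse_row are hypothetical. The example record is illustrative, written in the file's format.

```python
# Column order of the raw UCI files (they have no header row).
# "income" is a hypothetical name for the unnamed label column.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]

# Features described as "continuous"; every other predictor is categorical.
NUMERICAL = {
    "age", "fnlwgt", "education-num",
    "capital-gain", "capital-loss", "hours-per-week",
}
CATEGORICAL = [c for c in COLUMNS[:-1] if c not in NUMERICAL]


def parse_row(line: str) -> dict:
    """Parse one comma-separated record, casting numerical features to int.

    Missing values, written as "?" in the raw files, become None.
    """
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(COLUMNS):
        raise ValueError(f"Expected {len(COLUMNS)} fields, got {len(values)}")
    row = {}
    for name, value in zip(COLUMNS, values):
        if value == "?":
            row[name] = None
        elif name in NUMERICAL:
            row[name] = int(value)
        else:
            row[name] = value
    return row


# Illustrative record in the raw format, plus a copy with a missing workclass.
example = (
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K"
)
with_missing = example.replace("State-gov", "?")
row = parse_row(example)
print(row["age"], row["workclass"], parse_row(with_missing)["workclass"])
```

Keeping missing values as None, rather than as the literal "?" string, stops "?" from being treated as a genuine category. The later imputation step then has to decide explicitly how to fill those entries.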

Objectives

The main objectives of this project are:

Treat each feature correctly according to its type (numerical or categorical).
Transform the data: remove outliers, encode categorical features, and create new columns.
Handle missing values.
Differentiate between the main supervised learning models.
Know when to use classification or regression methods.
Use evaluation metrics appropriate to the algorithms used.
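The preprocessing and evaluation objectives above can be sketched in plain Python on toy columns. This is an illustration, not the project's own code. The sketch assumes these choices:

- mode imputation for missing categorical values
- clipping numerical values to Tukey's IQR fences, which is one way to handle outliers (dropping the affected rows is another)
- one-hot encoding for categorical features
- accuracy, precision, recall and F1 for the positive (>50K) class

All function names are hypothetical, and the toy values are not taken from the dataset.

```python
from collections import Counter
from statistics import quantiles


def impute_mode(values):
    """Replace missing entries (None) with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def clip_iqr(values, k=1.5):
    """Clip numerical outliers to Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def one_hot(values):
    """One-hot encode a categorical column; returns (categories, 0/1 rows)."""
    categories = sorted(set(values))
    position = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[position[v]] = 1
        rows.append(row)
    return categories, rows


def binary_metrics(y_true, y_pred):
    """Accuracy plus precision, recall and F1 for the positive class (1 = >50K)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy columns for illustration only (not taken from the dataset).
workclass = impute_mode(["Private", None, "Private", "State-gov"])
categories, encoded = one_hot(workclass)
hours = clip_iqr([40, 40, 45, 38, 50, 40, 99])
scores = binary_metrics([1, 0, 0, 1, 0, 0, 1, 0], [1, 0, 1, 0, 0, 0, 1, 0])
print(categories, encoded)
print(hours)
print(scores)
```

In the Adult data, the >50K class is the minority. A model that rarely predicts >50K can therefore still score a high accuracy. This is why precision, recall and F1 for that class are reported alongside accuracy. In a regression setting, these would be replaced by error measures such as mean absolute error.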

Sersal10 / Machine-Learning-project