Sersal10 / Machine-Learning-project

Salary Prediction

Introduction

This project aims to predict a person's salary bracket from their characteristics. It uses the Adult (Census Income) dataset from the University of California, Irvine (UCI) Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Census+Income.

The task is to predict whether or not a person earns more than 50 thousand dollars per year. The dataset contains 14 predictor variables and a binary target variable with the two classes >50K and <=50K, which makes this a classification problem rather than a regression problem.
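
Since the target is binary, loading the data mostly amounts to reading the raw file and turning the label into 0/1. Below is a minimal pandas sketch, assuming the raw adult.data layout (comma-separated values with a space after each comma, no header row, and ? for missing values). load_adult, the income label column, and the target column are hypothetical names chosen for this example, and the two input rows are illustrative only.

```python
import io

import pandas as pd

# Column order follows the variable list in this README; adult.data has no header row.
# "income" (the raw label) and "target" (its 0/1 encoding) are names chosen for this sketch.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]


def load_adult(source):
    """Hypothetical helper: read raw Adult rows and add a 0/1 target column."""
    df = pd.read_csv(
        source,
        header=None,
        names=COLUMNS,
        skipinitialspace=True,  # raw values are separated by ", "
        na_values="?",          # "?" marks a missing value
    )
    # Labels in the test split carry a trailing period (">50K."), so drop it first.
    df["income"] = df["income"].str.rstrip(".")
    df["target"] = (df["income"] == ">50K").astype(int)
    return df


# Two illustrative rows in the raw layout; the second has a missing native-country.
sample = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "52, Self-emp-inc, 287927, HS-grad, 9, Married-civ-spouse, Exec-managerial, "
    "Wife, White, Female, 15024, 0, 40, ?, >50K.\n"
)
df = load_adult(sample)
print(df[["age", "native-country", "income", "target"]])
```

Reading ? as a proper missing value at load time means later steps can detect gaps with ordinary tools such as isna() instead of searching for a placeholder string.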

Dataset Description

The Adult dataset contains the following predictor variables:

  • age: continuous.
  • workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
  • fnlwgt: continuous.
  • education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
  • education-num: continuous.
  • marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
  • occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
  • relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
  • race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
  • sex: Female, Male.
  • capital-gain: continuous.
  • capital-loss: continuous.
  • hours-per-week: continuous.
  • native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holland-Netherlands.
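
The list above splits the 14 predictors into 6 continuous and 8 categorical variables. One way to keep that split explicit in code is sketched below: the two lists are transcribed from the README, while dtype_mismatches is a hypothetical helper that flags columns whose pandas dtype disagrees with the documented type (for example, a numeric column that was read as text because of a stray placeholder).

```python
import pandas as pd

# Feature groups transcribed from the variable list above:
# 6 variables listed as continuous, 8 listed with a fixed set of categories.
NUMERIC_FEATURES = [
    "age", "fnlwgt", "education-num",
    "capital-gain", "capital-loss", "hours-per-week",
]
CATEGORICAL_FEATURES = [
    "workclass", "education", "marital-status", "occupation",
    "relationship", "race", "sex", "native-country",
]


def dtype_mismatches(df):
    """Hypothetical check: list columns whose pandas dtype contradicts the documented type."""
    is_num = pd.api.types.is_numeric_dtype
    wrong_numeric = [c for c in NUMERIC_FEATURES if not is_num(df[c])]
    wrong_categorical = [c for c in CATEGORICAL_FEATURES if is_num(df[c])]
    return wrong_numeric + wrong_categorical


# Usage: a one-row frame whose dtypes match the documentation passes the check.
row = {c: [0] for c in NUMERIC_FEATURES}
row.update({c: ["x"] for c in CATEGORICAL_FEATURES})
print(dtype_mismatches(pd.DataFrame(row)))  # → []
```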

Objectives

The main objectives of this project are:

  • Treat the dataset's features correctly according to their type (numerical or categorical).
  • Transform the data: remove outliers, encode categorical features, and create new columns.
  • Handle missing values.
  • Differentiate between the main supervised learning models.
  • Know when to use classification or regression methods.
  • Use evaluation metrics appropriate to the algorithms used.
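
These objectives map naturally onto a scikit-learn preprocessing pipeline. The sketch below is only an illustration under several assumptions: it runs on synthetic data shaped like a few Adult columns (not the real dataset), uses logistic regression as an example classifier (the README does not name the models it compares), clips outliers to interquartile-range fences instead of deleting rows, and adds a hypothetical capital-net column. IQRClipper and capital-net are names invented for this example.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


class IQRClipper(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: clip each column to [Q1 - k*IQR, Q3 + k*IQR] learned in fit()."""

    def __init__(self, k=1.5):
        self.k = k

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        q1, q3 = np.nanpercentile(X, [25, 75], axis=0)
        self.lower_ = q1 - self.k * (q3 - q1)
        self.upper_ = q3 + self.k * (q3 - q1)
        return self

    def transform(self, X):
        return np.clip(np.asarray(X, dtype=float), self.lower_, self.upper_)


# Synthetic stand-in for a subset of the Adult columns (NOT real data).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(17, 91, n),
    "hours-per-week": rng.integers(1, 100, n),
    "capital-gain": np.where(rng.random(n) < 0.1, rng.integers(1_000, 99_999, n), 0),
    "capital-loss": np.where(rng.random(n) < 0.05, rng.integers(100, 4_000, n), 0),
    "workclass": rng.choice(["Private", "Self-emp-inc", "State-gov"], n).astype(object),
    "sex": rng.choice(["Female", "Male"], n).astype(object),
})
df.loc[rng.random(n) < 0.05, "workclass"] = np.nan  # simulate missing categories
df["capital-net"] = df["capital-gain"] - df["capital-loss"]  # hypothetical new column
y = (((df["age"] > 40) & (df["hours-per-week"] > 40)) | (df["capital-gain"] > 0)).astype(int)

preprocess = ColumnTransformer([
    # Roughly symmetric numeric columns: impute, clip outliers, scale.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("clip", IQRClipper(k=1.5)),
        ("scale", StandardScaler()),
    ]), ["age", "hours-per-week"]),
    # Mostly-zero columns: Q1 = Q3 = 0, so IQR clipping would zero every value; only impute and scale.
    ("num_zero_heavy", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), ["capital-gain", "capital-loss", "capital-net"]),
    # Categorical columns: fill with the most frequent value, then one-hot encode.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), ["workclass", "sex"]),
])
model = Pipeline([("preprocess", preprocess), ("classify", LogisticRegression(max_iter=1000))])

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, stratify=y, random_state=0
)
model.fit(X_train, y_train)
pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("F1 (positive class):", f1_score(y_test, pred))
```

Because only about a quarter of the Adult records are labeled >50K, accuracy alone can look good for a model that mostly predicts <=50K; reporting F1 for the positive class alongside it is one way to follow the last objective. Fitting the imputers, clipper, and encoder inside the pipeline also means their statistics are learned from the training split only.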
