nadiia-duiunova / Adults_pet_project

This is a self-developed pet project. The main goal is to predict a person's income based on multiple features.

Yearly income prediction

This dataset, taken from the UCI repository of machine learning datasets, contains 14 features describing demographic characteristics across 45,222 rows (32,561 for training and 12,661 for testing). The task is to predict whether a person's yearly income is more or less than $50,000, so the problem is formulated as a binary classification task.
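As a minimal sketch of how the two income classes could be turned into a binary target, the snippet below maps a raw label to 0 or 1. The label strings `>50K` and `<=50K` (possibly with surrounding whitespace or a trailing period) and the helper name `encode_income` are assumptions for illustration, not taken from the project's notebooks.

```python
def encode_income(label: str) -> int:
    """Map a raw income label to 1 (more than $50,000) or 0 (otherwise)."""
    # Assumption: labels look like ">50K" or "<=50K", optionally with
    # surrounding whitespace or a trailing period.
    cleaned = label.strip().rstrip(".")
    if cleaned == ">50K":
        return 1
    if cleaned == "<=50K":
        return 0
    # Fail loudly on anything else instead of guessing a class.
    raise ValueError(f"unexpected income label: {label!r}")


# Hypothetical raw labels, used only to exercise the helper.
raw_labels = [" <=50K", " >50K", "<=50K."]
targets = [encode_income(value) for value in raw_labels]
print(targets)
```

Raising on unknown labels is a deliberate choice here: a silently mislabeled row would distort the classifier's training signal.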

Data Source:

Reference: Dua Dheeru, and Efi Karra Taniskidou. “UCI Machine Learning Repository”. Irvine, CA: University of California, School of Information and Computer Science (2017).

Here are the features and their possible values:

  • Age: continuous.
  • Workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
  • Fnlwgt: continuous (the number of people the census takers believe that observation represents).
  • Education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
  • Education-num: continuous.
  • Marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
  • Occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
  • Relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
  • Ethnic group: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
  • Sex: Female, Male. Note: this data was extracted from the 1994 Census, which enforces a binary option for Sex.
  • Capital-gain: continuous.
  • Capital-loss: continuous.
  • Hours-per-week: continuous.
  • Native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.
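The feature list above mixes continuous and categorical columns, which usually need different handling before modelling. The sketch below shows one possible way to parse raw rows and one-hot encode a categorical value using only the standard library. It assumes a comma-separated file with no header row and the columns in the listed order; the column identifiers, the helper names `parse_rows` and `one_hot`, and the sample row are hypothetical and not taken from the project's notebooks.

```python
import csv
import io

# Column names follow the feature list above; "income" is the target column.
# Assumption: the raw file is comma-separated, has no header row, and lists
# the columns in this order.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "ethnic-group", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]
# Features marked "continuous" in the list above.
CONTINUOUS = {
    "age", "fnlwgt", "education-num",
    "capital-gain", "capital-loss", "hours-per-week",
}


def parse_rows(text):
    """Parse raw comma-separated text into a list of dicts keyed by column."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for fields in reader:
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        row = dict(zip(COLUMNS, fields))
        for name in CONTINUOUS:
            row[name] = int(row[name])  # continuous features become integers
        rows.append(row)
    return rows


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    return [1 if value == category else 0 for category in categories]


# Hypothetical sample row built from values in the feature list.
sample = (
    "42, Private, 100000, Masters, 14, Married-civ-spouse, Exec-managerial, "
    "Husband, White, Male, 0, 0, 45, United-States, >50K\n"
)
rows = parse_rows(sample)
sex_vector = one_hot(rows[0]["sex"], ["Female", "Male"])
print(rows[0]["age"], sex_vector)
```

In a notebook workflow, a dataframe library would likely replace these helpers; the plain-Python version only makes the continuous/categorical split explicit.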



Languages: Jupyter Notebook 99.4%, Python 0.6%