Diabetes Analysis

Project Overview

This project was completed during my internship at Meriskill. The objective was to predict the likelihood of diabetes in patients based on various diagnostic measurements.

About the Dataset

The dataset originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases. All patients in it are females at least 21 years old of Pima Indian heritage. The dataset includes several features (independent variables) and a single target variable (Outcome) indicating whether a patient has diabetes (1 for positive, 0 for negative). It consists of 768 rows and 9 columns.

Features

Pregnancies: The number of pregnancies the patient has had.
Glucose: Plasma glucose concentration (an indicator of blood sugar levels).
Blood Pressure: Diastolic blood pressure.
Skin Thickness: Skinfold thickness (related to body composition).
Insulin: 2-Hour serum insulin level.
BMI: Body mass index, a measure of weight relative to height (an indicator of body fat).
Diabetes Pedigree Function: A measure of the diabetes heredity risk based on family history.
Age: The age of the patient in years.
Target Variable (Outcome): Indicates whether the patient has diabetes (1 for positive, 0 for negative).
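
The column layout described above can be checked when the file is loaded. The following is a minimal sketch using only the Python standard library, not the notebook's actual code. It assumes the CSV header uses the feature names without spaces (for example BloodPressure) and Outcome for the target; the helper name load_rows and the three sample rows are illustrative stand-ins for the real 768-row file.

```python
import csv
import io

# Assumed header names: the feature list above, written without spaces.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def load_rows(handle):
    """Read a CSV file object and return a list of dicts with float values.

    Raises ValueError if the header does not match EXPECTED_COLUMNS
    or if Outcome is anything other than 0 or 1.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    for record in reader:
        row = {name: float(value) for name, value in record.items()}
        if row["Outcome"] not in (0.0, 1.0):
            raise ValueError(f"Outcome must be 0 or 1, got {row['Outcome']}")
        rows.append(row)
    return rows


# Three illustrative rows standing in for the real dataset file.
SAMPLE_CSV = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
"""

rows = load_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), "rows,", len(rows[0]), "columns")
```

With the real file, passing an opened file handle instead of the in-memory string should report 768 rows and 9 columns if the header matches the assumed names.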

Analysis

Tool used: Jupyter Notebook

Insights

Data from 768 patients was analysed; the insulin values sum to roughly 52K across all patients.
The average patient age is 32, the average BMI is 32.30, and the average blood pressure is 70.24.
The average skin thickness is 28.66, and the average glucose level is 122.
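
Figures like these are per-column counts, totals and means. A minimal sketch of how they could be computed is shown below, again with the standard library only. The function name summarize and the three sample records are hypothetical and are not taken from the dataset, so the numbers it prints will not match the insights above.

```python
from statistics import mean


def summarize(rows, column):
    """Return the count, total and mean (rounded to 2 places) of one column."""
    values = [row[column] for row in rows]
    return {
        "count": len(values),
        "total": sum(values),
        "mean": round(mean(values), 2),
    }


# Hypothetical records for illustration only, not real patient data.
sample_rows = [
    {"Age": 50, "BMI": 33.6, "Insulin": 0},
    {"Age": 31, "BMI": 26.6, "Insulin": 94},
    {"Age": 32, "BMI": 23.3, "Insulin": 168},
]

for column in ("Age", "BMI", "Insulin"):
    print(column, summarize(sample_rows, column))
```

Running the same function over all 768 rows would give the patient count, the insulin total, and the averages for each measurement.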

Ambogo2 / Predictive-modelling