ImonEmmanuel / Aviro_health_challenge

We must find innovative ways that are easy and safe to get testing and treatment done and prevent new infections, or we will see another generation with millions forced to live with HIV. In this project, we build a machine learning model to estimate the likelihood of a test being positive or negative, and to identify the factors (features) that make a test positive or negative.


Aviro_health_challenge

Problem Statement

We must find innovative ways that are easy and safe to get testing and treatment done and prevent new infections, or we will see another generation with millions forced to live with HIV. In this project, we build a machine learning model to estimate the likelihood of a test being positive or negative, and to identify the factors (features) that make a test positive or negative.

Solution

Title: Determining the likelihood of a diagnostic test being positive or negative using machine learning and interpretable machine learning

Steps in solving the problem

  1. Identify the problem in the dataset
  2. Set an appropriate evaluation metric (ROC AUC)
  3. Data exploration
  4. Data analysis
  5. Statistical inference on some features
  6. Data transformation and cleaning
  7. Feature selection
  8. Cross-validation
  9. Build an XGBoost machine learning model
  10. Interpret the machine learning model using SHAP
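
The evaluation metric in step 2 can be read as the probability that a randomly chosen positive test receives a higher model score than a randomly chosen negative one. The sketch below computes ROC AUC that way in plain Python. The labels and scores are made-up illustrative values, not data from this project, and the notebook itself may well rely on a library implementation instead.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = positive test) and model scores, for illustration only
y_true = [0, 0, 0, 0, 1, 0, 1, 0]
y_score = [0.05, 0.10, 0.20, 0.40, 0.35, 0.02, 0.90, 0.15]
auc = roc_auc(y_true, y_score)
print(f"ROC AUC: {auc:.3f}")
```

The pairwise loop is quadratic in the number of samples, so it is only meant to make the definition concrete.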

Findings

Uni-Variate Analysis

  1. Most tests are negative: 92.9% negative versus 7.1% positive
  2. Most patients last tested [image]
  3. Female patients visit the clinic more than male patients
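
The class imbalance in the first finding is one reason plain accuracy is a weak metric here. As a small arithmetic sketch, a "model" that always predicts negative already reaches the accuracy shown below while finding no positive tests at all. The sample size of 1000 is a hypothetical value chosen for easy arithmetic; only the 92.9% / 7.1% split comes from the findings.

```python
# Hypothetical sample size; only the 92.9% / 7.1% split is from the findings
n_tests = 1000
n_positive = round(n_tests * 0.071)
n_negative = n_tests - n_positive

# Baseline that predicts "negative" for every test
correct = n_negative
accuracy = correct / n_tests

# Recall on the positive class: none of the positives are detected
recall_positive = 0 / n_positive

print(f"accuracy = {accuracy:.3f}, positive recall = {recall_positive:.3f}")
```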

Bi-Variate Analysis

  1. Patients who were last tested recently are mostly negative, while those last tested more than 6 months ago are more often positive.
  2. Most patients who tested positive speak Sesotho or Xhosa.

Diagnostics (Statistical Inference)

The following features have a significant impact on the test result: art_number_issued, art_initiation, initiation_handover, confirmatory_test_done, last_tested, gender_male, covid_screening_fever, gender_female, oraquick_dentures, covid_screening_pain_headache, appointment.

The following features did not have a significant impact on the test result: gender_other, gender_rather_not_say, covid_screening_shortness_of_breath, covid_screening_travel, oraquick_mouthwash, covid_screening_contact, covid_screening_cough, oraquick_bleeding_gums, covid_symptoms.
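
The README does not say which statistical test was used to decide significance. As one possible approach for a binary feature against a binary result, the sketch below runs a Pearson chi-square test of independence on a 2x2 table using only the standard library. The counts are hypothetical, and the 0.05 threshold is an assumed convention rather than something stated in the project.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    No continuity correction is applied. Returns (statistic, p_value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows = feature present / absent,
# columns = positive / negative test result
chi2, p_value = chi_square_2x2(40, 160, 31, 769)
significant = p_value < 0.05  # assumed significance level
print(f"chi2 = {chi2:.2f}, p = {p_value:.3g}, significant = {significant}")
```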

Result

ROC AUC score: 90%

[Figure: feature importance]

Interpretable Machine Learning with SHAP

The base value (the SHAP expected value) is -3.2963135.
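
A negative base value cannot be a probability, so it is most likely expressed in log-odds, which is the usual raw output scale for a tree-based binary classifier explained with SHAP. Under that assumption, the sketch below converts the base value to a probability (roughly 3.6%) and shows how per-feature SHAP contributions add to the base value to give one prediction. The contribution values are hypothetical and only illustrate the additivity.

```python
import math


def sigmoid(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


base_value = -3.2963135  # expected value reported above, assumed to be log-odds
base_probability = sigmoid(base_value)

# Hypothetical SHAP contributions for a single test, in log-odds units
contributions = {
    "confirmatory_test_done": 2.1,
    "last_tested": 0.8,
    "appointment": -0.3,
}

# SHAP values are additive: prediction = base value + sum of contributions
log_odds = base_value + sum(contributions.values())
probability = sigmoid(log_odds)

print(f"base probability: {base_probability:.4f}")
print(f"predicted probability for this test: {probability:.4f}")
```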

The top 5 features for determining whether a test response will be positive or negative are:

  1. Confirmatory test

  2. last_tested

  3. engage time difference

  4. appointment

  5. How easy the tool is to use



Languages

Language:Jupyter Notebook 100.0%