Aviro_health_challenge

Problem Statement

We must find innovative ways that are easy and safe to get testing and treatment done and prevent new infections, or we will see another generation with millions forced to live with HIV. In this project, we build a machine learning model to determine the likelihood of a test being positive or negative, and we identify the factors (features) that make a test positive or negative.

Solution

Title: Determine the likelihood of a diagnostic test being positive or negative using machine learning and interpretable machine learning

Steps in solving the problem

Identifying the problem in the dataset
Setting an appropriate evaluation metric (ROC AUC)
Data Exploration
Data Analysis
Statistical inference on some features
Data Transformation and cleaning
Feature Selection
Cross validation
Building an XGBoost machine learning model
Interpreting the machine learning model using SHAP
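
With so few positive tests, the cross validation step is sensitive to how the folds are drawn. The sketch below shows one way to build stratified folds in plain Python so that each fold keeps roughly the same share of positives. The function name `stratified_folds`, the fold count, and the toy labels are assumptions for illustration; the notebook may use a library splitter instead.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_splits=5, seed=0):
    """Assign each row index to one of n_splits folds, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds

# Toy labels (assumed) mimicking a 7.1% positive rate on 1000 rows.
labels = [1] * 71 + [0] * 929
folds = stratified_folds(labels, n_splits=5)
for k, fold in enumerate(folds):
    positives = sum(labels[i] for i in fold)
    print(f"fold {k}: {len(fold)} rows, {positives} positive")
```

Each fold can then serve once as the validation set while the remaining folds are used for training.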

Findings

Uni-Variate Analysis

Most of the tests are negative: 92.9% negative versus 7.1% positive
Most patient last tested
Female patients visit the clinic more than male patients

Bi-Variate Analysis

Patients who last tested a short time ago are mostly negative, while those who last tested more than 6 months ago are more often positive.
Most patients who tested positive speak Sesotho or Xhosa

Diagnostics (Statistical Inference)

art_number_issued, art_initiation, initiation_handover, confirmatory_test_done, last_tested, gender_male, covid_screening_fever, gender_female, oraquick_dentures, covid_screening_pain_headache, appointment have a significant impact on the test result,

while

gender_other, gender_rather_not_say, covid_screening_shortness_of_breath, covid_screening_travel, oraquick_mouthwash, covid_screening_contact, covid_screening_cough, oraquick_bleeding_gums, covid_symptoms did not have a significant impact on the test result.
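
The README does not say which statistical test produced these groupings. For a binary feature against a binary result, a chi-square test of independence is one common choice, and the sketch below assumes it. The function name `chi_square_2x2` and both count tables are hypothetical and are not taken from the dataset.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: feature absent / present. Columns: negative / positive result.
# Both tables are made-up counts for illustration only.
associated = [[480, 20], [449, 51]]
unrelated = [[465, 35], [464, 36]]

for name, table in (("associated", associated), ("unrelated", unrelated)):
    chi2, p = chi_square_2x2(table)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: chi2={chi2:.2f}, p={p:.4f} -> {verdict} at the 5% level")
```

The 5% threshold is a conventional assumption; the threshold used in the notebook is not stated.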

Result

ROC_AUC score: 90%
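
To illustrate why ROC AUC suits a dataset where 92.9% of tests are negative, here is a minimal sketch that computes ROC AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative. The function `roc_auc` and the toy labels are assumptions for illustration, not the notebook's code.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the chance a random positive outranks a random negative."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Toy labels (assumed) with the class balance reported in the findings.
y_true = [0] * 929 + [1] * 71

# A useless model that scores every test identically and so always predicts "negative".
always_negative = [0.0] * len(y_true)
accuracy = sum(1 for y in y_true if y == 0) / len(y_true)

print("accuracy:", accuracy)
print("ROC AUC:", roc_auc(y_true, always_negative))
```

The constant model reaches 92.9% accuracy yet only 0.5 ROC AUC, which is the chance level, so a score of 90% reflects real ranking ability rather than class imbalance.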

Feature Importance

Interpretable Machine Learning with Shap

The SHAP base value (expected value) is -3.2963135
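
A negative base value is easier to read as a probability. The sketch below assumes the model was trained with a logistic objective and that SHAP reports the raw margin, so the base value is in log-odds; the README does not confirm either point. The helper `log_odds_to_probability` is a hypothetical name.

```python
import math

def log_odds_to_probability(log_odds):
    """Apply the logistic (sigmoid) function to a log-odds value."""
    return 1.0 / (1.0 + math.exp(-log_odds))

base_value = -3.2963135  # expected value reported above
probability = log_odds_to_probability(base_value)
print(f"baseline probability of a positive test: {probability:.4f}")
```

Under that assumption the baseline corresponds to a probability of roughly 0.036, and each feature's SHAP value moves an individual prediction up or down from there in log-odds units.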

The top 5 features for determining whether a test response will be positive or negative are:

Confirmatory test
last_tested
engage time difference
appointment
how easy the tool is to use
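
SHAP summary plots commonly rank features by their mean absolute SHAP value across all rows. Assuming the list above was produced that way, the sketch below shows the ranking and the additive property of SHAP values. The function `rank_features_by_mean_abs_shap` and every SHAP value in `shap_rows` are made up for illustration; only the base value and the feature names come from this README.

```python
def rank_features_by_mean_abs_shap(shap_rows, feature_names):
    """Sort features by mean |SHAP value| over all rows, largest first."""
    n_rows = len(shap_rows)
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    mean_abs = [total / n_rows for total in totals]
    return sorted(zip(feature_names, mean_abs), key=lambda pair: pair[1], reverse=True)

feature_names = ["confirmatory_test_done", "last_tested", "appointment"]
# Hypothetical per-row SHAP values (one list per patient, one entry per feature).
shap_rows = [
    [1.2, -0.4, 0.1],
    [-0.9, 0.6, -0.2],
    [1.5, 0.3, 0.0],
]

ranking = rank_features_by_mean_abs_shap(shap_rows, feature_names)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")

# Additivity: base value plus a row's SHAP values gives that row's raw model output.
base_value = -3.2963135
first_row_output = base_value + sum(shap_rows[0])
print(f"raw output for first row: {first_row_output:.4f}")
```

A feature can rank highly even when its contributions push in both directions, because the absolute value is taken before averaging.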
