BenbenIO / Diabetes_analysis

Machine Learning project

Diabetes_analysis

Machine Learning project - 2018

This Machine Learning project is still ongoing.

The objective of the project is to find correlations and design a model for diabetes monitoring/detection based on urine samples. The motivation of this work is to find an easy, convenient, and painless way to monitor diabetes, as an alternative to blood tests. This project can be considered part of my master's degree, which consists of designing and developing a urine sensor.

The data used can be found HERE

Description

In this project, the data preprocessing was quite challenging (unintelligible feature names, missing data, cross-checking between the different .csv files). I recommend, as I did, having a look at the actual inquiry and questionnaire (HERE).

First, I selected the urine data and the demographic data, and used the questionnaire answers to create my label "Diabetes". Then I cleaned the data and dropped the features with too many missing values. Finally, I oversampled the diabetes class, since it is the under-represented one. After this step, I got the following data. There are very few features left, but I still decided to continue the analysis.
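The cleaning and oversampling steps above can be sketched as follows. This is only a minimal illustration, not the code from the notebook: the column names, the 50% missing-value threshold, and the helper name `clean_and_oversample` are assumptions, and since the text does not say which oversampling method was used, the sketch assumes plain random oversampling done with scikit-learn's `resample` (imblearn's `RandomOverSampler` offers the same idea).

```python
import numpy as np
import pandas as pd
from sklearn.utils import resample


def clean_and_oversample(df, label="Diabetes", max_missing=0.5, seed=0):
    """Drop sparse features, drop incomplete rows, then balance the label."""
    # Keep only columns whose share of missing values is at most max_missing
    # (the threshold is an assumption, not taken from the project).
    df = df.loc[:, df.isna().mean() <= max_missing].dropna()

    # Randomly duplicate rows of the smaller classes until every class
    # has as many rows as the largest one.
    counts = df[label].value_counts()
    target = counts.max()
    parts = []
    for value in counts.index:
        rows = df[df[label] == value]
        if len(rows) < target:
            rows = resample(rows, replace=True, n_samples=target, random_state=seed)
        parts.append(rows)

    # Shuffle so the duplicated rows are not grouped together.
    return pd.concat(parts).sample(frac=1.0, random_state=seed).reset_index(drop=True)


# Toy data with made-up column names, only to exercise the function.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "urine_glucose": rng.normal(size=100),
    "age": rng.integers(20, 80, size=100).astype(float),
    "mostly_missing": [np.nan] * 90 + [1.0] * 10,
    "Diabetes": [1] * 10 + [0] * 90,
})
balanced = clean_and_oversample(toy)
print(balanced["Diabetes"].value_counts())
```

When a held-out test set is used, it is usually safer to oversample only the training split, so that duplicated rows do not leak into the evaluation.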

The objective of the primary analysis was to obtain the feature importances. To do so, I trained two models, one with random forest and one with CatBoost, and compared the results:

Random forest:

Catboost:

Feature importance: (right: Random forest, left: Catboost)
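The comparison above can be sketched roughly as follows. This is a hedged illustration rather than the notebook's code: the synthetic data, the feature names, the hyperparameters, and the helper name `compare_models` are all assumptions. CatBoost is imported as an optional dependency here so the sketch still runs when only scikit-learn is installed.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

try:  # CatBoost is treated as optional so the sketch runs without it.
    from catboost import CatBoostClassifier
except ImportError:
    CatBoostClassifier = None


def compare_models(X, y, seed=0):
    """Return (test accuracy per model, normalised feature-importance table)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    models = {
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed)
    }
    if CatBoostClassifier is not None:
        models["catboost"] = CatBoostClassifier(
            iterations=200, verbose=0, random_seed=seed, allow_writing_files=False
        )

    scores, importances = {}, {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)  # mean accuracy on the test split
        # The two libraries report importances on different scales,
        # so each column is rescaled to sum to 1 before comparing.
        raw = np.asarray(model.feature_importances_, dtype=float)
        importances[name] = raw / raw.sum()
    return scores, pd.DataFrame(importances, index=X.columns)


# Synthetic stand-in for the urine/demographic features (names are made up).
X_arr, y = make_classification(
    n_samples=300, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
X = pd.DataFrame(X_arr, columns=["feature_a", "feature_b", "feature_c", "feature_d"])
scores, table = compare_models(X, y)
print(scores)
print(table.sort_values("random_forest", ascending=False))
```

Rescaling the importances makes the two columns directly comparable side by side, which is the point of the figure described above.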

With this result, we can see that it is possible to estimate diabetes from a urine sample and simple tests. The quality of the model (score) is not excellent and can be increased with more tuning and additional data. The project is still ongoing.

Code

In this repository, you can find a notebook or Python version of the code. You will need the Sklearn, Catboost, Seaborn, and imblearn modules.

Do not hesitate to ask if you have any questions :)
