madhava20217 / Malaria-Detection-from-Cells

Explores image colour space transformations and augmentation for building a classifier that distinguishes parasitized from uninfected red blood cells (RBCs). Proposes a CNN model that uses the saturation channel of the HSV colour model and reaches accuracies of 99.3% and above.
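
As a rough illustration of the saturation idea, the sketch below extracts the S channel of HSV from RGB pixels using only Python's standard `colorsys` module. The function name `saturation_channel` and the nested-list image representation are assumptions made for this example; the notebooks in the repository may use a different library and data layout.

```python
import colorsys


def saturation_channel(rgb_image):
    """Return the HSV saturation of each pixel as a 2-D list of floats in [0, 1].

    rgb_image: rows of (r, g, b) tuples with 8-bit values (0-255).
    """
    return [
        [colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)[1] for (r, g, b) in row]
        for row in rgb_image
    ]


# Hypothetical 1x3 image: a grey pixel, a pure red pixel, and a pale pink pixel.
tiny = [[(128, 128, 128), (255, 0, 0), (255, 128, 128)]]
sat = saturation_channel(tiny)
```

Grey pixels carry no colour and map to zero saturation, while strongly coloured pixels map to values near one, so a single-channel saturation image can replace the three-channel RGB input to a classifier.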

Malaria Detection using Cell Images

How to Run the Code

  • Prepare the data by using the data_download.ipynb notebook found in the 'Data Download' directory.
    • Set the required height and width (parameters at the top of the notebook).
    • The notebook should create a Data directory containing the original cell images, and a Resized_data_ directory containing the resized images.
  • Label the data using the labelling.ipynb notebook found in the 'Data Labelling' directory.
    • It will save a CSV of relative filenames and labels in the specified directory.
  • Create train and test splits using the train_test_split.ipynb notebook.
  • Modeling scripts are in the 'Modeling' directory.
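
The labelling and splitting steps above can be sketched with the standard library alone. This is a minimal illustration, not the code in the notebooks: the CSV column names (`filename`, `label`), the folder names, the 0/1 label encoding, the 20% test fraction, and the helper `stratified_split` are all assumptions.

```python
import csv
import io
import random
from collections import defaultdict


def stratified_split(rows, test_fraction=0.2, seed=0):
    """Split (filename, label) rows into train and test lists, label by label,
    so both splits keep roughly the same class balance."""
    by_label = defaultdict(list)
    for filename, label in rows:
        by_label[label].append((filename, label))

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Hypothetical labels CSV: 10 parasitized (label 1) and 10 uninfected (label 0) cells.
lines = ["filename,label"]
lines += [f"Parasitized/cell_{i}.png,1" for i in range(10)]
lines += [f"Uninfected/cell_{i}.png,0" for i in range(10)]
csv_text = "\n".join(lines)

rows = [(r["filename"], r["label"]) for r in csv.DictReader(io.StringIO(csv_text))]
train, test = stratified_split(rows, test_fraction=0.2)
```

Splitting per label keeps the ratio of parasitized to uninfected cells the same in both sets, which matters when accuracy is the reported metric.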

Contributors

  1. Srishti Singh, srishti20409@iiitd.ac.in
  2. Shreya Bhatia, shreya20542@iiitd.ac.in
  3. Madhava Krishna, madhava20217@iiitd.ac.in
  4. Harshit Goyal, harshit20203@iiitd.ac.in

Motivation

Malaria is a life-threatening disease that affects many people worldwide and is spread through the bites of infected Anopheles mosquitoes. Earlier studies have shown that physicians rarely agree on the severity of the disease in a given patient's sample. Preliminary detection aided by computer systems can therefore be of great value for faster and more reliable diagnosis. We aim to create a classifier for parasitized and non-parasitized cells to aid medical professionals in this task.

Related Work

  • Pan et al. (2018) created a model based on deep CNN architectures. Using data augmentation, they obtained accuracies of over 90% on the training and validation samples.
  • Raihan and Nahid (2021) created a model based on boosted trees with feature engineering and determined feature importance using SHapley Additive exPlanations (SHAP).
  • Fuhad et al. (2020) implemented a CNN-based model with accuracy over 99% while remaining computationally efficient.

Suggested Outcomes

Automation of the diagnosis process will guarantee accurate diagnosis and, as a result, holds the possibility of providing dependable healthcare to places with limited resources. We aim to implement various classification algorithms while searching for parameters that optimise training time, computational complexity and performance. We will attempt transformations, feature engineering and feature extraction on the dataset. We are going to apply various machine learning models, such as SVMs, logistic regression, decision trees and random forests, and compare the performance of all models. We also intend to attempt grayscale conversion and observe the change in the behaviour of the models.
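
For the grayscale experiment mentioned above, one common conversion is a weighted sum of the RGB channels. The sketch below uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114); the choice of weights and the function name `to_grayscale` are assumptions, and the project may use a different conversion.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples (0-255) to rows of grey intensities.

    Uses the ITU-R BT.601 luma weights; this is one common convention,
    assumed here for illustration.
    """
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]


# Hypothetical 1x3 image: black, white, and pure red pixels.
grey = to_grayscale([[(0, 0, 0), (255, 255, 255), (255, 0, 0)]])
```

Grayscale reduces the input to one channel, which lowers the feature count for models such as SVMs and logistic regression at the cost of discarding colour information.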


Project Proposal

The project proposal is provided as a PDF.
