isi-mube / cytology-codex

A Computer Vision application to diagnose diverse Cytology samples using medical imaging data from a virtual microscope. Also, Ironhack's final bootcamp project.


Cytology Codex

The Digital Tome of Cytology: Deep Learning & Neural Networks Transmutation


Behold, the Digital Tome of Cytology, revealing microscopic marvels and untold cellular tales.

  1. Chapter I: Salivary gland specimens:
  2. Chapter II: Gynecological specimens:
  3. Chapter III: Thyroid specimens:
  4. Chapter IV: Effusion specimens:

About the Project

This project started on 08/07/2023 and was completed within 3 weeks. It was presented on 27/07/2023 as my Ironhack final bootcamp project.

Primary Objective:

  • Develop multiple multiclass classification models capable of diagnosing cytological image samples from diverse anatomical sites: salivary glands, gynecological, thyroid, and effusions.

Secondary Objectives:

  • Implement a web-based application using Streamlit that lets users obtain predicted diagnoses for their own input images.
  • Provide informative feedback on the image features using the ChatGPT API.

Agile methodology and roadmap --> click me

About Cytology

Glossary

Let's first define a few key terms:

  • Cytology: The study of individual cells to detect abnormalities, including cancer. As a sampling method, it provides a less invasive alternative to biopsies, enabling early diagnosis and prompt treatment and improving health outcomes.
  • Cytopathology: A specialized field within pathology that studies disease at the cellular level. Its professionals include cytotechnologists and cytopathologists, who focus on the screening, interpretation, and diagnosis of diverse cell samples.
  • Digital Pathology: The digitization of pathology slides, which allows image-based information to be used for diagnosis, research, and teaching. Beyond digitizing histology and cytology slides, Digital Pathology also covers the automation, technology, and tools of all pre-analytical, analytical, and post-analytical processes in a pathology department.

Challenges in Digital Cytology

In digital cytology, we face a unique challenge. In histology, cells keep their flat arrangement, like a single layer of bricks in a wall. Cytology samples are more like a pile of bricks dumped out of a bucket. Cells in suspension no longer hold their original arrangement. This makes diagnosis more complex and time-consuming, because it relies on mastery of pattern recognition. Furthermore, because of this extra depth, digitizing these samples requires even more storage space.



Thyroid, papillary carcinoma. Same tumor, different methods, and different features. On the left, histology (a thin, essentially two-dimensional tissue section); on the right, cytology (three-dimensional cells in suspension).

Personal Journey and Perspectives on Cytology

For the past 5 years, my work has revolved around Cytology. It has included screening and diagnosing numerous cytology specimens, quality control, and both teaching and research, including Digital Pathology publications.

One significant barrier to digitizing cytological samples is the final file size. As explained above, the cells in Cytology are not flat, unlike in Histology, but three-dimensional. Capturing every focal plane therefore typically requires Z-stack scanning of the slides, which results in large digital files.
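A rough back-of-the-envelope estimate shows why the file size grows so quickly. The sketch below rests on three assumptions, which are illustrative and not measurements from this project:

- uncompressed 8-bit RGB pixels (3 bytes each);
- a hypothetical 15 mm x 15 mm scan area;
- 0.25 µm per pixel.

The point is only that storage scales linearly with the number of focal planes in the Z-stack.

```python
def uncompressed_size_gb(width_mm: float, height_mm: float, um_per_pixel: float,
                         z_planes: int, bytes_per_pixel: int = 3) -> float:
    """Uncompressed size, in GB (1e9 bytes), of a scanned area with `z_planes` focal layers."""
    width_px = width_mm * 1000 / um_per_pixel    # mm -> um -> pixels
    height_px = height_mm * 1000 / um_per_pixel
    return width_px * height_px * bytes_per_pixel * z_planes / 1e9


# Hypothetical scan settings (assumptions): 15 mm x 15 mm, 0.25 um/pixel, 8-bit RGB
single_plane = uncompressed_size_gb(15, 15, 0.25, z_planes=1)
z_stack = uncompressed_size_gb(15, 15, 0.25, z_planes=10)
print(f"1 focal plane:   {single_plane:.1f} GB")
print(f"10 focal planes: {z_stack:.1f} GB")
```

Real scanners compress their images, so actual files are much smaller than these figures. At the same settings, however, a multi-plane scan still scales with the number of planes it stores.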

Despite this challenge, I firmly believe that Machine Learning and Deep Learning models can be applied to Cytology images without a complete multi-layer scan. This would bypass one of the most challenging aspects of the digitalization process.

Results and Conclusions

  1. The convolutional neural network (CNN) models demonstrated excellent accuracy in the multiclass classification of cytology images, reaching approximately 90-95% accuracy around epochs 20-25.
  2. The scarcity of available data was addressed with data augmentation: generating new training images by applying random transformations (such as flips and zoom) to existing cytology images. This technique was crucial for minimizing false negatives across all diagnostic categories.
  3. This approach has potential for real-world implementation. It opens the door to AI algorithms that work on single-layer cytology slide scans or even phone-captured images. It thereby challenges the need for full-slide, multi-layer Z-stack scanning, a process that is both costly and time-consuming.
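The data augmentation step in point 2 can be sketched in a few lines. The project used Keras's ImageDataGenerator for this (flips, zoom...). The dependency-free NumPy version below is a swapped-in illustration, not the project's code. It only applies random horizontal/vertical flips and 90° rotations. The random 64x64 arrays are placeholders standing in for real cytology crops.

```python
import numpy as np


def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a randomly flipped and rotated copy of an H x W x C image array."""
    out = image
    if rng.random() < 0.5:
        out = np.fliplr(out)                  # horizontal flip (reverses columns)
    if rng.random() < 0.5:
        out = np.flipud(out)                  # vertical flip (reverses rows)
    quarter_turns = int(rng.integers(0, 4))   # 0, 1, 2 or 3 turns of 90 degrees
    out = np.rot90(out, k=quarter_turns, axes=(0, 1))
    return out.copy()                         # turn the view into an independent array


def augment_dataset(images, copies_per_image: int, seed: int = 0):
    """Create `copies_per_image` augmented variants of every input image."""
    rng = np.random.default_rng(seed)
    return [augment(img, rng) for img in images for _ in range(copies_per_image)]


# Placeholder data: four random 64x64 RGB "crops" instead of real cytology images
data_rng = np.random.default_rng(42)
originals = [data_rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8) for _ in range(4)]
augmented = augment_dataset(originals, copies_per_image=5)
print(len(augmented), augmented[0].shape, augmented[0].dtype)
```

Augmentation is normally applied only to the training images, so the validation and test sets keep reflecting real, unmodified samples. Rotations by 90° preserve the array shape only for square images, which is why the sketch uses square crops.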

For detailed metric results, please refer to the corresponding Python folder.

Toolkit:

  • JupyterLab: Environment for writing Python scripts and managing files. Basically, as if VS Code and Jupyter Notebook had a kid.

Libraries

📚 Basic Libraries

  • Pandas: Data manipulation and analysis.
  • NumPy: Arrays and mathematical functions; images are handled as NumPy arrays.
  • Os: File management.
  • Matplotlib: 2D data visualization.
  • Seaborn: Statistical data visualization built on top of Matplotlib.
  • PIL: Python Imaging Library to manipulate images.

🛠️ Tools

  • Warnings: Roses are red, violets are blue --> Warnings are annoying.
  • Shutil: File operations (copying, deleting...).
  • Random: To generate random subsets of data.
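Random and Shutil are typically combined to split an image collection into training and validation subsets. The sketch below assumes a hypothetical layout with one subfolder per diagnostic class (src/<class>/image files). It copies a random fraction of each class into dst/val/<class>/ and the rest into dst/train/<class>/. The folder names, class names, and the 20% split are assumptions for illustration.

```python
import random
import shutil
import tempfile
from pathlib import Path


def split_dataset(src: Path, dst: Path, val_fraction: float = 0.2, seed: int = 42) -> dict:
    """Copy src/<class>/* into dst/train/<class>/ and dst/val/<class>/ at random."""
    rng = random.Random(seed)                      # seeded for a reproducible split
    counts = {}
    for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        n_val = int(len(files) * val_fraction)
        val_files = set(rng.sample(files, n_val))  # random subset for validation
        for f in files:
            subset = "val" if f in val_files else "train"
            target = dst / subset / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target / f.name)       # copy, leaving the originals intact
        counts[class_dir.name] = (len(files) - n_val, n_val)
    return counts


# Demo on a throwaway folder with two hypothetical classes of 10 empty files each
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "raw"
    for cls in ("class_a", "class_b"):
        (src / cls).mkdir(parents=True)
        for i in range(10):
            (src / cls / f"img_{i}.png").write_bytes(b"")
    counts = split_dataset(src, Path(tmp) / "split")
    n_val_files = len(list((Path(tmp) / "split" / "val").rglob("*.png")))
print(counts)  # (train, val) per class
```

Splitting per class keeps each category's proportion similar in both subsets. Fixing the seed makes the split reproducible.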

🌐 Computer Vision

  • TensorFlow: Machine Learning framework, used here for Computer Vision.
  • Keras: High-level neural networks API for Deep Learning, running on top of TensorFlow.
  • ImageDataGenerator: To apply random data augmentation (flips, zoom...).
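For multiclass classification, a Keras model typically ends with a `Dense(num_classes, activation="softmax")` layer. Each image then gets one probability per class, and the predicted diagnosis is the class with the highest probability. The NumPy sketch below reproduces that last step on hypothetical raw scores (logits) for two images. The class names are placeholders, not the project's actual diagnostic categories.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting each row's max keeps np.exp numerically stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Hypothetical class names and raw network scores for two images
class_names = ["class_0", "class_1", "class_2", "class_3"]
logits = np.array([[2.0, 0.5, 0.1, -1.0],
                   [0.2, 0.3, 3.0, 0.1]])

probs = softmax(logits)                                         # one probability per class
predicted = [class_names[i] for i in probs.argmax(axis=1)]      # highest-probability class
print(probs.round(3))
print(predicted)
```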

📈 Metrics and Reports

  • Sklearn: Machine Learning metrics.
  • Confusion Matrix: To evaluate true and false positives and negatives.
  • Confusion Matrix Display: To easily display the matrix.
  • Classification Report: For a detailed per-class breakdown of each metric (precision, recall, F1-score, support).
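The metrics above can be computed by hand to see exactly what they count. The project used sklearn's confusion matrix and classification report. The dependency-free sketch below re-derives the same per-class quantities on a tiny hypothetical set of labels, using sklearn's layout (rows = true class, columns = predicted class). False negatives for a class are the off-diagonal entries in its row. This is why recall is the metric to watch when missing a diagnosis is costly.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def per_class_report(matrix, labels):
    """Precision, recall, F1 and support per class, derived from the confusion matrix."""
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                 # this class, predicted as another class
        fp = sum(row[i] for row in matrix) - tp  # another class, predicted as this class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1,
                         "support": tp + fn, "false_negatives": fn}
    return report


# Hypothetical labels for six images (placeholders, not project data)
labels = ["benign", "malignant", "suspicious"]
y_true = ["benign", "benign", "benign", "malignant", "malignant", "suspicious"]
y_pred = ["benign", "benign", "malignant", "malignant", "benign", "suspicious"]

cm = confusion_matrix(y_true, y_pred, labels)
report = per_class_report(cm, labels)
for label in labels:
    r = report[label]
    print(f"{label:<11} precision={r['precision']:.2f} recall={r['recall']:.2f} "
          f"f1={r['f1']:.2f} support={r['support']} FN={r['false_negatives']}")
```

Classes that never get predicted receive a precision of 0 here. sklearn's classification_report exposes a zero_division parameter for that case.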

Acknowledgments:

  • Xisca: Endless source of wisdom and inspiration. Your faith in me pushed my boundaries, driving me beyond what I believed was possible to accomplish.
  • Sabina: Your knowledge of Computer Vision sparked my curiosity.
  • Laz: For your emotional support during the bootcamp and amazing coding feedback.
  • Camille: Your sharp eyes and Python tricks helped my learning.
  • Xose: Simply, my life saver.
  • My classmates, especially:
    • Nicole, you helped me get through dark times!
    • Nati, your moral support has been a godsend.
    • Luis, Luisi forever. You are a constant motivation to excel (pun intended).
    • Evangelos [...] time is an illusion that helps things make sense, so we're always living in the present tense...



Languages

Jupyter Notebook 96.8%, Python 3.2%