All the code is in the cell_segmentation.ipynb notebook.
Please use pip to install the dependencies from the requirements file, then download the data and place it in the data directory with the following structure:
classical_cv_cell_segmentation
│   README.md
│   cell_segmentation.ipynb - Jupyter notebook containing the cell segmentation code
│   requirements.txt
│
└───data
    │
    ├───BBBC020_v1_images
    │       folders containing cell images
    │
    ├───BBBC020_v1_outlines_cells
    │       cell masks
    │
    └───BBBC020_v1_outlines_nuclei
            nuclei masks
I tried four different methods that are directly implementable with OpenCV.
| Method | Mean IoU |
|---|---|
| Binary Threshold | 0.4235 |
| Otsu Threshold | 0.4439 |
| Gaussian Otsu Threshold | 0.4439 |
| Adaptive Threshold | 0.2914 |
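For reference, a minimal sketch of the Gaussian + Otsu variant and an IoU metric of the kind used for the table above (the file path, kernel size, and preprocessing here are illustrative; the notebook contains the actual implementation):

```python
import cv2
import numpy as np

def segment_otsu(image_path):
    """Segment cells with a Gaussian blur followed by Otsu thresholding."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    _, mask = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask

def iou(pred_mask, gt_mask):
    """Intersection over Union for two binary masks."""
    pred = pred_mask > 0
    gt = gt_mask > 0
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return intersection / union if union > 0 else 0.0
```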
Investigation
- Investigate SOTA approaches that best fit our needs (computational resources, inference time, performance, etc.). I would use Papers With Code as a first stop to investigate approaches that would be easy to implement.
Model Training
- After choosing the model, I would ideally start from a pretrained model (from HuggingFace, GitHub, etc.); pretraining improves performance and reduces training time. I would use PyTorch and augmentation libraries to implement a dataset class for shuffling/augmenting data, which could be used with either HuggingFace or PyTorch (a rough sketch follows this list). I would reuse the already developed code for producing the inputs (cell images) and the targets (cell masks).
- Code: I would refactor the code into a sensible directory structure, and after the initial experimentation I would use Python scripts rather than Jupyter notebooks. I would follow standard PEP 8 style, and I like to use tools such as Black & isort to automate the formatting of my code.
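A rough sketch of the dataset class I have in mind (the file loading, transform interface, and normalisation are assumptions; the real class would reuse the existing image/mask loading code):

```python
import cv2
import torch
from torch.utils.data import Dataset

class CellSegmentationDataset(Dataset):
    """Loads (cell image, cell mask) pairs and applies optional augmentations."""

    def __init__(self, image_paths, mask_paths, transform=None):
        self.image_paths = image_paths  # list of cell image file paths
        self.mask_paths = mask_paths    # list of matching mask file paths
        self.transform = transform      # augmentation that operates on numpy arrays

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = cv2.imread(self.image_paths[idx], cv2.IMREAD_GRAYSCALE)
        mask = cv2.imread(self.mask_paths[idx], cv2.IMREAD_GRAYSCALE)
        if self.transform is not None:
            augmented = self.transform(image=image, mask=mask)
            image, mask = augmented["image"], augmented["mask"]
        # Scale the image to [0, 1] and binarise the mask; add a channel dimension.
        image = torch.as_tensor(image, dtype=torch.float32).unsqueeze(0) / 255.0
        mask = torch.as_tensor(mask > 0, dtype=torch.float32).unsqueeze(0)
        return image, mask
```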
Tracking
- I would use Git for version control of the code and GitHub for remote storage. I would use Weights & Biases to track data versioning, model versioning, the git commit, the environment, and experimental results & hyperparameters. I would also use Docker for reproducibility (this can also be used with Weights & Biases).
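A minimal sketch of that tracking setup with Weights & Biases (the project name, hyperparameters, and checkpoint path are hypothetical):

```python
import wandb

run = wandb.init(
    project="cell-segmentation",  # hypothetical project name
    config={"lr": 1e-4, "batch_size": 8, "epochs": 50, "architecture": "unet"},
)

for epoch in range(run.config.epochs):
    # In practice these values would come from the training and evaluation loops.
    train_loss, val_iou = 0.0, 0.0
    wandb.log({"train/loss": train_loss, "val/iou": val_iou, "epoch": epoch})

# Version the trained weights as an artifact tied to this run (and its git commit).
artifact = wandb.Artifact("cell-segmentation-model", type="model")
artifact.add_file("checkpoints/best.pt")  # hypothetical checkpoint path
run.log_artifact(artifact)
run.finish()
```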
Evaluation
- Assess the performance of the model against KPIs (IoU, etc.) and against the level of performance required to generate value. Use tools such as Weights & Biases to track learning curves & other graphs.
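For example, a simple mean-IoU helper for batched model outputs, assuming per-pixel logits of shape (N, 1, H, W) and binary target masks of the same shape:

```python
import torch

def batch_iou(pred_logits, target, threshold=0.5, eps=1e-7):
    """Mean IoU for a batch of binary segmentation predictions."""
    pred = (torch.sigmoid(pred_logits) > threshold).float()
    intersection = (pred * target).sum(dim=(1, 2, 3))
    union = ((pred + target) > 0).float().sum(dim=(1, 2, 3))
    return ((intersection + eps) / (union + eps)).mean().item()
```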
Execution & Pipeline
If we wanted to build a pipeline, I may utilise an orchestration tool such as Airflow, which is open source and can integrate with most cloud providers. I would use Airflow to automate management of the previous stages, roughly broken out as follows (a minimal DAG sketch follows the list):
- Data Ingestion.
- Data Preprocessing/Transformation.
- Model Training/Inference.
- Model Evaluation.
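A minimal Airflow DAG sketch of those stages (the stage functions are placeholders for the project's real code, and the Airflow 2-style imports and lack of a schedule are assumptions):

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder stage implementations; each would call the project's real code.
def ingest_data(**kwargs): ...
def preprocess_data(**kwargs): ...
def train_model(**kwargs): ...
def evaluate_model(**kwargs): ...

with DAG(
    dag_id="cell_segmentation_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,  # trigger manually or on new data
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="data_ingestion", python_callable=ingest_data)
    preprocess = PythonOperator(task_id="data_preprocessing", python_callable=preprocess_data)
    train = PythonOperator(task_id="model_training", python_callable=train_model)
    evaluate = PythonOperator(task_id="model_evaluation", python_callable=evaluate_model)

    ingest >> preprocess >> train >> evaluate
```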
Model Deployment
- The execution environment would depend on the requirements of the project. Ideally execution would happen on a server or in the cloud with more powerful resources. A CI/CD workflow for this project:
- Source: push the code updates.
- Build: build the system.
- Test: creation of robust tests; for ML development I would consider specific unit tests, integration tests, and data/model tests to create a reliable CI/CD workflow. Data tests would validate the data's properties, and model tests would ensure model convergence/fitting, alongside any other required tests. Previously I have used 'unit tests' for model testing, where we used extremely difficult edge cases to probe model performance (a pytest-style sketch follows this list).
- Deploy: Deployment to cloud/server.
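To illustrate the Test stage, a pytest-style sketch of one data test and one model 'unit test' (the model and data fixtures are hypothetical and would be defined in a conftest.py; the thresholds are assumptions):

```python
import torch

def test_masks_are_binary(sample_mask_batch):
    """Data test: ground-truth masks should only contain 0s and 1s."""
    assert set(torch.unique(sample_mask_batch).tolist()) <= {0.0, 1.0}

def test_model_overfits_single_batch(model, sample_batch):
    """Model test: training on one batch should drive the loss down (sanity check of fitting)."""
    images, masks = sample_batch
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    initial_loss = loss_fn(model(images), masks).item()
    for _ in range(50):
        optimizer.zero_grad()
        loss = loss_fn(model(images), masks)
        loss.backward()
        optimizer.step()
    assert loss.item() < initial_loss * 0.5
```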
Model Monitoring
- Monitor the deployed model for performance degradation and the data for concept/domain drift.
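As one simple illustration, input drift could be flagged by comparing the pixel-intensity distribution of incoming images against the training set (the statistic, feature, and threshold here are assumptions, not a prescribed method):

```python
from scipy.stats import ks_2samp

def check_input_drift(reference_intensities, incoming_intensities, alpha=0.01):
    """Flag possible domain drift by comparing pixel-intensity distributions
    of training images against newly seen images (a very simple proxy)."""
    statistic, p_value = ks_2samp(reference_intensities, incoming_intensities)
    return p_value < alpha  # True -> distributions differ; investigate/retrain
```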
Tools
- HuggingFace - pretrained models & training.
- PyTorch - data handling & model training.
- Weights & Biases - tracking model/data/experiment.
- Git & Github - version control.
- Black & isort - formatting code.
- Docker - environment control.
- Airflow - orchestration of pipeline.