siemdejong / mscthesis

MSc Biophysics thesis on applying deep learning on HHG images for regression and pathology


CC BY 4.0


Deep learning on higher harmonic generation images for regression and pathology

Siem de Jong
MSc thesis
View latest build »

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Checklist
  5. License
  6. Contact
  7. Acknowledgments

About The Project

This repository contains the source code of Siem de Jong's MSc thesis. The research is conducted in the context of deep learning on higher harmonic generation imaging at the University of Amsterdam and Vrije Universiteit Amsterdam, in the Biomedical Imaging and Photonics group.

(back to top)

Built With

LaTeX

(back to top)

Getting Started

Prerequisites

A LaTeX distribution is required to build PDFs from source. This repo is tested on Windows with MiKTeX and on Linux with TeX Live.

Installation

  1. Clone the repo
    git clone --recurse-submodules https://github.com/siemdejong/mscthesis.git
    The --recurse-submodules flag is needed to download the custom kaobook style.
  2. Register the local TEXMF directory as a TEXMF root directory to install kaobook. For MiKTeX on Windows, run the following in the project directory:
    install_windows.bat
    For TeX Live on a UNIX system, run
    ./install_linux.sh
    Kaobook can also be installed manually by following the instructions for the installed TeX distribution; a sketch is given below.
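
If the repo was cloned without --recurse-submodules, the kaobook submodule can still be fetched afterwards:

    git submodule update --init --recursive

A manual install boils down to registering the local TEXMF tree by hand. A minimal sketch, assuming the tree lives in the repository's texmf/ directory (an assumption; check the install scripts for the actual path):

    # TeX Live: add the tree as an auxiliary TEXMF root
    tlmgr conf auxtrees add "$PWD/texmf"

    # MiKTeX on Windows: register the tree as a TEXMF root
    initexmf --register-root="%CD%\texmf"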

(back to top)

Usage

To compile the output, run the following sequence on the thesis's root .tex file, with your preferred optional arguments:

pdflatex <main>
pdflatex <main>
biber <main>
pdflatex <main>

Here <main> stands for the name of the root file. The repeated pdflatex runs resolve cross-references; biber processes the bibliography.
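
Alternatively, latexmk automates this sequence: it detects the biber backend and reruns the tools until all references settle. A minimal sketch, assuming the root file is named main.tex (a hypothetical name; substitute the actual root file):

# latexmk reruns pdflatex/biber as needed
latexmk -pdf main.tex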

(back to top)

Checklist

As the thesis reports on the development and validation of two diagnostic prediction models, it follows a TRIPOD-AI-like checklist, shown below. The checklist has yet to be adapted to this study.

  • Title page

  • Abstract

  • General introduction

    • Link skin and brain project
    • Mention TRIPOD-AI
  • Theoretical background of convolutional neural networks

  • Skinstression

    • Abstract
    • Introduction
      • Background (diagnostic + rationale for dev/val + purpose)
      • Objectives (development + validation)
    • Methods
      • Sources of data
        • source of data of training/val/test
        • origin of data
        • dates of data collection
      • Participants (study setting + eligibility + no specific treatment)
        • study setting: tertiary care, VUmc
        • eligibility for participants or data sources
        • treatment received
      • Data preparation
        • stress-strain curves
        • images
        • data augmentation
      • Outcome of model
        • What is predicted?
        • How is prediction assessed?
        • (Why choose this outcome measure if alternatives exist?)
      • Predictors
        • Alternatives for predictors
        • three parameters + how they are measured
        • source of predictors + known biases
      • Sample size
      • Missing data
        • sex and age
      • Statistical analysis methods
        • Diagram of analytical process
        • handling of predictors
        • Pre-selection of predictors prior to model building (results for exp/pca/logistic)
        • rescaling/transformation on predictors (LDS + reweighting)
        • type of model, building model + predictor selection + internal validation
        • model ensembling techniques (if used)
        • detailed model description
        • initialization of model parameters
        • training approaches (hyperparameters, number of models trained, used datasets)
        • Measures to assess model performance + model comparison
        • model updating arising from validation
        • how final model is selected
        • explainability and interpretability
        • software used
    • Results
      • Participants (flow, demographics, comparison train/val/test (predictor distributions and images))
      • Model dev and per participant outcome in
        • Hyperparameter tuning
        • Training
        • Testing
      • Model specification (present model + explain how it must be used)
      • Model performance
        • accuracy WITH confidence interval
        • results of analysis on performance errors
      • Model updating (performance per update)
      • Usability
        • how and when in the clinical pathway to use the prediction AI
        • how will the AI be integrated into the target setting + requirements (on-/offsite)
        • how will poor data be assessed when implementing AI model
        • any human interaction needed for data to be used with the model + expertise of users
      • Sensitivity analysis?
    • Discussion
      • Limitations
      • Interpretation (dev/val data performance + overall interpretation considering objectives/limitations/similar study results/other evidence)
      • Implications
        • potential use (also in a general way)
        • how will clinical practice be different if using the AI and how will it be used
    • Supplementary information
      • Data?
      • Code
    • Funding?
    • References
  • Pediatric brain tumors

    • Abstract
    • Introduction
      • Background (diagnostic + rationale for dev/val + purpose)
      • Objectives (development + validation)
    • Theory
      • Feature extraction
      • MIL
        • Classical
        • DeepMIL
        • VarMIL
      • Model performance
        • ROC Curve
        • PR Curve
        • PRG Curve
        • IoU
    • Methods
      • Sources of data
        • source of data of training/val/test
        • origin of data
        • dates of data collection
      • Participants (study setting + eligibility + no specific treatment)
        • study setting: tertiary care, Princess Máxima Center
        • eligibility for participants or data sources
        • treatment received
      • Data preparation
        • targets (from text to numbers)
        • images
          • getting images from raw data
          • scaling overview images
          • masking
          • tiling
          • (optionally) denoising
          • ...
        • data augmentation
      • Masking (mini study)
      • Outcome of model
        • What is predicted?
        • How is prediction assessed?
        • (Why choose this outcome measure if alternatives exist?)
      • Predictors
        • Alternatives for predictors
          • pathologist decision
          • genetic marker
        • how does pathologist make decision?
        • source of predictors + known biases
          • age
          • location
          • ...
      • Sample size
      • Missing data
      • Statistical analysis methods
        • Diagram of analytical process
        • handling of predictors
        • Pre-selection of predictors prior to model building
        • rescaling/transformation on predictors
        • type of model, building model + predictor selection + internal validation
        • detailed model description
        • initialization of model parameters
          • SimCLR pretraining
          • ImageNet
        • training approaches (hyperparameters, number of models trained, used datasets)
          • hyperparameters trained on one split
          • 5 splits, 5 models
        • Measures to assess model performance + model comparison
          • AUPR
          • AUPRG
          • SimCLR init vs ImageNet init vs ...
        • model updating arising from validation
        • how final model is selected
          • best F1 per split
        • explainability and interpretability
          • multiply attention vector with input tiles
        • software used
          • Ray
          • Optuna
          • PyTorch (Lightning)
        • setup used
    • Results
      • Participants (flow, demographics, comparison train/val/test (predictor distributions and images))
      • Model specification (present model + explain how it must be used)
      • Model performance
        • AUPRG WITH confidence interval over splits
        • results of analysis on performance errors
        • Attention maps
        • Loss curves
        • nearest neighbours (SimCLR)
        • t-SNE (SimCLR)
      • Usability
        • how and when in the clinical pathway to use the prediction AI
        • how will the AI be integrated into the target setting + requirements (on-/offsite)
        • how will poor data be assessed when implementing AI model
        • any human interaction needed for data to be used with the model + expertise of users
    • Discussion
      • Limitations
        • bad data? noise exclusion
        • overfitting fold 1
      • Interpretation (dev/val data performance + overall interpretation considering objectives/limitations/similar study results/other evidence)
      • Implications
        • potential use (also in a general way)
        • how will clinical practice be different if using the AI and how will it be used
    • Supplementary information
      • Data?
      • Code
    • References
  • Discussion and conclusion

    • Discussion
    • Conclusion
  • All references

  • Acknowledgments

See the open issues for a list of discussions (and known issues).

(back to top)

Diagrams

Diagrams are made with Mermaid (mermaid.cli) and PlantUML. Their compiled outputs are already included in the repository.

Run

mmdc -i mermaid/input.mmd -o mermaid/output.pdf -f

to compile Mermaid diagrams, and run

java -jar plantuml.jar input.puml

to compile PlantUML diagrams. Move the diagrams to the mermaid or plantuml folder and import the PDF/SVG with \includegraphics/\includesvg.
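
To (re)compile all diagrams at once, a short shell loop will do. A minimal sketch, assuming the Mermaid sources live in mermaid/ and the PlantUML sources in plantuml/, with plantuml.jar in the project root (assumptions; adjust to the repo's actual layout):

# Render each Mermaid source to a fitted PDF next to it
for f in mermaid/*.mmd; do
    mmdc -i "$f" -o "${f%.mmd}.pdf" -f
done

# Render each PlantUML source to SVG (-tsvg) for use with \includesvg
java -jar plantuml.jar -tsvg plantuml/*.puml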

(back to top)

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

(back to top)

Contact

Siem de Jong - siemdejong

Skinstression: siemdejong/shg-strain-stress

(back to top)
