alif-munim/dermatoscopic-vae

Training VAE

Make sure the HAM dataset is downloaded and extracted.
Make sure all required packages are installed. To find out which ones are needed, check the imports in train_vae.py, or run the script and install whatever packages the error messages report as missing.
Assuming the HAM dataset is stored as ham in the working directory and we want to train on the mel class, run python train_vae.py ham/mel --no_cache (caching currently has issues, so it is disabled here).

usage: train_vae.py [-h] [-bs BATCH_SIZE] [-hp HISTORY_PATH] [-e EPOCHS] [-xc EXPERIMENT_CHECKPOINT] [-nw NUM_WORKERS] [-c] [-nc] [-cf [CACHE_FILE]] ham_dataset_dir

positional arguments:
  ham_dataset_dir       Path to ham dataset's category ex: `ham/mel`.

optional arguments:
  -h, --help            show this help message and exit
  -bs BATCH_SIZE, --batch_size BATCH_SIZE
                        Batch size for the model (default - 32).
  -hp HISTORY_PATH, --history_path HISTORY_PATH
                        Path to where to store history in json file. Ex: checkpoints/vae/history.json (default)
  -e EPOCHS, --epochs EPOCHS
                        Number of epochs to train for (default - 150)
  -xc EXPERIMENT_CHECKPOINT, --experiment_checkpoint EXPERIMENT_CHECKPOINT
                        Checkpoint to save after done training ex: (default) checkpoints/vae/vae -> checkpoints/vae/vae.index, checkpoints/vae/vae.data-00000-of-00001,
                        checkpoints/vae/checkpoint.
  -nw NUM_WORKERS, --num_workers NUM_WORKERS
                        Maximum number of processes to spin up when using process-based threading (default - number of cores [multiprocessing.cpu_count()]).
  -c, --cache           Caching of training data is enabled.
  -nc, --no_cache       Caching of training data is disabled (default).
  -cf [CACHE_FILE], --cache_file [CACHE_FILE]
                        File location for where to cache. Ex: /tmp/cache. If caching is enabled but directory is not provided, will cache in memory (default).
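
The option list above maps naturally onto Python's argparse module. The following is a minimal sketch of a parser with a similar interface, reconstructed from the help text only. It is not copied from train_vae.py, and details such as sharing a single `cache` destination between `-c` and `-nc` are assumptions.

```python
import argparse
import multiprocessing


def build_parser():
    # Reconstructed from the help text; the real train_vae.py may differ.
    parser = argparse.ArgumentParser(prog="train_vae.py")
    parser.add_argument("ham_dataset_dir", help="Path to one HAM category, e.g. ham/mel")
    parser.add_argument("-bs", "--batch_size", type=int, default=32)
    parser.add_argument("-hp", "--history_path", default="checkpoints/vae/history.json")
    parser.add_argument("-e", "--epochs", type=int, default=150)
    parser.add_argument("-xc", "--experiment_checkpoint", default="checkpoints/vae/vae")
    parser.add_argument("-nw", "--num_workers", type=int,
                        default=multiprocessing.cpu_count())
    # Assumption: both flags write to one boolean, with caching off by default.
    parser.add_argument("-c", "--cache", dest="cache", action="store_true")
    parser.add_argument("-nc", "--no_cache", dest="cache", action="store_false")
    parser.set_defaults(cache=False)
    # nargs="?" lets -cf appear with or without a path; None means "cache in memory".
    parser.add_argument("-cf", "--cache_file", nargs="?", default=None)
    return parser


args = build_parser().parse_args(["ham/mel", "--no_cache"])
print(args.batch_size, args.epochs, args.cache)  # 32 150 False
```

Parsing `["ham/mel", "-c", "-cf", "/tmp/cache"]` with the same parser would enable caching and set `cache_file` to `/tmp/cache`.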

Testing VAE

Train the VAE and locate the checkpoint file before testing.
Run python test_vae.py checkpoints/vae/vae -hdd="ham/mel" if the model was trained on the mel class and the checkpoint was saved to checkpoints/vae/vae.

usage: test_vae.py [-h] -hdd HAM_DATASET_DIR [-bs BATCH_SIZE] model_weights

positional arguments:
  model_weights         Checkpoint to load the weights from. ex: checkpoints/vae/vae.data-00000-of-00001 then input checkpoints/vae/vae

optional arguments:
  -h, --help            show this help message and exit
  -hdd HAM_DATASET_DIR, --ham_dataset_dir HAM_DATASET_DIR
                        Dataset to test VAE on (only 1 class) ex: ham/mel
  -bs BATCH_SIZE, --batch_size BATCH_SIZE
                        Number of images to reconstruct

Plotting Metrics

Train the VAE and locate the history.json file before plotting metrics.
Run python plot_metrics.py checkpoints/vae/history.json -dfk=2. The -dfk option drops the first k epochs from the plot, since the losses in those epochs are usually very high and skew the plot.

usage: plot_metrics.py [-h] [-dfk DROP_FIRST_K_EPOCHS] history_path

positional arguments:
  history_path          Path to the history JSON file saved during training. ex: checkpoints/vae/history.json

optional arguments:
  -h, --help            show this help message and exit
  -dfk DROP_FIRST_K_EPOCHS, --drop_first_k_epochs DROP_FIRST_K_EPOCHS
                        Number of epochs of metrics to drop from the start (since first few losses are extremely high, it can skew the graph) Ex: 2 (default).
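
To illustrate what dropping the first k epochs means, here is a small sketch. It assumes the history file is a JSON object that maps each metric name to a list of per-epoch values (the shape of a Keras `History.history` dict). The actual layout written by train_vae.py is not documented here, and `drop_first_k` is a hypothetical helper, not a function from plot_metrics.py.

```python
import json


def drop_first_k(history, k=2):
    """Return a copy of history with the first k epochs removed from every metric."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return {name: list(values)[k:] for name, values in history.items()}


# Hypothetical history contents; real values come from history.json.
history = json.loads('{"loss": [950.0, 410.0, 35.0, 30.0, 28.0]}')
trimmed = drop_first_k(history, k=2)
print(trimmed["loss"])  # [35.0, 30.0, 28.0]
```

With the two very large initial losses removed, the remaining values share a scale on which the later improvement is visible.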

Generating Synthetic Data

Train the VAE and locate the checkpoint file before generating synthetic data.
Run python generate_synthetic_data.py checkpoints/vae/vae -hdd="ham/mel" -nm=100 to generate 100 synthetic images with the default epsilon of 0.2.

usage: generate_synthetic_data.py [-h] -hdd HAM_DATASET_DIR [-sd SAVE_DIR] [-e EPSILON] [-nm NUM_IMAGES] model_weights

positional arguments:
  model_weights         Checkpoint to load the weights from. ex: checkpoints/vae/vae.data-00000-of-00001 then input checkpoints/vae/vae

optional arguments:
  -h, --help            show this help message and exit
  -hdd HAM_DATASET_DIR, --ham_dataset_dir HAM_DATASET_DIR
                        Dataset to test VAE on (only 1 class) ex: ham/mel
  -sd SAVE_DIR, --save_dir SAVE_DIR
                        Directory to store the generated data
  -e EPSILON, --epsilon EPSILON
                        Epsilon value for generating noise on latent space from standard normal ex: 0.2 (default)
  -nm NUM_IMAGES, --num_images NUM_IMAGES
                        Number of images to reconstruct
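
One plausible reading of the epsilon option is that each latent code is shifted by standard normal noise scaled by epsilon before decoding. The sketch below shows only that perturbation step on a plain list of floats. Whether generate_synthetic_data.py applies the noise this way is an assumption, and `perturb_latent` is a hypothetical helper.

```python
import random


def perturb_latent(z, epsilon=0.2, rng=None):
    """Add epsilon-scaled standard normal noise to each latent coordinate."""
    rng = rng or random.Random()
    return [zi + epsilon * rng.gauss(0.0, 1.0) for zi in z]


rng = random.Random(0)     # seeded so repeated runs give the same noise
z = [0.5, -1.2, 0.0, 2.0]  # hypothetical 4-dimensional latent code
variants = [perturb_latent(z, epsilon=0.2, rng=rng) for _ in range(3)]
for v in variants:
    print([round(x, 3) for x in v])
```

Under this reading, a larger epsilon moves the perturbed codes further from the original, so the decoded images would be more varied but potentially less faithful to the source class.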

Training Classifier

Open the ham_classifier.ipynb file in Jupyter Notebook. Make sure ham_dataset_dir points to a directory that contains subdirectories for the four classes: 'bcc', 'bkl', 'mel', and 'nv'.
Ensure that 'mel_source_dir' points to a directory that contains subdirectories named 'mel_true' and 'gen_data', which hold the real and synthetic images respectively.
Specify the number of real and synthetic images to use for training with the 'mel_test_num' and 'mel_gen_num' variables. Neither value should exceed the number of images in the corresponding subdirectory of 'mel_source_dir'.
Create the '/plots', '/reports', and '/history' directories so that plots, reports, and data from training and evaluation can be saved for future reference.
Run all cells.
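
The checks and directory setup described above can be scripted before running the notebook. This is a minimal sketch, not code from ham_classifier.ipynb: `count_images` and `prepare_run` are hypothetical helpers, the list of image extensions is assumed, and the output directories are assumed to live under the notebook's working directory.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image file extensions


def count_images(directory):
    """Count files in directory whose extension looks like an image."""
    return sum(1 for p in Path(directory).iterdir()
               if p.suffix.lower() in IMAGE_SUFFIXES)


def prepare_run(mel_source_dir, mel_test_num, mel_gen_num, output_root="."):
    """Validate the requested image counts and create the output directories."""
    source = Path(mel_source_dir)
    available = {
        "mel_true": count_images(source / "mel_true"),
        "gen_data": count_images(source / "gen_data"),
    }
    if mel_test_num > available["mel_true"]:
        raise ValueError(
            f"mel_test_num={mel_test_num} exceeds {available['mel_true']} real images")
    if mel_gen_num > available["gen_data"]:
        raise ValueError(
            f"mel_gen_num={mel_gen_num} exceeds {available['gen_data']} synthetic images")
    # Create plots/, reports/, and history/ if they do not already exist.
    for name in ("plots", "reports", "history"):
        (Path(output_root) / name).mkdir(parents=True, exist_ok=True)
    return available
```

A call such as `prepare_run("mel_data", mel_test_num=200, mel_gen_num=100)` (with a placeholder directory name and placeholder counts) would fail early with a `ValueError` if either count is too large, instead of failing partway through the notebook.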
