Issue with image_extension when parameter use-hdf5 is used

Question

aisosalo opened this issue 4 years ago

There is an issue in run_producer with image_extension when use-hdf5 is added as a parameter in run.sh.

Traceback:

Traceback (most recent call last):
  File "src/heatmaps/run_producer.py", line 392, in <module>
    main()
  File "src/heatmaps/run_producer.py", line 388, in main
    produce_heatmaps(model, device, parameters)
  File "src/heatmaps/run_producer.py", line 344, in produce_heatmaps
    making_heatmap_with_large_minibatch_potential(parameters, model, exam_list, device)
  File "src/heatmaps/run_producer.py", line 270, in making_heatmap_with_large_minibatch_potential
    all_patches, all_cases = sample_patches(exam, parameters)
  File "src/heatmaps/run_producer.py", line 223, in sample_patches
    parameters=parameters,
  File "src/heatmaps/run_producer.py", line 240, in sample_patches_single
    parameters,
  File "src/heatmaps/run_producer.py", line 102, in ori_image_prepare
    image = loading.load_image(image_path, view, horizontal_flip)
  File "src/data_loading/loading.py", line 59, in load_image
    image = read_image_mat(image_path)
  File "src/utilities/reading_images.py", line 37, in read_image_mat
    data = h5py.File(file_name, 'r')
  File "env_nyukat/lib/python3.6/site-packages/h5py/_hl/files.py", line 312, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "env_nyukat/lib/python3.6/site-packages/h5py/_hl/files.py", line 142, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = 'sample_output/cropped_images/0_L_CC.hdf5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

The issue seems to go away when the file extension is hard-coded in get_image_path:

def get_image_path(short_file_path, parameters):
    """
    Convert short_file_path to full file path
    """
    return os.path.join(parameters['original_image_path'], short_file_path + '.png')

The use-hdf5 parameter was probably not meant to be used at all, but it is listed in run_producer. The script can also be modified to save the heatmaps in png format (e.g. for visualization purposes) by adding the following:

saving_images.save_image_as_png(img_as_ubyte(heatmap_malignant), os.path.join(
        parameters['save_heatmap_path'][0],
        short_file_path + '.png'
    ))
saving_images.save_image_as_png(img_as_ubyte(heatmap_benign), os.path.join(
        parameters['save_heatmap_path'][1],
        short_file_path + '.png'
    ))

There is a somewhat similar issue in run_model with image_extension when use-hdf5 is added as a parameter in run.sh.

Traceback:

Traceback (most recent call last):
  File "src/modeling/run_model.py", line 238, in <module>
    main()
  File "src/modeling/run_model.py", line 233, in main
    parameters=parameters,
  File "src/modeling/run_model.py", line 189, in load_run_save
    predictions = run_model(model, device, exam_list, parameters)
  File "src/modeling/run_model.py", line 82, in run_model
    horizontal_flip=datum["horizontal_flip"],
  File "src/data_loading/loading.py", line 59, in load_image
    image = read_image_mat(image_path)
  File "src/utilities/reading_images.py", line 37, in read_image_mat
    data = h5py.File(file_name, 'r')
  File "env_nyukat/lib/python3.6/site-packages/h5py/_hl/files.py", line 312, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "env_nyukat/lib/python3.6/site-packages/h5py/_hl/files.py", line 142, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = 'sample_output/cropped_images/0_L_CC.hdf5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

Perhaps the safest solution is to hard-code the correct file extension here:

loaded_image = loading.load_image(
    image_path=os.path.join(parameters["image_path"], short_file_path + ".png"),
    view=view,
    horizontal_flip=datum["horizontal_flip"],
)
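
As an alternative to hard-coding the extension in both places, the path could be derived from the flag and checked before loading, so that a mismatch between use-hdf5 and the files on disk produces a clearer message than the h5py OSError above. This is only a minimal sketch: resolve_image_path is a hypothetical helper that is not part of the repository, and the mapping of use_hdf5 to '.hdf5' or '.png' is an assumption based on the file names in the tracebacks.

```python
import os


def resolve_image_path(image_dir, short_file_path, use_hdf5=False):
    """Hypothetical helper: build the full image path and check that it exists."""
    # Assumption: inputs are stored as .hdf5 when use_hdf5 is set, .png otherwise.
    extension = ".hdf5" if use_hdf5 else ".png"
    full_path = os.path.join(image_dir, short_file_path + extension)
    if not os.path.isfile(full_path):
        # Fail early with a message that names the selected format.
        raise FileNotFoundError(
            "Expected input image {} (use_hdf5={}); check that the files in {} "
            "match the selected format".format(full_path, use_hdf5, image_dir)
        )
    return full_path
```

With only png files in sample_output/cropped_images, calling resolve_image_path('sample_output/cropped_images', '0_L_CC', use_hdf5=True) would raise FileNotFoundError before h5py is ever invoked.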

Jason Phang · Answer 1 · Fri May 22 2020 10:50:24 GMT+0800 (China Standard Time)

Hi @aisosalo, I'm not 100% sure I understand your issue. The purpose of use-hdf5 is to read inputs that are in hdf5 format. We have not currently provided any hdf5 format sample inputs with the repository. It sounds like the functionality you're looking for is writing hdf5 formats instead? Let me know if I'm mischaracterizing your issue.

Antti Isosalo · Answer 2 · Fri May 22 2020 17:04:08 GMT+0800 (China Standard Time)

Thank you for your answer, it resolved my issue. I clearly misunderstood the purpose of the use-hdf5 parameter.

My purpose was to make a script to ease monitoring the heatmap generation using PyCharm:

"""
Method adapted from breast_cancer_classifier function `run_producer` by
Nan Wu, Jason Phang, Jungkyu Park, Yiqiu Shen, Zhe Huang, Masha Zorin,
Stanisław Jastrzębski, Thibault Févry, Joe Katsnelson, Eric Kim, Stacey Wolfson, Ujas Parikh,
Sushma Gaddam, Leng Leng Young Lin, Kara Ho, Joshua D. Weinstein, Beatriu Reig, Yiming Gao,
Hildegard Toth, Kristine Pysarenko, Alana Lewin, Jiyon Lee, Krystal Airola, Eralda Mema,
Stephanie Chung, Esther Hwang, Naziya Samreen, S. Gene Kim, Laura Heacock, Linda Moy,
Kyunghyun Cho, and Krzysztof J. Geras, which is licensed under a GNU Affero General Public License v3.0.
See: https://github.com/nyukat/breast_cancer_classifier/blob/master/LICENSE
"""

import sys
import os
import random

import argparse

from src.heatmaps.run_producer import produce_heatmaps
from src.heatmaps.run_producer import load_model

print(sys.version, sys.platform, sys.executable)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Generate heatmaps')
    parser.add_argument('--exam-list-path', default='sample_output/exam_list.pkl')
    parser.add_argument('--image-path', default='sample_output/cropped_images')
    parser.add_argument('--output-heatmap-path', default='sample_output/heatmaps')
    parser.add_argument('--model-path', default='models/sample_patch_model.p')
    parser.add_argument('--batch-size', default=100, type=int)
    # Boolean flag: argparse passes values as strings, so choices=[False, True] would reject any value given
    parser.add_argument('--use-hdf5', action='store_true')
    parser.add_argument('--device-type', choices=['gpu', 'cpu'], default='gpu')
    parser.add_argument('--gpu-number', type=int, default=0)
    parser.add_argument('--seed', default=0, type=int)
    args = parser.parse_args()

    # Set the seed
    random.seed(args.seed)

    params = dict(
        device_type=args.device_type,
        gpu_number=args.gpu_number,
        patch_size=256,
        stride_fixed=70,
        more_patches=5,
        minibatch_size=args.batch_size,
        seed=args.seed,
        initial_parameters=args.model_path,
        input_channels=3,
        number_of_classes=4,
        data_file=args.exam_list_path,
        original_image_path=args.image_path,
        save_heatmap_path=[os.path.join(args.output_heatmap_path, 'heatmap_malignant'),
                           os.path.join(args.output_heatmap_path, 'heatmap_benign')],
        heatmap_type=[0, 1],
        use_hdf5=args.use_hdf5  # when using hdf5 format sample inputs
    )

    # Get model
    model, device = load_model(params)

    # Generate heatmaps in the chosen format
    produce_heatmaps(model, device, params)
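
A side note on boolean command-line flags such as --use-hdf5: argparse hands values over as strings unless a type is given, so such flags are usually declared with action='store_true' and passed without a value. A minimal, self-contained sketch (build_parser is a hypothetical name used only for illustration):

```python
import argparse


def build_parser():
    """Hypothetical parser showing a value-less boolean flag."""
    parser = argparse.ArgumentParser(description="Boolean flag sketch")
    # store_true: the flag takes no value; absent -> False, present -> True.
    parser.add_argument("--use-hdf5", action="store_true")
    return parser


# Parse explicit argument lists instead of sys.argv so the sketch is reproducible.
args_default = build_parser().parse_args([])
args_hdf5 = build_parser().parse_args(["--use-hdf5"])
print(args_default.use_hdf5, args_hdf5.use_hdf5)  # False True
```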

Antti Isosalo · Answer 3 · Fri May 22 2020 17:04:47 GMT+0800 (China Standard Time)

What might be the possible benefits of using hdf5 format mammogram images as an input to the network? Is it something to consider when fine-tuning the pre-trained models for a different dataset?

Krzysztof J. Geras · Answer 4 · Fri May 29 2020 05:47:44 GMT+0800 (China Standard Time)

It shouldn't matter which format you use as long as you correctly replace the part of the code which is loading the images. We chose that format simply because it was the fastest to load when we tested it on our cluster.
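
To check which format loads fastest on a given machine, the loading functions can be timed directly. The following is a rough sketch rather than the procedure used for the original test: mean_load_time is a hypothetical helper, and load_fn stands for whatever loader is being compared (for example read_image_mat for hdf5 files).

```python
import time


def mean_load_time(load_fn, paths, repeats=3):
    """Return the mean wall-clock time in seconds per call of load_fn(path)."""
    if not paths or repeats < 1:
        raise ValueError("need at least one path and one repeat")
    start = time.perf_counter()
    for _ in range(repeats):
        for path in paths:
            load_fn(path)  # the result is discarded; only the load time matters
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(paths))
```

Running it once per format on the same set of images gives a per-image figure that can be compared. Disk caching can distort the numbers, so the first pass is best treated as a warm-up.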