PyDmed (Python Dataloader for Medical Imaging)

The loading speed of hard drives is well below the processing speed of modern GPUs. This is problematic for machine learning algorithms, especially for medical imaging datasets with large instances.

For example, consider the following case: we have a dataset of 500 whole-slide images (WSIs), each approximately 100000x100000 pixels. We want the dataloader to repeatedly perform the following two steps (a naive sketch of these steps follows the list):

  1. randomly select one of those huge images (i.e., WSIs).
  2. crop and return a random 224x224 patch from the huge image.
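To make the bottleneck concrete, here is a naive sketch of the two steps above, written with OpenSlide; the file paths are hypothetical and the loop is only illustrative. Every call pays the full cost of reading from disk:

```python
import random
import openslide  # assumes OpenSlide and its Python bindings are installed

# Hypothetical paths to the 500 whole-slide images
wsi_paths = ["/data/wsis/case_{:03d}.svs".format(i) for i in range(500)]

def naive_random_patch(patch_size=224):
    # 1. randomly select one of the huge images (WSIs)
    path = random.choice(wsi_paths)
    slide = openslide.OpenSlide(path)
    width, height = slide.dimensions
    # 2. crop and return a random patch_size x patch_size patch
    x = random.randint(0, width - patch_size)
    y = random.randint(0, height - patch_size)
    patch = slide.read_region((x, y), 0, (patch_size, patch_size)).convert("RGB")
    slide.close()
    return patch  # each call triggers a fresh read from disk
```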

PyDmed solves this issue.

How It Works

The following two classes are pretty much the whole API of PyDmed.

  1. BigChunk: a relatively big chunk from a patient. It can be, e.g., a 5000x5000 patch from a huge whole-slide-image.
  2. SmallChunk: a small data chunk collected from a big chunk. It can be, e.g., a 224x224 patch cropped from a 5000x5000 big chunk. In the figure below, SmallChunks are the small blue patches.

The figure below illustrates the idea of PyDmed. As long as some BigChunks are loaded into RAM, we can quickly collect SmallChunks and pass them to GPU(s). As illustrated, BigChunks are loaded from disk and replaced from time to time.
[Figure: BigChunks are loaded from disk into RAM, and SmallChunks (blue patches) are cropped from them and passed to the GPU(s).]
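The following minimal sketch captures this idea. It is not PyDmed's actual API; the function names, paths, and chunk sizes below are assumptions. A BigChunk is read from disk once and kept in RAM, and many SmallChunks are then cropped from it cheaply before the BigChunk is replaced:

```python
import random
import numpy as np
import openslide  # assumes OpenSlide and its Python bindings are installed

def load_bigchunk(path, size=5000):
    """Read one big region of a WSI from disk into RAM (slow, done rarely)."""
    slide = openslide.OpenSlide(path)
    width, height = slide.dimensions
    x = random.randint(0, width - size)
    y = random.randint(0, height - size)
    region = slide.read_region((x, y), 0, (size, size)).convert("RGB")
    slide.close()
    return np.asarray(region)

def sample_smallchunk(bigchunk, size=224):
    """Crop one small patch from a big chunk already in RAM (fast, done often)."""
    height, width, _ = bigchunk.shape
    y = random.randint(0, height - size)
    x = random.randint(0, width - size)
    return bigchunk[y:y + size, x:x + size]

# Usage: sample many small chunks per big chunk, replacing the big chunk
# from time to time (hypothetical paths).
wsi_paths = ["/data/wsis/case_{:03d}.svs".format(i) for i in range(500)]
bigchunk = load_bigchunk(random.choice(wsi_paths))
for step in range(10_000):
    patch = sample_smallchunk(bigchunk)    # no disk access here
    if (step + 1) % 1_000 == 0:            # occasionally refresh from disk
        bigchunk = load_bigchunk(random.choice(wsi_paths))
```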


About

License: MIT

