BMIRDS / deepslide

Code for the Nature Scientific Reports paper "Pathologist-level classification of histologic patterns on resected lung adenocarcinoma slides with deep neural networks." A sliding-window framework for classifying high-resolution whole-slide images, typically microscopy or histopathology images.

Home Page: https://www.nature.com/articles/s41598-019-40041-7

Parallelize and pipeline preprocessing scripts

jlevy44 opened this issue

Hey there! Awesome package. For researchers with access to a large number of slides, some of the tiling / patch creation will take a while if done in series. Just tiling ~150 slides may take a few days, and generating patches adds even more time; spending on the order of one week just on splitting the images is a bit extreme. I recommend adding an option to parallelize most of the preprocessing scripts, and chaining them into an automated preprocessing pipeline for deployment (I'm sure other groups could build their own internal pipeline).
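A minimal sketch of what per-slide parallelism could look like, using only the Python standard library. This is not deepslide's code: `tile_coords`, `tile_one_slide`, and `tile_all` are hypothetical names, and the worker only counts tiles from assumed slide dimensions instead of reading a slide and writing images. The point is just that each slide is an independent job, so the jobs can be mapped across a process pool.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Dict, Iterable, List, Tuple

Coord = Tuple[int, int]
Job = Tuple[str, int, int, int]  # (slide name, width, height, tile size)


def tile_coords(width: int, height: int, tile: int) -> List[Coord]:
    """Top-left corners of non-overlapping tiles that fit fully inside the image."""
    return [
        (x, y)
        for y in range(0, height - tile + 1, tile)
        for x in range(0, width - tile + 1, tile)
    ]


def tile_one_slide(job: Job) -> Tuple[str, int]:
    """Hypothetical per-slide worker.

    A real worker would open the slide and save each tile to disk;
    this stand-in only reports how many tiles the slide would yield.
    """
    name, width, height, tile = job
    return name, len(tile_coords(width, height, tile))


def tile_all(
    jobs: Iterable[Job],
    max_workers: int = 4,
    executor_cls=ProcessPoolExecutor,
) -> Dict[str, int]:
    """Run one job per slide across a pool of workers.

    Slides are independent of each other, so no coordination is needed
    beyond collecting the per-slide results.
    """
    with executor_cls(max_workers=max_workers) as pool:
        return dict(pool.map(tile_one_slide, jobs))
```

When called from a script, `tile_all(jobs)` should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Passing `executor_cls=ThreadPoolExecutor` swaps in threads, which may be enough if the real work turns out to be dominated by disk I/O.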

Could be useful as these datasets become larger.

I'd be happy to help PR.

It's probably more on the order of half a week, but this seems like a process that should take a few hours at most.

Joshua, that's a great suggestion. I will say that for me, tiling 100 slides takes on the order of ten minutes. If you're running this code for days, it seems to me like (1) your images are too high resolution or (2) you're overlapping windows way too much. I feel like if your code is continuously generating patches for a few days, your server will probably run out of space...
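To make the overlap point concrete, here is a rough sketch of how the number of sliding windows grows as the stride shrinks. The function name `count_windows` and the example numbers (a 20000 x 20000 pixel image, 224-pixel windows) are illustrative assumptions, not values taken from this thread or from deepslide's scripts.

```python
def count_windows(width: int, height: int, window: int, stride: int) -> int:
    """Number of square windows of side `window` that fit fully inside a
    width x height image when the window moves `stride` pixels at a time."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    if width < window or height < window:
        return 0
    per_row = (width - window) // stride + 1
    per_col = (height - window) // stride + 1
    return per_row * per_col


# Assumed example: a 20000 x 20000 image and 224-pixel windows.
no_overlap = count_windows(20000, 20000, 224, 224)  # stride equals window size
heavy_overlap = count_windows(20000, 20000, 224, 56)  # 75% overlap per axis
print(no_overlap, heavy_overlap, heavy_overlap / no_overlap)
```

Cutting the stride to a quarter of the window size multiplies the patch count by roughly sixteen, since the count scales with the inverse square of the stride. Both runtime and disk usage scale with that count.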

Yeah, seems like patch generation was pretty fast though.

Slides were a few GB each before tiling. Tiles were substantially smaller and easier to process.

Workflow automation would still make a nice PR, though. I think many people would appreciate it.

There's this really smart PhD student at Dartmouth, good guy and lifts a lot of weight. He made something called PathFlowAI to address this, feel free to look into it: https://pypi.org/project/pathflowai/ .

Never heard of it before. I'll have to check it out. Thanks @jasonwei20 ! Also found this during the search: https://github.com/jlevy44/PathFlowAI

;)

Cheers!