seung-lab / tinybrain

Image pyramid generation for grayscale and segmentation image resize.

Low Memory Mode for num_mips=1

william-silversmith opened this issue · comments

@nkemnitz tried to downsample a 64 GB image but was only attempting to generate a single mip level. This required 144 GB due to the extraneous bookkeeping required to prevent integer truncation at mip 2+. Why not have a very fast, low memory version for only mip 1?
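As an illustration of the truncation concern, here is a minimal 1D NumPy sketch. It is not tinybrain's actual implementation: the helper name is hypothetical and floor division is assumed for rounding. It shows that deriving mip 2 from an already-truncated mip 1 can differ from averaging the original voxels directly, which is why extra state has to be carried once more than one mip is requested.

```python
import numpy as np

def avg_pairs_truncating(arr):
    """Average adjacent pairs and truncate back to uint8 (hypothetical helper)."""
    wide = arr.astype(np.uint16)  # widen so the pairwise sum cannot overflow
    return ((wide[0::2] + wide[1::2]) // 2).astype(np.uint8)

row = np.array([3, 0, 3, 2], dtype=np.uint8)

# Cascaded: mip 2 is computed from the truncated mip 1.
mip1 = avg_pairs_truncating(row)            # [1, 2]
mip2_cascaded = avg_pairs_truncating(mip1)  # [1]

# Direct: mip 2 is computed from the sum of the four original voxels.
mip2_direct = np.uint8(int(row.sum(dtype=np.uint16)) // 4)  # 2

print(mip2_cascaded[0], mip2_direct)  # 1 2
```

With num_mips=1 there is only a single averaging step, so this discrepancy cannot arise.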

Scenario was:

  • 4096x4096x4096 uint8 volume
  • factor=[2,2,2]
  • num_mips=1
  • sparse=True
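For reference, the raw array sizes in this scenario can be worked out as follows. This is only arithmetic on the shapes listed above and says nothing about tinybrain's internal allocations.

```python
import math

shape = (4096, 4096, 4096)
factor = (2, 2, 2)
itemsize = 1  # bytes per voxel for uint8

GiB = 1024 ** 3
input_bytes = math.prod(shape) * itemsize

# Shape of the single requested mip level (num_mips=1).
out_shape = tuple(s // f for s, f in zip(shape, factor))
output_bytes = math.prod(out_shape) * itemsize

print(input_bytes / GiB)   # 64.0
print(output_bytes / GiB)  # 8.0
```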

Would it affect correctness if tinybrain chunked the volume into smaller blocks, e.g. each with an edge length of x*2^num_mips for some integer x, processed them individually, and assembled the downsampled version from those? With that edge length, mode and average should produce the same result as doing the whole volume in one go, I think.

Advantages:

  • no surprises for users: predictable speed and memory usage for all num_mips (except in extreme cases where the resulting lowest-resolution chunk would have dimensions of 1x1x1)
  • could think about multiprocessing

Disadvantages:

  • In theory it should be slower (single-threaded), but when I quickly hacked this together to fit it on the 128 GB machine, I tested it on an even fatter node and didn't see any noticeable difference in speed when chopping it into 512 blocks. Both runs took approximately 220 s. But the 512-block version only needs 64 GB (input) + 8 GB (output) + ~1 GB (single chunk + bookkeeping)
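The block-wise idea can be sketched in plain NumPy for the averaging case with num_mips=1 and factor=[2,2,2]. This is an assumption-laden illustration, not the hack described above and not tinybrain code: both function names are hypothetical, floor division is assumed for rounding, and mode (segmentation) downsampling is not covered.

```python
import numpy as np

def downsample_avg_2x2x2(vol):
    """Average-pool a uint8 volume by 2 along each axis (even dimensions assumed)."""
    x, y, z = vol.shape
    # Group voxels into 2x2x2 cells and widen so the sum of 8 values cannot overflow.
    cells = vol.reshape(x // 2, 2, y // 2, 2, z // 2, 2).astype(np.uint16)
    return (cells.sum(axis=(1, 3, 5)) // 8).astype(np.uint8)

def downsample_blockwise(vol, block=(8, 8, 8)):
    """Downsample one sub-block at a time into a preallocated output.

    Block edges must be even and must divide the volume shape evenly.
    """
    out = np.empty(tuple(s // 2 for s in vol.shape), dtype=np.uint8)
    bx, by, bz = block
    for x in range(0, vol.shape[0], bx):
        for y in range(0, vol.shape[1], by):
            for z in range(0, vol.shape[2], bz):
                sub = vol[x:x + bx, y:y + by, z:z + bz]
                out[x // 2:(x + bx) // 2,
                    y // 2:(y + by) // 2,
                    z // 2:(z + bz) // 2] = downsample_avg_2x2x2(sub)
    return out

rng = np.random.default_rng(0)
vol = rng.integers(0, 256, size=(32, 16, 24), dtype=np.uint8)

# Block-wise and whole-volume results agree because no 2x2x2 cell straddles a block edge.
same = np.array_equal(downsample_blockwise(vol), downsample_avg_2x2x2(vol))
print(same)
```

Only one sub-block and its temporary widened copy are live at a time, on top of the input and the output. For the 4096x4096x4096 scenario, one way to arrive at 512 blocks would be block=(512, 512, 512), i.e. 8 blocks per axis.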

It would be fine so long as the sub-blocks are the size of the area that will be rendered into the lowest resolution mip level. For many cases, this will be the size of the entire block, which is usually how Igneous picks the task shape. I think one mip level is a special case that could be used quite frequently and also happens to be the most expensive mip level to generate. If we special case num_mips=1, you can in all likelihood generate the others using the regular logic without a problem.

"It would be fine so long as the sub-blocks are the size of the area that will be rendered into the lowest resolution mip level."

Not sure I understand: IMO, the sub-blocks would not need to cover the final level's chunk_size; they just need to be large enough to cover a single final pixel.
For igneous-typical scenarios, factor=(2,2,1), num_mips=4, that would mean your sub-blocks just need to be 16x16x1 voxels in size.
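The 16x16x1 figure follows from raising each factor component to the power of num_mips. A small sketch of that arithmetic (the helper name is hypothetical, not part of tinybrain or Igneous):

```python
def min_sub_block_shape(factor, num_mips):
    """Smallest input block that maps onto exactly one voxel of the coarsest mip."""
    return tuple(f ** num_mips for f in factor)

print(min_sub_block_shape((2, 2, 1), 4))  # (16, 16, 1)
print(min_sub_block_shape((2, 2, 2), 1))  # (2, 2, 2)
```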

That's a good point. Doing it in such small blocks would result in some slowdown as you'd have to jump around the arrays more. You'd also need to maintain and render to all the output mips at once, which requires tighter coordination of different parts of the algorithm. The major annoyance there would probably be the refactoring.