Low Memory Mode for num_mips=1

william-silversmith opened this issue 4 years ago

William Silversmith commented 4 years ago

@nkemnitz tried to downsample a 64 GB image but only wanted to generate a single mip level. This still required 144 GB of memory because of the extra bookkeeping needed to prevent integer truncation at mip 2 and above. Why not have a very fast, low-memory version for mip 1 only?
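A minimal sketch of what a mip-1-only average could look like, assuming a uint8 volume with even dimensions and factor=(2,2,2). This is not tinybrain's code; the function name is hypothetical and the round-half-up choice is an assumption. The idea is that the only widened buffer is output-sized (1/8 of the input), because each 2x2x2 corner is added through a strided view instead of widening the whole input.

```python
import numpy as np

def downsample_mip1_average(img: np.ndarray) -> np.ndarray:
    """Hypothetical low-memory 2x2x2 average that produces only mip 1.

    Assumes uint8 input with even dimensions. The only wide (uint16)
    buffer has the output's shape, so no widened copy of the input is made.
    """
    if img.dtype != np.uint8:
        raise TypeError("sketch assumes uint8 input")
    if any(s % 2 for s in img.shape):
        raise ValueError("sketch assumes even dimensions")
    out_shape = tuple(s // 2 for s in img.shape)
    acc = np.zeros(out_shape, dtype=np.uint16)  # max sum 8 * 255 = 2040 fits
    # Each strided view selects one corner of every 2x2x2 cube (a view, not a copy).
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                acc += img[dx::2, dy::2, dz::2]
    # Assumed rounding rule: round half up, i.e. (sum + 4) // 8.
    acc += 4
    acc //= 8
    return acc.astype(np.uint8)
```

Since there is only one level, nothing is carried over to a later mip, so there is no truncation to guard against beyond the final rounding.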

Nico Kemnitz · Answer 1 · Mon Aug 31 2020 11:38:40 GMT+0800 (China Standard Time)

Scenario was:

4096x4096x4096 uint8 volume
factor=[2,2,2]
num_mips=1
sparse=True
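For reference, the sizes in this scenario work out as follows (1 byte per uint8 voxel; "GB" in the thread is read here as GiB). The helper name is just for illustration.

```python
GiB = 1024 ** 3

def volume_bytes(shape, itemsize=1):
    """Bytes needed for a dense volume; itemsize=1 for uint8."""
    n = itemsize
    for s in shape:
        n *= s
    return n

shape = (4096, 4096, 4096)
factor = (2, 2, 2)
mip1_shape = tuple(s // f for s, f in zip(shape, factor))  # (2048, 2048, 2048)

print(volume_bytes(shape) // GiB)       # 64  (input)
print(volume_bytes(mip1_shape) // GiB)  # 8   (mip 1 output)
```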

Would it affect correctness if tinybrain split the volume into smaller blocks, e.g. each with an edge length of x*2^num_mips, processed them individually, and assembled the downsampled volume from the results? With that edge length, I think mode and average should produce the same result as processing the whole volume in one go.
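A quick way to check the claim for averaging: a sketch, assuming floor-division averaging and block edges that are multiples of factor**num_mips. The helpers `avg_pool` and `downsample_blockwise` are hypothetical, not tinybrain functions. Because each output voxel depends only on the input cube directly beneath it, processing aligned blocks separately gives the same result as processing the whole volume.

```python
import numpy as np

def avg_pool(img, factor):
    """Floor mean over non-overlapping blocks of size `factor`."""
    fx, fy, fz = factor
    X, Y, Z = img.shape
    blocks = img.reshape(X // fx, fx, Y // fy, fy, Z // fz, fz)
    s = blocks.sum(axis=(1, 3, 5), dtype=np.uint32)
    return (s // (fx * fy * fz)).astype(img.dtype)

def downsample_blockwise(img, factor, num_mips, block_shape):
    """Downsample num_mips levels block by block and assemble the lowest mip."""
    scale = [f ** num_mips for f in factor]
    if any(b % s for b, s in zip(block_shape, scale)):
        raise ValueError("block edges must be multiples of factor**num_mips")
    if any(n % b for n, b in zip(img.shape, block_shape)):
        raise ValueError("blocks must tile the volume exactly")
    out = np.empty(tuple(n // s for n, s in zip(img.shape, scale)), dtype=img.dtype)
    bx, by, bz = block_shape
    for x in range(0, img.shape[0], bx):
        for y in range(0, img.shape[1], by):
            for z in range(0, img.shape[2], bz):
                block = img[x:x + bx, y:y + by, z:z + bz]
                for _ in range(num_mips):
                    block = avg_pool(block, factor)
                # Position of this block in the output's coordinates.
                ox, oy, oz = x // scale[0], y // scale[1], z // scale[2]
                out[ox:ox + block.shape[0], oy:oy + block.shape[1], oz:oz + block.shape[2]] = block
    return out
```

Only one block's intermediates are alive at a time, which is where the memory savings would come from.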

Advantages:

No surprises for users: predictable speed and memory usage for all num_mips (except in extreme cases where the resulting lowest-resolution chunk would be 1x1x1)
Could consider multiprocessing
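Because the blocks are independent, they could be dispatched to workers. A sketch, assuming mip 1, factor=(2,2,2), uint8 input with even dimensions, and slabs along the first axis; `downsample_parallel` is hypothetical. It uses a thread pool as a stand-in for multiprocessing: each task writes a disjoint slice of the output, so no locking is needed, but any actual speedup depends on NumPy releasing the GIL. A process pool would need the output in shared memory.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def avg_pool_2x2x2(block):
    """Floor mean over 2x2x2 cubes of a uint8 block with even dimensions."""
    X, Y, Z = block.shape
    s = block.reshape(X // 2, 2, Y // 2, 2, Z // 2, 2).sum(axis=(1, 3, 5), dtype=np.uint16)
    return (s // 8).astype(np.uint8)

def downsample_parallel(img, slab=64, workers=4):
    """Mip-1 average computed slab by slab on a thread pool (slab must be even)."""
    out = np.empty(tuple(s // 2 for s in img.shape), dtype=np.uint8)

    def work(x):
        # Slab [x, x + slab) maps to output rows [x // 2, (x + slab) // 2);
        # the last slab may be shorter, and slicing clips both sides equally.
        out[x // 2:(x + slab) // 2] = avg_pool_2x2x2(img[x:x + slab])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, range(0, img.shape[0], slab)))  # list() surfaces exceptions
    return out
```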

Disadvantages:

In theory it should be slower (single-threaded), but when I quickly hacked this together to fit it on the 128 GB machine, I tested it on an even fatter node and didn't see any noticeable difference in speed when chopping it into 512 blocks. Both runs took approximately 220 s. But the 512-block version only needs 64 GB (input) + 8 GB (output) + ~1 GB (single chunk + bookkeeping).
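The arithmetic behind the 512-block split, assuming the blocks are an even 8x8x8 grid of cubes (the comment doesn't state the block shape). The raw block is small; the ~1 GB figure above also includes bookkeeping.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3

volume_edge = 4096                             # 4096^3 uint8, 1 byte per voxel
blocks_per_axis = 8                            # assumed: 8**3 = 512 cubic blocks
block_edge = volume_edge // blocks_per_axis    # 512

input_bytes = volume_edge ** 3
output_bytes = (volume_edge // 2) ** 3         # mip 1 with factor (2, 2, 2)
block_in_bytes = block_edge ** 3
block_out_bytes = (block_edge // 2) ** 3

print(blocks_per_axis ** 3, block_edge)               # 512 512
print(input_bytes // GiB, output_bytes // GiB)        # 64 8
print(block_in_bytes // MiB, block_out_bytes // MiB)  # 128 16
```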

William Silversmith · Answer 2 · Mon Aug 31 2020 12:16:56 GMT+0800 (China Standard Time)

It would be fine so long as the sub-blocks are the size of the area that will be rendered into the lowest-resolution mip level. In many cases, this will be the size of the entire block, which is usually how Igneous picks the task shape. I think one mip level is a special case that could be used quite frequently and also happens to be the most expensive mip level to generate. If we special-case num_mips=1, you can in all likelihood generate the others using the regular logic without a problem.

Nico Kemnitz · Answer 3 · Mon Aug 31 2020 12:55:18 GMT+0800 (China Standard Time)

It would be fine so long as the sub-blocks are the size of the area that will be rendered into the lowest resolution mip level.

Not sure I understand. IMO, the sub-blocks would not need to cover the final level's chunk_size; they just need to be large enough to cover a single final pixel.
For Igneous-typical scenarios (factor=(2,2,1), num_mips=4), that would mean the sub-blocks only need to be 16x16x1 voxels.
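The minimum sub-block size described here is just factor**num_mips along each axis, the footprint of one voxel at the final mip level. A tiny illustration (the function name is made up):

```python
def min_subblock_shape(factor, num_mips):
    """Smallest input block that maps onto exactly one voxel of the final mip."""
    return tuple(f ** num_mips for f in factor)

print(min_subblock_shape((2, 2, 1), 4))  # (16, 16, 1)
print(min_subblock_shape((2, 2, 2), 1))  # (2, 2, 2)
```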

William Silversmith · Answer 4 · Mon Aug 31 2020 14:08:56 GMT+0800 (China Standard Time)

That's a good point. Doing it in such small blocks would cause some slowdown, since you'd have to jump around the arrays more. You'd also need to maintain and render to all the output mips at once, which requires tighter coordination between different parts of the algorithm. The major annoyance there would probably be the refactoring.
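To illustrate "render to all the output mips at once": a sketch that walks the input in minimal tiles (factor**num_mips per axis) and writes every level for a tile before moving to the next. It assumes floor averaging and dimensions divisible by the tile size; `pool_floor` and `downsample_tiled` are hypothetical. The many tiny Python-level iterations also show where the "jumping around the arrays" overhead would come from.

```python
import numpy as np

def pool_floor(block, factor):
    """Floor mean over non-overlapping blocks of size `factor`."""
    fx, fy, fz = factor
    X, Y, Z = block.shape
    s = block.reshape(X // fx, fx, Y // fy, fy, Z // fz, fz).sum(axis=(1, 3, 5), dtype=np.uint32)
    return (s // (fx * fy * fz)).astype(block.dtype)

def downsample_tiled(img, factor, num_mips):
    """Produce all num_mips levels from minimal tiles, writing each level per tile."""
    tx, ty, tz = (f ** num_mips for f in factor)
    # Preallocate every output level up front.
    outs, shape = [], img.shape
    for _ in range(num_mips):
        shape = tuple(s // f for s, f in zip(shape, factor))
        outs.append(np.empty(shape, dtype=img.dtype))
    for x in range(0, img.shape[0], tx):
        for y in range(0, img.shape[1], ty):
            for z in range(0, img.shape[2], tz):
                cur = img[x:x + tx, y:y + ty, z:z + tz]
                for level in range(1, num_mips + 1):
                    cur = pool_floor(cur, factor)
                    # Tile origin expressed in this level's coordinates.
                    ox = x // factor[0] ** level
                    oy = y // factor[1] ** level
                    oz = z // factor[2] ** level
                    outs[level - 1][ox:ox + cur.shape[0], oy:oy + cur.shape[1], oz:oz + cur.shape[2]] = cur
    return outs
```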