coreylowman / dfdx

Deep learning in Rust, with shape checked tensors and neural networks

Question / clarification regarding heap allocations

emchristiansen opened this issue

Asking here because I didn't see this explicitly covered in the docs.

Under what conditions can a user know that a given piece of dfdx code won't perform any heap allocations?
E.g., will I be good if I simply ensure all my tensors have static shapes?

I don't have a perfect apples-to-apples comparison, but I have the impression my dfdx code is maybe 2x to 4x slower than equivalent code written in JAX and compiled to XLA.
In the XLA version, the full data flow graph including the tensor shapes is statically known, so there's no need for dynamic allocation, and I'm wondering if that might be the source of the apparent difference in speed.
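For concreteness, the kind of fully static-shape code I have in mind looks roughly like this (a sketch only; it assumes dfdx's const-generic `Rank*` shapes and the `Cpu` device from the prelude, and the exact API may differ across versions):

```rust
use dfdx::prelude::*;

fn main() {
    let dev = Cpu::default();

    // Shapes are const-generic (Rank2<32, 64>), so they are fully known at compile time.
    let a: Tensor<Rank2<32, 64>, f32, _> = dev.sample_normal();
    let b: Tensor<Rank2<32, 64>, f32, _> = dev.sample_normal();

    // Elementwise add; mismatched shapes would be a compile error, not a runtime one.
    let c = a + b;
    let _ = c;
}
```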

Yeah, it's likely due to allocations. The only allocation-reducing optimization dfdx does is for unary/binary operations where one of the inputs has only one owner and the output has the same shape/strides as that input; in that case dfdx reuses the input's allocation for the output. If the input has multiple owners or a different shape, it can't do that. In every other case there will be an allocation.
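Here's a sketch of that ownership distinction (assuming the current prelude API; the buffer reuse is an internal optimization, so this only illustrates when it can apply, not a guarantee):

```rust
use dfdx::prelude::*;

fn main() {
    let dev = Cpu::default();
    let x: Tensor<Rank1<1024>, f32, _> = dev.sample_normal();

    // `x` is moved in and has a single owner, and relu's output has the same
    // shape/strides, so dfdx can reuse x's buffer for the result.
    let y = x.relu();

    // Cloning keeps a second owner of the underlying buffer alive, so the op
    // on the clone cannot reuse that buffer and must allocate a fresh output.
    let y2 = y.clone();
    let z = y2.relu();
    let _ = (y, z);
}
```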

If you want to add details about this to the docs, feel free to open a PR! The module-level tensor documentation is probably a good place for it.