Cache-Aware Single-Core Version
william-silversmith opened this issue · comments
William Silversmith commented
In testing, my test volume was bound by memory latency about 69% of the time. This suggests that a cache-aware version could be ~2-3x faster.
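As a rough sanity check on that estimate, here is an Amdahl's-law style bound. It assumes the 69% figure is the fraction of runtime spent stalled on memory and that the rest of the runtime is unchanged; $p$ is a hypothetical fraction of the stall time that a cache-aware version manages to remove.

$$
S(p) = \frac{1}{1 - 0.69\,p}, \qquad S(1) = \frac{1}{0.31} \approx 3.2, \qquad S(0.72) \approx 2.0
$$

Under those assumptions, ~2-3x corresponds to removing roughly 72% to 97% of the stall time, with about 3.2x as the ceiling.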
William Silversmith commented
It's not clear that this is possible. You might just end up exchanging latency on the core data for latency in the range and vertex calculations.
William Silversmith commented
Might be able to make use of `__builtin_prefetch` for the Z axis (for g++ and clang).