- Is sum factorization by hand really faster than optimizations by the compiler?
- Does hand optimization obstruct compiler optimization?
- How does vectorization play in?
- Plain loops vs. recursion by dimension?
- Does it make a difference if we use `u[][][]` or the flattened version `u[]`, where we compute indices by hand?
- How do the answers depend on the number of shape functions/quadrature points?
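
To make the questions concrete, here is a minimal sketch of the two kernels one might compare: a naive tensor-product evaluation and a hand-written sum factorization, both using plain loops and flattened arrays with hand-computed indices. The function names (`eval_naive`, `eval_sumfact`, `max_abs_diff`), the row-major layout of the 1D basis matrix `B`, and the restriction to 3D with the same `n` shape functions and `m` quadrature points per direction are all assumptions made for illustration, not taken from an existing code base.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed layout (hypothetical): B[q * n + i] = phi_i(x_q), 1D basis at 1D points.
// Coefficients c and results u are flattened 3D arrays with the last index fastest.

// Naive evaluation: one sum over all (i, j, k) per quadrature point, O(m^3 n^3).
std::vector<double> eval_naive(const std::vector<double>& B,
                               const std::vector<double>& c,
                               std::size_t n, std::size_t m)
{
  assert(B.size() == m * n && c.size() == n * n * n);
  std::vector<double> u(m * m * m, 0.0);
  for (std::size_t q0 = 0; q0 < m; ++q0)
    for (std::size_t q1 = 0; q1 < m; ++q1)
      for (std::size_t q2 = 0; q2 < m; ++q2) {
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i)
          for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
              sum += B[q0 * n + i] * B[q1 * n + j] * B[q2 * n + k]
                     * c[(i * n + j) * n + k];
        u[(q0 * m + q1) * m + q2] = sum;
      }
  return u;
}

// Sum factorization: contract one direction at a time,
// O(n^3 m + n^2 m^2 + n m^3) instead of O(m^3 n^3).
std::vector<double> eval_sumfact(const std::vector<double>& B,
                                 const std::vector<double>& c,
                                 std::size_t n, std::size_t m)
{
  assert(B.size() == m * n && c.size() == n * n * n);

  // Pass 1: contract k, giving t1(i, j, q2).
  std::vector<double> t1(n * n * m, 0.0);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t q2 = 0; q2 < m; ++q2) {
        double sum = 0.0;
        for (std::size_t k = 0; k < n; ++k)
          sum += B[q2 * n + k] * c[(i * n + j) * n + k];
        t1[(i * n + j) * m + q2] = sum;
      }

  // Pass 2: contract j, giving t2(i, q1, q2).
  std::vector<double> t2(n * m * m, 0.0);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t q1 = 0; q1 < m; ++q1)
      for (std::size_t q2 = 0; q2 < m; ++q2) {
        double sum = 0.0;
        for (std::size_t j = 0; j < n; ++j)
          sum += B[q1 * n + j] * t1[(i * n + j) * m + q2];
        t2[(i * m + q1) * m + q2] = sum;
      }

  // Pass 3: contract i, giving u(q0, q1, q2).
  std::vector<double> u(m * m * m, 0.0);
  for (std::size_t q0 = 0; q0 < m; ++q0)
    for (std::size_t q1 = 0; q1 < m; ++q1)
      for (std::size_t q2 = 0; q2 < m; ++q2) {
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i)
          sum += B[q0 * n + i] * t2[(i * m + q1) * m + q2];
        u[(q0 * m + q1) * m + q2] = sum;
      }
  return u;
}

// Largest entrywise difference, used to check that both kernels agree.
double max_abs_diff(const std::vector<double>& a, const std::vector<double>& b)
{
  assert(a.size() == b.size());
  double d = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    d = std::max(d, std::fabs(a[i] - b[i]));
  return d;
}
```

Timing both kernels for several values of `n` and `m`, and then repeating the experiment with nested `u[][][]` storage or a dimension-recursive variant, would be one way to approach the questions above. Since `-Ofast` permits floating-point reassociation, the two results should be compared with a tolerance rather than for exact equality.
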
g++ -Ofast -funroll-loops --param max-completely-peeled-insns=10000 --param max-completely-peel-times=10000
-fopt-info-optimized -fopt-info-missed