devitocodes / opesci-fd

A framework for automatically generating finite difference models from a high-level description of the model equations.

Home Page: http://opesci.org

Optimization techniques for code generation

felippezacarias opened this issue

| Pull Request | Why | Reference | Code parameters used | Time before/after, Xeon | Time before/after, Xeon Phi |
| --- | --- | --- | --- | --- | --- |
| #51 | Blocks thread accesses by applying the `schedule(static,1)` directive to the outermost loop. This lets threads processing a z plane reuse some y and x planes that are already in cache. | Muhong Zhou and William W. Symes (Rice University), "Wave Equation Based Stencil Optimizations on Multi-core CPU", section "Reducing L3 Cache Misses: Blocking thread accesses" | Xeon: 8th-order code, grid size 512x512x512; Xeon Phi: 8th-order code, grid size 420x420x420 | 288 sec / 258 sec | 123 sec / 112 sec |
| #52 | Changes the array access pattern by applying loop fission to the innermost loop and rearranging the accesses by stride. This change also helps reduce register pressure during vectorization. | Borges, L., 2011, "3D finite differences on multi-core processors" (available online at [https://software.intel.com/en-us/articles/3d-finite-differences-on-multi-core-processors](https://software.intel.com/en-us/articles/3d-finite-differences-on-multi-core-processors)) | Xeon: 8th-order code, grid size 512x512x512; Xeon Phi: 8th-order code, grid size 420x420x420 | 258 sec / 158 sec | 112 sec / 196 sec |
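
To make the two changes listed above more concrete, here is a minimal sketch in C. It is not the code from #51 or #52: the function names `laplacian_blocked` and `laplacian_fissioned`, the cubic `n*n*n` grid with unit spacing, and the plain Laplacian kernel are all assumptions chosen for illustration. The first function shows `schedule(static,1)` on the outermost (z) loop; the second shows one plausible reading of "fission on the innermost loop", splitting the x loop into one pass per memory stride.

```c
#include <assert.h>
#include <stddef.h>

#define R 4 /* stencil radius: 8th order in space */

/* Standard 8th-order central-difference weights for a second derivative. */
static const double C[R + 1] = {
    -205.0 / 72.0, 8.0 / 5.0, -1.0 / 5.0, 8.0 / 315.0, -1.0 / 560.0
};

/* Flattened index into an n*n*n grid: x is unit stride, then y, then z. */
static size_t idx(int n, int z, int y, int x) {
    return ((size_t)z * n + y) * n + x;
}

/* Hypothetical baseline kernel with thread-blocked z planes (idea of #51). */
void laplacian_blocked(int n, const double *u, double *out) {
    assert(n > 2 * R);
    /* schedule(static,1) hands out z planes round-robin, one per thread,
       so threads running at the same time work on neighbouring planes
       and can share stencil data that is already in cache. */
    #pragma omp parallel for schedule(static, 1)
    for (int z = R; z < n - R; z++)
        for (int y = R; y < n - R; y++)
            for (int x = R; x < n - R; x++) {
                double acc = 3.0 * C[0] * u[idx(n, z, y, x)];
                for (int r = 1; r <= R; r++)
                    acc += C[r] * (u[idx(n, z, y, x - r)] + u[idx(n, z, y, x + r)]
                                 + u[idx(n, z, y - r, x)] + u[idx(n, z, y + r, x)]
                                 + u[idx(n, z - r, y, x)] + u[idx(n, z + r, y, x)]);
                out[idx(n, z, y, x)] = acc;
            }
}

/* Hypothetical fissioned kernel (idea of #52): the innermost x loop is
   split into three passes, each touching neighbours of a single stride. */
void laplacian_fissioned(int n, const double *u, double *out) {
    assert(n > 2 * R);
    #pragma omp parallel for schedule(static, 1)
    for (int z = R; z < n - R; z++)
        for (int y = R; y < n - R; y++) {
            /* Pass 1: centre point and unit-stride (x) neighbours. */
            for (int x = R; x < n - R; x++) {
                double acc = 3.0 * C[0] * u[idx(n, z, y, x)];
                for (int r = 1; r <= R; r++)
                    acc += C[r] * (u[idx(n, z, y, x - r)] + u[idx(n, z, y, x + r)]);
                out[idx(n, z, y, x)] = acc;
            }
            /* Pass 2: stride-n (y) neighbours. */
            for (int x = R; x < n - R; x++) {
                double acc = 0.0;
                for (int r = 1; r <= R; r++)
                    acc += C[r] * (u[idx(n, z, y - r, x)] + u[idx(n, z, y + r, x)]);
                out[idx(n, z, y, x)] += acc;
            }
            /* Pass 3: stride-n*n (z) neighbours. */
            for (int x = R; x < n - R; x++) {
                double acc = 0.0;
                for (int r = 1; r <= R; r++)
                    acc += C[r] * (u[idx(n, z - r, y, x)] + u[idx(n, z + r, y, x)]);
                out[idx(n, z, y, x)] += acc;
            }
        }
}
```

Both functions compute the same interior values up to floating-point rounding, since fission only reorders the additions. With GCC or Clang, OpenMP is enabled by `-fopenmp`; without that flag the pragma is ignored and the loops run serially. Each shorter pass keeps fewer array streams live at once, which is the kind of register-pressure relief the #52 description refers to, but whether it pays off depends on the target, as the Xeon Phi timings above suggest.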