devitocodes / opesci-fd

A framework for automatically generating finite difference models from a high-level description of the model equations.

Home Page: http://opesci.org

General Case

mlange05 opened this issue

From @navjotk on March 10, 2016 21:16

@ggorman @mlange05 I have some questions about what the general case might look like for running this sort of analysis using opesci.

  1. True Velocity: Here a numpy array is handcrafted to be passed to the Forward function later. If we allow an arbitrary matrix to be provided as input, generating C code to initialise this matrix in the C program might make the generated code too messy. The other option might be to pass the matrix from Python to C in-memory (is this what you mentioned, @mlange05?). This imposes the limitation that the C code is no longer stand-alone and always needs to be called through a Python wrapper. Another option would be to let the user provide the initialisation matrix as a mathematical function in x and y. That way we could just pass the function as a kernel and keep things simple from a software-engineering point of view. The matrix used here could possibly be represented as a function in x and y using Dirac delta or Heaviside functions, for example (see the sketch after this list).
  2. Source: The Ricker wavelet source is provided here as a Python function. Converting an arbitrary Python function to generated C code would be a tough challenge. However, in the case of sources, we could probably maintain a library of sources from which the user can choose.
  3. Damping function: While a similar approach could be taken here as well, I see that the damping function has changed between the two examples posted here (Visco-Acoustic and Acoustic). Would a library-of-damping-functions approach work here?
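To make points 1 and 2 above more concrete, here is a minimal, hypothetical sketch in plain numpy (the function names and parameter values are invented for illustration and are not part of the opesci-fd API): the velocity is written as a function of x and y using a Heaviside step and evaluated onto the grid, and a Ricker wavelet is sampled into an ordinary time series; both end up as plain numpy arrays that could be handed to the propagator.

```python
import numpy as np

# Point 1 (hypothetical): the "true velocity" written as a function of x and
# y, here a simple two-layer model using a Heaviside step at y = 500 m.
def true_velocity(x, y):
    return 1.5 + 1.0 * np.heaviside(y - 500.0, 1.0)   # km/s

nx, ny, dx = 101, 101, 10.0
X, Y = np.meshgrid(np.arange(nx) * dx, np.arange(ny) * dx, indexing='ij')
velocity = true_velocity(X, Y)      # the dense array Forward() expects today

# Point 2 (hypothetical library-of-sources entry): a Ricker wavelet sampled
# into a plain numpy time series.
def ricker_wavelet(t, f0=10.0, t0=0.1):
    """Ricker wavelet with peak frequency f0 (Hz), centred at t0 (s)."""
    arg = (np.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * arg) * np.exp(-arg)

time = np.linspace(0.0, 1.0, 1001)
source = ricker_wavelet(time)       # passed to the propagator as an array
```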

Edit: After posting it I realise that this question is more related to the Opesci-fd repository than this one. Remove and repost?

Copied from original issue: devitocodes/notebooks#7

From @ggorman on March 10, 2016 22:25

It's not clear at this point what you mean by "this kind of analysis".

  1. But as we are generating code, why can't we just generate bespoke code for a given API? For now, focusing on a Python API is not a bad thing, as the outer loop is being written in Python.
  2. In general, the wavelet (source) is also read from a file into an array. For tests, the Python function could create a numpy array for the time series and pass this into the propagator.
  3. Can we not just generate a kernel that gets passed in as an argument to the function? (See the sketch below.)
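As a rough illustration of point 3 (names and numbers are invented, not the current opesci-fd interface): a user-supplied damping kernel can be evaluated onto the grid in Python and the resulting array passed as one more argument to the generated propagator, so the generated C code never has to know how the damping was defined.

```python
import numpy as np

nx, ny, dx = 101, 101, 10.0
X, Y = np.meshgrid(np.arange(nx) * dx, np.arange(ny) * dx, indexing='ij')

def absorbing_damping(x, y, width=400.0, strength=0.05):
    """Hypothetical damping profile: zero in the interior, growing
    quadratically towards the edges inside a boundary layer of `width` metres."""
    edge = np.minimum.reduce([x, y, x.max() - x, y.max() - y])
    ramp = np.clip((width - edge) / width, 0.0, 1.0)
    return strength * ramp ** 2

damping = absorbing_damping(X, Y)
# The plain array is then passed as an extra argument to the generated
# propagator, e.g. forward(velocity, source, damping, ...) (hypothetical
# signature), so the kernel itself never needs to be translated to C.
```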

Yes - relocate..

OK, I think completely switching to numpy arrays would invalidate any previous benchmarking results and possibly hurt performance, since one of our main tricks is that we own the allocation and enforce alignment. That being said, I would very much like to be able to interface with native numpy data structures, since it would solve most of the points raised by @navjotk above. This is why I advocate the introduction of some form of opaque data structures (a rough sketch follows the list) that can:

  • Act as an ultra-thin wrapper around numpy arrays, allowing us to use any validated pure Python data from the notebooks
  • Manage allocation features (alignment, first touch, maybe later even custom data layouts) and record allocation details, so that we can determine the applicable loop traversal and possible optimisations (pragmas) when generating the main loops
  • Manage file input and the necessary data transformations into our aligned data structures
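For illustration only, here is a minimal sketch of what such a wrapper might look like (the class name, defaults, and fields are invented, not an agreed design): it owns an aligned allocation, exposes the data as a numpy view, and records the allocation details the code generator would later consult.

```python
import numpy as np

class AlignedData(object):
    """Hypothetical opaque array wrapper: owns an aligned buffer, exposes it
    as a numpy view and records allocation metadata for the code generator."""

    def __init__(self, shape, dtype=np.float64, alignment=64):
        self.shape = tuple(shape)
        self.dtype = np.dtype(dtype)
        self.alignment = alignment
        nbytes = int(np.prod(self.shape)) * self.dtype.itemsize
        # Over-allocate by one alignment unit and slice from the first
        # aligned address within the raw buffer.
        raw = np.empty(nbytes + alignment, dtype=np.uint8)
        offset = (-raw.ctypes.data) % alignment
        self.data = raw[offset:offset + nbytes].view(self.dtype).reshape(self.shape)

    @classmethod
    def from_numpy(cls, array, alignment=64):
        """Ultra-thin wrapper around existing, validated numpy data."""
        wrapped = cls(array.shape, array.dtype, alignment)
        wrapped.data[...] = array       # copy into the aligned buffer
        return wrapped

    @property
    def is_aligned(self):
        return self.data.ctypes.data % self.alignment == 0
```

Whether `from_numpy` should copy into an aligned buffer, as sketched here, or simply view the user's data unchanged is exactly the kind of trade-off such a class would have to encode.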

Another side to this is that the current code, since it absorbs all data handling by writing everything into a single .cpp file, is very limited in the way it interacts with the enclosing Python world. This is due to the previous requirement of generating standalone compilable propagator models, and I do believe this still holds (@ggorman, does it?). Now that we want to interact with numpy directly, we should clarify what the Python interface looks like, while maintaining the standalone option.

I am not sure I got everything, but here are a few things:

1. True velocity cannot be a function; it is usually a user-provided array that will be updated and modified in the outer loop, so I think it has to be passed from Python.

2. The same as the velocity: it is usually user-provided as a file (for real data).

3. The current boundary condition in my notebooks is really bad; it is just there to show that it doesn't have to break the workflow (adjoint, gradient) if implemented correctly. The implementation I have in my own code is independent of the kernel (acoustic, elastic, anisotropic); it is just a bit trickier to write, so I went for the simple version. So in practice it can be hard-coded within the generated code without the need for a library of damping kernels.

@mlange05 I haven't looked at your code, but could you not wrap the memory you allocate into a NumPy array the way PyOP2 does it and get the best of both worlds?

@kynan Yes, you are right: in the base case that is exactly what we would do. Our code is currently still an almost completely self-contained C model, but this can be done with only minor modifications. My comments were more aimed at later potential optimisations, where we might change the underlying data layout to enforce vectorisation across multiple arrays, at which point we need metadata to be associated with the arrays. Essentially I'm advocating for an Opesci-specific class equivalent to PyOP2.Dat (a rough sketch below).
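For concreteness, here is a rough sketch of what such a Dat-like wrapper could look like (the class name, arguments, and metadata fields are invented for illustration, and PyOP2.Dat itself does considerably more): it exposes externally allocated memory as a zero-copy numpy view and carries the metadata the code generator would need.

```python
import ctypes
import numpy as np

class Dat(object):
    """Hypothetical Opesci analogue of PyOP2.Dat: a zero-copy numpy view over
    memory we allocated ourselves, plus metadata for the code generator."""

    def __init__(self, c_pointer, shape, alignment=64, layout='row-major'):
        self.shape = tuple(shape)
        self.alignment = alignment   # recorded for vectorisation/pragma decisions
        self.layout = layout         # placeholder for future custom data layouts
        ptr = ctypes.cast(c_pointer, ctypes.POINTER(ctypes.c_double))
        # Zero-copy view; the caller must keep the allocation alive.
        self.data = np.ctypeslib.as_array(ptr, shape=self.shape)


# Example with memory "allocated on the C side" (faked here with ctypes):
raw = (ctypes.c_double * (101 * 101))()
m = Dat(ctypes.cast(raw, ctypes.c_void_p), (101, 101))
m.data[...] = 1.5    # writes straight into the underlying buffer
```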