sandialabs / compadre

Compadre (Compatible Particle Discretization and Remap)

Separate functors from GMLS class

kuberry opened this issue · comments

Right now, the GMLS class has struct tags for all of the various functors. Each of these captures the whole class, and each creates its own TeamPolicy.

Things needed (Work in progress):

  • Add a PointConnections struct that contains all getTarget... or getNeighbor... type functions.

  • Make the neighbor list accessor object a member of PointConnections, along with a second accessor for the additional evaluation sites

  • Move all getTargetOffset... and getAlpha... type functions to a SolutionSet struct

  • Extract all basis related functions from GMLS class and let them accept a GMLSBasisData struct (far fewer variables needed than GMLS class)

  • Extract all alpha related functions from GMLS class and let them accept a GMLSSolutionData struct (far fewer variables needed than GMLS class)

Each functor will be separated from GMLS class and a common TeamPolicy will be used as often as possible.
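A minimal sketch of the separation described above, in plain C++ so it runs without Kokkos. The fields of `GMLSBasisData`, the `EvaluateMonomialSum` functor, and the `launch` helper are all hypothetical stand-ins: the real functors would be Kokkos functors and `launch` would be a shared TeamPolicy. The point is only that the functor captures a small data struct by value instead of the whole GMLS object.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical narrow bundle: only what a basis kernel needs,
// instead of every member of the GMLS class.
struct GMLSBasisData {
    int dimensions;        // spatial dimension
    int poly_order;        // polynomial order of the basis
    const double* coords;  // flattened coordinates, `dimensions` per target (non-owning)
};

// Stand-alone functor: holds GMLSBasisData by value, not a GMLS object.
// The kernel body is a placeholder: sum of x^k for k = 0..poly_order,
// where x is the first coordinate of target i.
struct EvaluateMonomialSum {
    GMLSBasisData data;
    double* out;  // one result per target (non-owning)

    void operator()(std::size_t i) const {
        assert(data.dimensions > 0 && data.poly_order >= 0);
        const double x = data.coords[i * data.dimensions];
        double power = 1.0;
        double sum = 0.0;
        for (int k = 0; k <= data.poly_order; ++k) {
            sum += power;
            power *= x;
        }
        out[i] = sum;
    }
};

// Stand-in for a common execution policy reused by every functor.
// A serial loop here; with Kokkos this would be a parallel_for over a TeamPolicy.
template <typename Functor>
void launch(std::size_t num_targets, const Functor& f) {
    for (std::size_t i = 0; i < num_targets; ++i) f(i);
}
```

Any further functor written this way can go through the same `launch` call, which is the "common TeamPolicy" idea in miniature.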

Many sizing variables (_dimensions, _lro_..., etc.) are required in order to separate out the solution functionality (getTargetOffset..., getAlpha..., etc.).

Aside from the actual alphas, almost all accessor functions are just a combination of sizing variables.
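To illustrate why such accessors depend only on sizing variables, here is a sketch of an index computation for a flat alphas array. The layout (targets outermost, then alpha rows, then neighbors) and the names `SizingVariables`, `max_num_neighbors`, `total_alpha_values`, and `getAlphaIndex` are assumptions made for this example, not the actual layout used in the GMLS class.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sizing variables; the real class has more of them.
struct SizingVariables {
    int max_num_neighbors;   // neighbor slots reserved per target
    int total_alpha_values;  // alpha rows stored per target
};

// Flat index of one alpha coefficient under the assumed row-major layout.
// No solution data is touched: the result is pure arithmetic on sizes.
inline std::size_t getAlphaIndex(const SizingVariables& s,
                                 int target, int alpha_row, int neighbor) {
    assert(alpha_row >= 0 && alpha_row < s.total_alpha_values);
    assert(neighbor >= 0 && neighbor < s.max_num_neighbors);
    const std::size_t per_target =
        static_cast<std::size_t>(s.total_alpha_values) * s.max_num_neighbors;
    return target * per_target
         + static_cast<std::size_t>(alpha_row) * s.max_num_neighbors
         + neighbor;
}
```

Because the function needs nothing but the sizing struct, it can live outside the GMLS class and be called from host or device code that holds a copy of those sizes.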

Organize as:

GMLS 
- StateHostFiltered (all sizing variables on the host)
- SolutionSet (all solution variables / access methods here)
    - StateDeviceFiltered (subset of all sizing variables needed on device)

When GMLS is queried for a solution, it just exposes SolutionSet, which contains StateDeviceFiltered, which is copied to the device or host based on a template tag. How to keep SizingNarrow always up to date with SizingAll? When GMLS modifies state outside of a kernel, it modifies StateHostFiltered. This can be tied to the clearCoefficients strategy, with a copy of all needed variables made to the device when generatePolynomialCoefficients is called.

SolutionSet can be extracted from the GMLS problem after the solve, and persists (copied by value) independent of the fate of the GMLS class. Templating SolutionSet and StateDeviceFiltered lets the user select whether to retrieve the solution on host or device. No copies of _alphas to _host_alphas at this point (the most expensive memory copy).

Could look into asynchronous copy of parts of _alphas to _host_alphas. Certainly could be done by batch.

Template SolutionSet's getAlphas call to use the _alphas if on device, and _host_alphas if on host.
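One way the templated getAlphas call could look, sketched with tag dispatch in plain C++. `HostTag`, `DeviceTag`, `SolutionStorage`, and `getAlphasImpl` are hypothetical names, and `std::vector` stands in for the device and host views; only `_alphas`, `_host_alphas`, and `getAlphas` come from the notes above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical tags naming where the caller wants the data to live.
struct HostTag {};
struct DeviceTag {};

// Stand-in storage; in practice these would be a device view and its host mirror.
struct SolutionStorage {
    std::vector<double> _alphas;       // device-resident coefficients
    std::vector<double> _host_alphas;  // host copy, possibly not yet filled
};

// One overload per tag; the choice is resolved at compile time.
inline const std::vector<double>& getAlphasImpl(const SolutionStorage& s, DeviceTag) {
    return s._alphas;
}
inline const std::vector<double>& getAlphasImpl(const SolutionStorage& s, HostTag) {
    return s._host_alphas;
}

// Single entry point: getAlphas<DeviceTag>(s) or getAlphas<HostTag>(s).
template <typename MemorySpaceTag>
const std::vector<double>& getAlphas(const SolutionStorage& s) {
    return getAlphasImpl(s, MemorySpaceTag{});
}
```

Calling code stays identical apart from the tag, so a kernel written against `getAlphas<Tag>` can be instantiated for either memory space.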

Ended up leaving sizing variables in the GMLS class (_dimensions, etc...).
Moved all _lro related functions to SolutionSet which is templated for host or device with a copy constructor that copies all _lro... variables from device/host to device/host.

_alphas are omitted from the copy, but a copyAlphas() function was added.
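A rough sketch of that copy behavior, assuming a single `lro_total` field as a stand-in for the _lro... variables and `std::vector` in place of a view. The tags and member names other than `copyAlphas` are hypothetical.

```cpp
#include <cassert>
#include <vector>

struct HostTag {};
struct DeviceTag {};

// SolutionSet templated on a (hypothetical) memory-space tag.
template <typename Space>
struct SolutionSet {
    int lro_total = 0;           // stand-in for the _lro... sizing variables
    std::vector<double> alphas;  // stand-in for _alphas

    SolutionSet() = default;

    // Cross-space construction: copies the cheap sizing data only.
    // alphas is deliberately left empty to avoid the expensive transfer.
    // (Same-space copies still use the implicit copy constructor,
    // which does copy alphas.)
    template <typename OtherSpace>
    SolutionSet(const SolutionSet<OtherSpace>& other)
        : lro_total(other.lro_total) {}

    // Explicit, opt-in transfer of the coefficients.
    template <typename OtherSpace>
    void copyAlphas(const SolutionSet<OtherSpace>& other) {
        alphas = other.alphas;
    }
};
```

Keeping the coefficient transfer in a separate call makes the expensive step visible at the call site instead of hiding it inside construction.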

_h_ss was made private in the GMLS class, with a getSolutionSetHost() call to get _h_ss that ensures _alphas are copied to the host (otherwise they won't be copied, which saves a lot of time when using the GPU). The Evaluator class is a friend of GMLS so it can get variables from _h_ss without causing a copy to the host.
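The access pattern can be sketched as follows, again in plain C++. `_h_ss`, `getSolutionSetHost()`, GMLS, and Evaluator are names from the notes; `_d_ss`, `lro_count`, `copyCount()`, and `numOperations()` are hypothetical and exist only to make the lazy copy observable.

```cpp
#include <cassert>
#include <vector>

struct SolutionSet {
    int lro_count = 0;           // stand-in for the cheap sizing variables
    std::vector<double> alphas;  // expensive to move between device and host
};

class GMLS {
public:
    GMLS(int n_lro, const std::vector<double>& device_alphas) {
        _d_ss.lro_count = n_lro;
        _d_ss.alphas = device_alphas;
        _h_ss.lro_count = n_lro;  // sizing is mirrored eagerly, alphas are not
    }

    // Public access to the host set: performs the alphas copy on first use only.
    const SolutionSet& getSolutionSetHost() {
        if (!_host_alphas_current) {
            _h_ss.alphas = _d_ss.alphas;
            _host_alphas_current = true;
            ++_copy_count;
        }
        return _h_ss;
    }

    int copyCount() const { return _copy_count; }

private:
    friend class Evaluator;  // may read _h_ss directly, bypassing the copy

    SolutionSet _d_ss;  // hypothetical device-side set
    SolutionSet _h_ss;  // host-side set
    bool _host_alphas_current = false;
    int _copy_count = 0;
};

class Evaluator {
public:
    explicit Evaluator(GMLS& gmls) : _gmls(gmls) {}

    // Needs only sizing data, so it reads _h_ss without triggering a copy.
    int numOperations() const { return _gmls._h_ss.lro_count; }

private:
    GMLS& _gmls;
};
```

With this shape, code that stays on the device never pays for the host copy, while user code asking for the host solution still gets up-to-date alphas.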

GMLS 
- SolutionSet

Comparison using nvprof ./examples/GMLS_Device_Test --p 3 --d 3 --nt 100000:
On the GPU, generation of alphas (applying target evaluations to polynomial coefficients) is 50% faster.
The copy that was saved was about 60% of the ApplyTarget time (so 0.286s combined compared with 0.0787s).

Applying alphas to data went from 8.24e-02 to 6.34e-02, so also a reduction of about 23%.