Weird outputs in interp_out
gillette7 opened this issue · comments
I'm running some examples where the input data points form a regular lattice in 2D, and the points for interpolation sometimes lie on the edges or diagonals of this grid. I've found that the output values from `interp_out` in this case are sometimes things like `1e271` or `nan`. (I'm using the Python bindings.) Is this a known issue, perhaps due to the non-uniqueness of the Delaunay triangulation on a lattice? Or is this (more likely?) some other bug in my code? Thanks.
Grid-aligned data is not in "general position", so the non-uniqueness of the Delaunay triangulation can cause all sorts of problems. I'd suggest adding an acceptably small random perturbation to all of your input data (on the scale of 10^{-8}).
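A minimal sketch of such a perturbation, assuming the input points live in a NumPy array (the 4x4 lattice here is a made-up stand-in for your data):

```python
import numpy as np

# Hypothetical 4x4 lattice of 2D input points (replace with your data).
pts = np.stack(np.meshgrid(np.arange(4.0), np.arange(4.0)), axis=-1).reshape(-1, 2)

# A tiny uniform jitter on the scale of 1e-8 breaks the degenerate
# lattice structure without visibly changing the interpolation.
rng = np.random.default_rng(0)
pts_jittered = pts + rng.uniform(-1e-8, 1e-8, size=pts.shape)
```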
If the problem persists, then it may be an issue with the compilation of the package when it uses the local quadratic program solver for extrapolation (I believe this can be used on the edge of the convex hull as well as outside). In that case I'd make sure that your Python bindings are being compiled with `-lblas -llapack` and not the local files.
See the comments in the Python wrapper here.
The outputs in `interp_out` should never contain `nan` values or anything outside the range of the observed data. Even if the triangulation is non-unique, you should still get the interpolated result from a Delaunay simplex of *some* Delaunay triangulation, so there should be no need to ever perturb your data.
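That property can be checked directly on the arrays involved; the values below are made up, but the check applies to whatever you pass to and get back from DelaunaySparse:

```python
import numpy as np

# Hypothetical observed response values and interpolation results;
# substitute the actual arrays used with DelaunaySparse.
observed = np.array([0.0, -0.263418, 1.0])
interp_out = np.array([-0.1, 0.5])

# Interpolated values should be finite and bounded by the data range.
is_sane = (np.isfinite(interp_out).all()
           and observed.min() <= interp_out.min()
           and interp_out.max() <= observed.max())
```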
@gillette7 can you share such a dataset with us so I can try to check and see what is causing the failure?
Hmm... in preparing a dataset for you I may have discovered the problem. It looks like the `interp_in` array that I'm providing as input to DelaunaySparse contains unallowable(?) numbers, like `2.3e-310`, as well as `nan`s. I thought that was avoided by this line, which I put in before printing `interp_in`:
interp_in = np.require(interp_in, dtype=np.float64, requirements=['F'])
Any thoughts?
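For what it's worth, `np.require` only enforces the dtype and memory layout; it does not validate the values themselves, so `nan`s and subnormal numbers pass straight through. A quick demonstration with made-up bad values:

```python
import numpy as np

# np.require enforces dtype and memory layout ('F' = Fortran order),
# but does not inspect the values: NaNs and subnormals survive it.
bad = np.array([[2.3e-310, np.nan], [0.0, 1.0]])
arr = np.require(bad, dtype=np.float64, requirements=['F'])

has_nan = np.isnan(arr).any()
has_subnormal = ((arr != 0) & (np.abs(arr) < np.finfo(np.float64).tiny)).any()
```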
This is definitely the problem. When I generate the data, put it into a pandas dataframe, and print it, I get this:
data_train outputs =
0
0 0.000000
1 0.000000
2 0.000000
3 0.000000
4 -0.263418
5 0.000000
6 0.000000
7 0.000000
8 0.000000
Looks fine. This pandas dataframe object is returned by one Python function, then passed as input to another. By the time it gets there, it prints like this:
data_train outputs =
0
0 2.052198e-316
1 2.052161e-316
2 0.000000e+00
3 0.000000e+00
4 -2.634177e-01
5 0.000000e+00
6 0.000000e+00
7 0.000000e+00
8 1.264808e-321
This is before I try to enforce the type or do anything else. So it's seemingly some quirk of moving data around in Python...
Aha, this looks like the cause of the issue... I'm not sure what is going on without seeing your code, but these look like corrupted or uninitialized entries.
The issue was in the conversion from numpy to pandas:
data_train_outputs = pd.DataFrame(data_train_outputs)
Some of the data was not being read into pandas as floats, for reasons unknown (probably a quirk of how the data was generated, using a package called pymfem). However, I was able to fix it by adding this line:
data_train_outputs = data_train_outputs.astype(float)
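A small self-contained illustration of the failure mode and the fix (the values here are hypothetical, not the pymfem output): when the scalars arrive as generic Python objects, pandas keeps an `object` dtype column, whose storage is not a native float64 buffer; `astype(float)` coerces it to a proper float64 column.

```python
import pandas as pd

# Hypothetical: scalars wrapped as generic Python objects produce an
# 'object' dtype column instead of a native float64 column.
df = pd.DataFrame([0.0, -0.263418, 0.0], dtype=object)

# Coercing to float yields a real float64 column, whose underlying
# buffer is safe to hand to NumPy / Fortran code downstream.
df = df.astype(float)
```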
Thanks for your help!