'ValueError: kth out of bounds' in MGWR fitting
DKchenliyuan opened this issue · comments
Hi!
I try to fit a MGWR model based on the data (post-process .shp file in attachment).
point_trans.zip
My code is as follow:
soil = gp.read_file('point_trans.shp')
s_y = soil['TNPC'].values.reshape((-1,1))
s_x = soil[['SOCgkg', 'ClayPC', 'SiltPC', 'NO3Ngkg', 'NH4Ngkg']].values
s_u = soil['X']
s_v = soil['Y']
s_coords = list(zip(s_u, s_v))
s_x = (s_x - s_x.mean(axis = 0)) / s_x.std(axis = 0)
s_y = (s_y - s_y.mean(axis = 0)) / s_y.std(axis = 0)
mgwr_selector_soil = Sel_BW(s_coords, s_y, s_x, multi = True, fixed = True)
mgwr_bw_soil = mgwr_selector_soil.search()
print(mgwr_bw_soil)
mgwr_results_soil = MGWR(s_coords, s_y, s_x, mgwr_selector_soil).fit()
First, I notice that the bandwidths of 'ClayPC' and 'NH4Ngkg' (both 7484.08m) exceed the maximum distance (3742.06m) between observation points. These two bandwidths were both 3741.7m in the existing study (see Analyst A in Comber et al. 2020). I don't know if this bias is caused by my mistake. It seems that it happens when the bandwidth tends to be global.
Besides, I get the following error message when the model fitting:
ValueError Traceback (most recent call last)
<ipython-input-44-7199e7274476> in <module>
----> 1 mgwr_results_soil = MGWR(s_coords, s_y, s_x, mgwr_selector_soil).fit()
C:\Anaconda\envs\basemap\lib\site-packages\mgwr\gwr.py in fit(self, n_chunks, pool)
1593 tqdm(range(self.n_chunks), desc='Inference'))
1594
-> 1595 rslt_list = list(zip(*rslt))
1596 ENP_j = np.sum(np.array(rslt_list[0]), axis=0)
1597 CCT = np.sum(np.array(rslt_list[1]), axis=0)
C:\Anaconda\envs\basemap\lib\site-packages\mgwr\gwr.py in _chunk_compute_R(self, chunk_id)
1520
1521 for i in range(n):
-> 1522 wi = self._build_wi(i, self.bw_init).reshape(-1, 1)
1523 xT = (self.X * wi).T
1524 P = np.linalg.solve(xT.dot(self.X), xT).dot(init_pR).T
C:\Anaconda\envs\basemap\lib\site-packages\mgwr\gwr.py in _build_wi(self, i, bw)
236 wi = Kernel(i, self.coords, bw, fixed=self.fixed,
237 function=self.kernel, points=self.points,
--> 238 spherical=self.spherical).kernel
239 except BaseException:
240 raise # TypeError('Unsupported kernel function ', kernel)
C:\Anaconda\envs\basemap\lib\site-packages\mgwr\kernels.py in __init__(self, i, data, bw, fixed, function, eps, ids, points, spherical)
56 self.bandwidth = np.partition(
57 self.dvec,
---> 58 int(bw) - 1)[int(bw) - 1] * eps #partial sort in O(n) Time
59
60 self.kernel = self._kernel_funcs(self.dvec / self.bandwidth)
<__array_function__ internals> in partition(*args, **kwargs)
C:\Anaconda\envs\basemap\lib\site-packages\numpy\core\fromnumeric.py in partition(a, kth, axis, kind, order)
744 else:
745 a = asanyarray(a).copy(order="K")
--> 746 a.partition(kth, axis=axis, kind=kind, order=order)
747 return a
748
ValueError: kth(=972) out of bounds (689)
There are 689 observation points in the data, but what does 'kth(=972)' mean? How can I deal with this problem?
Any comments on this greatly appreciated.
Thanks for reporting the issue. What I see here is that you are using a fixed bi-square kernel. We had some internal discussions previously and found that the combinations of fixed-bisquare and adaptive-gaussian sometimes have some fitting issues. So we would suggest to use either adaptive-bisquare or fixed-gaussian kernel. I think the reason is that a fixed-bisquare kernel may contain too few points (plus the weights drop to zero at bandwidth) at some locations when the spatial distribution of the points are very irregular. If you changed to a gaussian kernel, it would run ok. Please see the attached screenshot.
Regarding the bandwidth exceeds the largest pair-wise distance, this is possible and expected when the relationship approaches global. The upper bound of the bandwidth search range in mgwr
and GWmodel
may be different. Here we use the twice of the maximum pairwise distance as the upper limit. See this line of code here. That may explain why the bandwidth is mgwr is almost twice of the one reported in the Comber paper (I guess Lex was using GWmodel)?
@Ziqi-Li
Thanks for your comment. It helps me a lot.
An additional question:
I find mgwr
seems to have no functions to directly or conveniently fit the Mixed GWR model (such as gwr.mixed
or gwr.multiscale(...bws0=c(Inf, 100, 100, Inf)...)
in GWmodel
). Is that right?
@DKchenliyuan Yes, the bandwidth cannot be forced to be inf yet. Please see this issue for some discussions #4. It is not something hard to do, so we could probably add this in somewhere in the summer.
I'm closing the issue, if anything else pops up, feel free to reopen it.