unhashable type: 'numpy.ndarray' in substituteValues() call of mode
IrisOren opened this issue
Hi,
I am trying to run impute.knn on the adult-train-raw data. When it calls substituteValues, it returns an 'unhashable type: numpy.ndarray' error on calling mode (error copied below).
TypeError Traceback (most recent call last)
in ()
43 # replace missing data with knn
44 print 'imputing with K-Nearest Neighbors'
---> 45 data_knn = imp.knn(x, n_neighbors, np.mean, missing_data_cond, cat_cols)
46
47 def compute_histogram(data, labels):
C:\Users\v1ioren\Local Documents\Python Scripts\missing_data_imputation.py in knn(self, x, k, summary_func, missing_data_cond, cat_cols, weighted, in_place)
204
205 print 'Substituting missing values'
--> 206 map(substituteValues, xrange(len(missing)))
207 return data
208
C:\Users\v1ioren\Local Documents\Python Scripts\missing_data_imputation.py in substituteValues(i)
194 cols = missing[row]
195 ######################################Raises errors: IO
--> 196 data[row, cols] = mode(data[mask][ids[i]][:, cols], axis=0)[0].flatten()
197 ###code below is IO edits as line above was generating error
198
C:\Users\v1ioren\AppData\Local\Continuum\anaconda3\envs\GUS2_7\lib\site-packages\scipy\stats\stats.pyc in mode(a, axis, nan_policy)
440 return mstats_basic.mode(a, axis)
441
--> 442 if (NumpyVersion(np.__version__) < '1.9.0') or (a.dtype == object and np.nan in set(a)):
443 # Fall back to a slower method since np.unique does not work with NaN
444 # or for older numpy which does not support return_counts
TypeError: unhashable type: 'numpy.ndarray'
It does not seem to like calling mode across the columns of the np array.
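The failure can probably be reproduced without scipy. Here is a minimal sketch, assuming (as the traceback suggests) that the slice passed to mode is a 2-D object array: iterating a 2-D array yields its rows, which are themselves ndarrays and cannot be hashed, so `set(a)` raises. The array `a` and the helper `try_set` are made up for illustration; they are not from the adult data or from missing_data_imputation.py.

```python
import numpy as np

# Hypothetical 2-D object array standing in for data[mask][ids[i]][:, cols]
a = np.array([['x', 'y'], ['x', 'z']], dtype=object)

def try_set(arr):
    """Return the message of the TypeError raised by set(arr), or None."""
    try:
        set(arr)
    except TypeError as exc:
        return str(exc)
    return None

err_2d = try_set(a)        # iterating yields rows (ndarrays), which are unhashable
err_1d = try_set(a[:, 0])  # iterating yields plain strings, which are hashable
print(err_2d)
print(err_1d)
```

A single column passes the `set(a)` check, which would be consistent with the per-column loop getting past this error.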
I have had a go at editing the function (see below), replacing the one-line call to mode() with a for loop over the columns with missing data for the row corresponding to missing[i] (I think :-).
It now does substitute values, but my output differs from what you show in the bar plot.
def substituteValues(i):
    row = missing.keys()[i]
    cols = missing[row]
    ###################################### Original raises errors: IO
    # data[row, cols] = mode(data[mask][ids[i]][:, cols], axis=0)[0].flatten()
    ######################################
    ### code below is IO edits
    temp = data[mask][ids[i]][:, cols]
    for mc in range(temp.shape[1]):
        data[row, cols] = mode(data[mask][ids[i]][:, cols][:, mc].flatten())[0]
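One possible cause of the difference: inside the loop, `data[row, cols]` is assigned on every iteration, so each pass overwrites all of the missing columns with the mode of column `mc`, and the last column's mode wins. Below is a minimal sketch of a per-column version, assuming the intent is to fill each missing column with the most common value among the neighbours in that same column. It uses `collections.Counter` instead of `scipy.stats.mode` to stay independent of the scipy version; `column_modes`, `neighbor_ids` and the toy array are hypothetical stand-ins, not names from the original script.

```python
from collections import Counter

import numpy as np

def column_modes(neighbors):
    """Return the most common value of each column of a 2-D array."""
    modes = [Counter(col).most_common(1)[0][0] for col in neighbors.T]
    return np.array(modes, dtype=object)

# Hypothetical toy data: row 3 has missing values ('?') in columns 0 and 1
data = np.array([['a', 'p', 'u'],
                 ['b', 'q', 'u'],
                 ['b', 'q', 'v'],
                 ['?', '?', 'v']], dtype=object)
row, cols = 3, [0, 1]
neighbor_ids = [0, 1, 2]  # stand-in for the k nearest neighbours of `row`

# One mode per missing column, written to the matching column only
data[row, cols] = column_modes(data[neighbor_ids][:, cols])
print(data[row])
```

In the loop form, the equivalent change would be to assign to `data[row, cols[mc]]` rather than `data[row, cols]`.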
Could you help me to debug?
I am using Python 2.7, scipy 1.2.1, numpy 1.11.3
Many thanks
Iris