rafaelvalle / MDI

Missing Data Imputation Python Library

unhashable type: 'numpy.ndarray' in substituteValues() call of mode

IrisOren opened this issue

Hi,
I am trying to run impute.knn on the adult-train-raw data. When it calls substituteValues, the call to mode raises an "unhashable type: 'numpy.ndarray'" error (traceback copied below).


TypeError                                 Traceback (most recent call last)
in ()
43 # replace missing data with knn
44 print 'imputing with K-Nearest Neighbors'
---> 45 data_knn = imp.knn(x, n_neighbors, np.mean, missing_data_cond, cat_cols)
46
47 def compute_histogram(data, labels):

C:\Users\v1ioren\Local Documents\Python Scripts\missing_data_imputation.py in knn(self, x, k, summary_func, missing_data_cond, cat_cols, weighted, in_place)
204
205 print 'Substituting missing values'
--> 206 map(substituteValues, xrange(len(missing)))
207 return data
208

C:\Users\v1ioren\Local Documents\Python Scripts\missing_data_imputation.py in substituteValues(i)
194 cols = missing[row]
195 ######################################Raises errors: IO
--> 196 data[row, cols] = mode(data[mask][ids[i]][:, cols], axis=0)[0].flatten()
197 ###code below is IO edits as line above was generating error
198

C:\Users\v1ioren\AppData\Local\Continuum\anaconda3\envs\GUS2_7\lib\site-packages\scipy\stats\stats.pyc in mode(a, axis, nan_policy)
440 return mstats_basic.mode(a, axis)
441
--> 442 if (NumpyVersion(np.__version__) < '1.9.0') or (a.dtype == object and np.nan in set(a)):
443 # Fall back to a slower method since np.unique does not work with NaN
444 # or for older numpy which does not support return_counts

TypeError: unhashable type: 'numpy.ndarray'


mode() does not seem to work when called across the columns of the NumPy array.
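
A possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the scipy frame shown above evaluates `set(a)` when `a.dtype == object`. Iterating a 2-D array yields its rows, and each row is a `numpy.ndarray`, which cannot be hashed. The sketch below reproduces the same kind of error without scipy; the array contents and the helper `try_set` are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for data[mask][ids[i]][:, cols]: a 2-D object array.
neighbors = np.array([["Private", "HS-grad"],
                      ["Private", "Bachelors"],
                      ["State-gov", "HS-grad"]], dtype=object)

def try_set(a):
    """Return the error message raised by set(a), or None if it succeeds."""
    try:
        set(a)
    except TypeError as exc:
        return str(exc)
    return None

# Iterating a 2-D array yields rows (1-D ndarrays), which cannot be hashed.
error_2d = try_set(neighbors)
# A single column is 1-D, so its elements are plain hashable strings.
error_1d = try_set(neighbors[:, 0])

print(error_2d)
print(error_1d)
```

If this reading is right, it would also explain why passing one column at a time avoids the error.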

I have had a go at editing the function (see below): I replaced the one-line call to mode() with a for loop over the columns with missing data for the row corresponding to missing[i] (I think :-).

It now substitutes values, but my output differs from what you show in the bar plot.

def substituteValues(i):
    row = missing.keys()[i]
    cols = missing[row]
    # Original line, which raises the TypeError above (IO):
    # data[row, cols] = mode(data[mask][ids[i]][:, cols], axis=0)[0].flatten()
    # IO edit: loop over the columns that have missing data for this row
    temp = data[mask][ids[i]][:, cols]
    for mc in range(temp.shape[1]):
        data[row, cols] = mode(temp[:, mc].flatten())[0]
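
One possible explanation for the differing output, offered as a guess: inside the loop the assignment targets `data[row, cols]`, that is, every missing column at once, so each pass overwrites all of them and the last column's mode ends up in every gap. Below is a minimal sketch of a per-column mode that writes one value per column. The helper `column_modes` and the sample arrays are hypothetical, and `collections.Counter` is swapped in for `scipy.stats.mode` so that the behavior does not depend on the scipy version; ties may be resolved differently from scipy.

```python
from collections import Counter

import numpy as np

def column_modes(block):
    """Return the most common value of each column of a 2-D array."""
    # Counter.most_common(1) gives [(value, count)] for the top value.
    return [Counter(block[:, j].tolist()).most_common(1)[0][0]
            for j in range(block.shape[1])]

# Hypothetical neighbor rows, restricted to the two missing columns.
neighbors = np.array([["Private", "HS-grad"],
                      ["Private", "Bachelors"],
                      ["State-gov", "Bachelors"]], dtype=object)

# Hypothetical data: one row with gaps in columns 0 and 1.
data = np.array([["?", "?", 39]], dtype=object)
row, cols = 0, [0, 1]

# Assign column by column so each gap receives its own column's mode.
for j, value in zip(cols, column_modes(neighbors)):
    data[row, j] = value

print(data[0].tolist())
```

In the loop from the issue, the equivalent change would be to assign to `data[row, cols[mc]]` instead of `data[row, cols]`.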
   

Could you help me to debug?

I am using Python 2.7, scipy 1.2.1, numpy 1.11.3.

Many thanks
Iris
