marcotcr / anchor

Code for "High-Precision Model-Agnostic Explanations" paper

Anchor on tabular data.ipynb

mozo64 opened this issue

I cannot run the notebook because I don't know where to obtain the adult dataset.
I tried this dataset: https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data, but it does not work, so I assume a different file is needed.
Error: `TypeError Traceback (most recent call last)
in
2 # - https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data
3 dataset_folder = ''
----> 4 dataset = utils.load_dataset('adult', balance=True, dataset_folder=dataset_folder, discretize=True)

~/Notebooks/mozo/anchor/anchor/utils.py in load_dataset(dataset_name, balance, discretize, dataset_folder)
95 14: lambda x: map_array_values(x, label_map),
96 }
---> 97 dataset = load_csv_dataset(
98 os.path.join(dataset_folder, 'adult/adult.data'), -1, ', ',
99 feature_names=feature_names, features_to_use=features_to_use,

~/Notebooks/mozo/anchor/anchor/utils.py in load_csv_dataset(data, target_idx, delimiter, feature_names, categorical_features, features_to_use, feature_transformations, discretize, balance, fill_na, filter_fn, skip_first)
226 le = sklearn.preprocessing.LabelEncoder()
227 le.fit(labels)
--> 228 ret.labels = le.transform(labels)
229 labels = ret.labels
230 ret.class_names = list(le.classes_)

~/.local/lib/python3.8/site-packages/sklearn/preprocessing/_label.py in transform(self, y)
136 return np.array([])
137
--> 138 return _encode(y, uniques=self.classes_)
139
140 def inverse_transform(self, y):

~/.local/lib/python3.8/site-packages/sklearn/utils/_encode.py in _encode(values, uniques, check_unknown)
181 else:
182 if check_unknown:
--> 183 diff = _check_unknown(values, uniques)
184 if diff:
185 raise ValueError(f"y contains previously unseen labels: "

~/.local/lib/python3.8/site-packages/sklearn/utils/_encode.py in _check_unknown(values, known_values, return_mask)
253
254 # check for nans in the known_values
--> 255 if np.isnan(known_values).any():
256 diff_is_nan = np.isnan(diff)
257 if diff_is_nan.any():

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
`

Please find the data for this in the author's other repo: https://github.com/marcotcr/anchor-experiments/tree/master/datasets. It works well with the example notebooks.
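
For anyone hitting the same problem, here is a minimal sketch of how the notebook cell could be pointed at a local copy of that `datasets` folder. It assumes the folder contains `adult/adult.data`, since that is the relative path `load_dataset` joins onto `dataset_folder` in the traceback above. The helper names (`adult_data_path`, `check_adult_dataset`, `load_adult`) are hypothetical and not part of anchor; the `from anchor import utils` import is assumed from the `anchor/utils.py` path in the traceback.

```python
import os


def adult_data_path(dataset_folder):
    """Build the path load_dataset('adult', ...) reads, per the traceback."""
    return os.path.join(dataset_folder, 'adult/adult.data')


def check_adult_dataset(dataset_folder):
    """Return the path to adult.data, or raise if the file is not there."""
    path = adult_data_path(dataset_folder)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            "Expected the adult dataset at %r; set dataset_folder to a local "
            "copy of the 'datasets' directory from anchor-experiments." % path
        )
    return path


def load_adult(dataset_folder):
    """Hypothetical wrapper around the notebook's load_dataset call."""
    # Imported lazily so the path helpers above work without anchor installed.
    from anchor import utils

    check_adult_dataset(dataset_folder)
    # Same call as in the notebook cell shown in the traceback.
    return utils.load_dataset('adult', balance=True,
                              dataset_folder=dataset_folder, discretize=True)
```

In the notebook this would replace `dataset_folder = ''` with the path to wherever the folder was cloned or downloaded, for example `load_adult('../anchor-experiments/datasets/')` (that location is only an example). The check only confirms the file is present; it does not verify that the file contents match what the notebook expects.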