Refefer / fastxml

FastXML / PFastXML / PFastreXML - Implementation of Extreme Multi-label Classification

TypeError: Object of type 'int64' is not JSON serializable during trainer.save('bah')

ljmartin opened this issue

Hi,
I'm looking forward to using FastXML. This isn't quite a bug, but it might be worth handling, so I'm reporting it in case anyone else runs into it. Python's json module can't serialize NumPy scalar types such as int64, so labels derived from a NumPy array have to be converted to plain Python ints, otherwise trainer.save() fails.
This is my setup:

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.datasets import make_multilabel_classification

from fastxml import Trainer, Inferencer

X, Y = make_multilabel_classification(n_classes=10, n_labels=1,
                                      allow_unlabeled=True,
                                      random_state=1)

X = [X[i].astype('float32') for i in range(X.shape[0])]
X_sparse = [csr_matrix(b) for b in X]

# This version keeps the labels as NumPy int64 values, so trainer.save('bah') fails later
Y_list = [list(np.where(i==1)[0]) for i in Y]

# This version converts the labels to Python ints, so trainer.save('bah') works
Y_list = [[int(k) for k in list(np.where(i==1)[0])] for i in Y]

trainer = Trainer(n_trees=10, n_jobs=1)
trainer.fit(X_sparse, Y_list)

trainer.save('bah')

Thanks for the find; good ol' NumPy. I'll add a note to the README that labels need to be JSON serializable.
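
For anyone hitting the same error, here is a minimal sketch that reproduces the serialization problem with plain `json` and NumPy, without FastXML. It shows two possible workarounds: converting label indices with `.tolist()` before training, and a fallback function for your own `json.dumps` calls. The helper name `to_builtin` and the small `Y` array are made up for illustration, and whether `Trainer.save` accepts such a fallback is not something this thread confirms, so converting the labels up front is probably the safer route.

```python
import json

import numpy as np


def to_builtin(obj):
    """Hypothetical fallback for json.dumps: turn NumPy values into built-ins."""
    if isinstance(obj, np.generic):   # NumPy scalars such as np.int64
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Illustrative binary label matrix (rows are samples, columns are labels)
Y = np.array([[1, 0, 1],
              [0, 0, 0],
              [0, 1, 0]])

# list() over a NumPy array keeps each element as a NumPy integer scalar
numpy_labels = [list(np.where(row == 1)[0]) for row in Y]
try:
    json.dumps(numpy_labels)
    failed = False
except TypeError:
    failed = True

# Workaround 1: .tolist() yields plain Python ints
plain_labels = [np.where(row == 1)[0].tolist() for row in Y]
encoded_plain = json.dumps(plain_labels)

# Workaround 2: keep the NumPy labels, convert during serialization
encoded_fallback = json.dumps(numpy_labels, default=to_builtin)
```

With plain Python ints in hand, `plain_labels` can be passed to `trainer.fit(X_sparse, plain_labels)` in place of `Y_list` from the setup above.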