SVC doesn't work with sklearn OneVsRestClassifier
hblab-anhnt opened this issue
I need to train a OneVsRestClassifier(SVC) model on a large training set. Because of the data size I need GPU support, so I moved from sklearn to ThunderSVM. But after the switch, the results became much worse. How can I fix this? The code below reproduces the problem:
# get the ThunderSVM test data (Jupyter shell command)
!git clone https://github.com/Xtra-Computing/thundersvm.git
from thundersvm import SVC
from sklearn.datasets import load_svmlight_file
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC as oldSVC
# load the ThunderSVM test data (path as in the original run; adjust it to where the repository was cloned)
# note: the same file is used for training and for scoring
x, y = load_svmlight_file("../dataset/test_dataset.txt")
x2, y2 = load_svmlight_file("../dataset/test_dataset.txt")
# sklearn OneVsRestClassifier(SVC) -> score 0.98. It works well
clf = OneVsRestClassifier(oldSVC(verbose=True, gamma=0.5, C=100))
clf.fit(x, y)
y_predict = clf.predict(x2)
score = clf.score(x2, y2)
print(score)
# thundersvm OneVsRestClassifier(SVC) -> score 0.02. It becomes much worse
clf = OneVsRestClassifier(SVC(verbose=True, gamma=0.5, C=100))
clf.fit(x, y)
y_predict = clf.predict(x2)
score = clf.score(x2, y2)
print(score)
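One possible explanation, not confirmed anywhere in this thread: scikit-learn's `OneVsRestClassifier` picks the class whose binary estimator returns the largest `decision_function` value. If ThunderSVM's binary decision values used the opposite sign convention from scikit-learn's, the wrapper would consistently choose the least likely class, which would be consistent with a score near 0.02. Under that assumption, a workaround is a small hand-rolled one-vs-rest wrapper that checks each binary model's sign against its own `predict` output. `SignCheckedOvR` is a hypothetical helper, not part of either library; the fallback import and the iris data are stand-ins so the sketch runs without a GPU.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris

try:
    from thundersvm import SVC  # GPU SVC, if installed
except ImportError:
    from sklearn.svm import SVC  # CPU fallback so the sketch runs anywhere


class SignCheckedOvR:
    """Hypothetical one-vs-rest wrapper (not a library class).

    Trains one binary model per class and checks, per model, whether positive
    decision values line up with predicting the positive label (1). If they
    mostly disagree, that model's scores are negated before taking the argmax.
    """

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.estimators_, self.signs_ = [], []
        for label in self.classes_:
            target = (y == label).astype(int)  # 1 = this class, 0 = the rest
            est = clone(self.estimator)
            est.fit(X, target)
            dec = np.ravel(est.decision_function(X))
            pred = np.ravel(est.predict(X))
            # Fraction of training rows where "dec > 0" agrees with "predicted 1"
            agreement = np.mean((dec > 0) == (pred == 1))
            self.signs_.append(1.0 if agreement >= 0.5 else -1.0)
            self.estimators_.append(est)
        return self

    def decision_function(self, X):
        # One column per class, sign-corrected so larger means "more likely"
        return np.column_stack([
            sign * np.ravel(est.decision_function(X))
            for est, sign in zip(self.estimators_, self.signs_)
        ])

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]

    def score(self, X, y):
        return float(np.mean(self.predict(X) == np.asarray(y)))


# Usage on stand-in data (iris replaces the ThunderSVM test file here)
X, y = load_iris(return_X_y=True)
ovr = SignCheckedOvR(SVC(gamma=0.5, C=100)).fit(X, y)
print(ovr.signs_, ovr.score(X, y))
```

With ThunderSVM installed, `signs_` doubles as a quick test of the hypothesis: values of -1.0 would support it, while all 1.0 would rule it out. Like `OneVsRestClassifier`, this compares raw, uncalibrated decision values across the per-class models.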
Please help me fix it. Our training data is huge, so building the model without GPU support is infeasible.
Can anyone help me?
Hi @hblab-anhnt, can you provide some data that helps us reproduce your results?
@hblab-anhnt ThunderSVM only supports one-vs-one for multiclass classification, which often produces results competitive with one-vs-rest. Would you try one-vs-one? I will mark this issue as an enhancement so that we can work on it in a future upgrade.
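A minimal sketch of trying one-vs-one, as suggested above. It assumes ThunderSVM's `SVC` accepts multiclass labels directly, as scikit-learn's does, and falls back to `sklearn.svm.SVC` when `thundersvm` is not installed so it runs without a GPU. The iris dataset is a stand-in for the ThunderSVM test file, and the hyperparameters are copied from the original report rather than tuned.

```python
# Try the GPU implementation first; fall back to scikit-learn so the sketch runs anywhere
try:
    from thundersvm import SVC
except ImportError:
    from sklearn.svm import SVC

from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier

# Stand-in data (assumption): iris instead of ThunderSVM's test_dataset.txt
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Option 1: pass the multiclass labels straight to SVC, which trains one-vs-one internally
native = SVC(gamma=0.5, C=100)
native.fit(X_train, y_train)
native_score = accuracy_score(y_test, native.predict(X_test))

# Option 2: scikit-learn's explicit wrapper, one binary SVC per pair of classes
wrapped = OneVsOneClassifier(SVC(gamma=0.5, C=100))
wrapped.fit(X_train, y_train)
wrapped_score = accuracy_score(y_test, wrapped.predict(X_test))

print(f"SVC (built-in one-vs-one): {native_score:.3f}")
print(f"OneVsOneClassifier(SVC):   {wrapped_score:.3f}")
```

Both options train n(n-1)/2 binary models; the first simply avoids the extra scikit-learn wrapper layer.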
@zeyiwen So should I use OneVsRestClassifier(SVC(decision_function_shape='ovo')) or OneVsOneClassifier(SVC())?
However, I prefer one-vs-rest over one-vs-one because of complexity and prediction speed. With n classes, one-vs-rest creates n models, but one-vs-one creates n(n-1)/2 models, which increases complexity and training/prediction time.
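To make the model-count comparison concrete, here is a small sketch that counts the binary models for each strategy and, assuming balanced classes (N/n rows per class), the total number of training rows those models see. Each one-vs-one model is trained only on the rows of its two classes. `strategy_costs` is a hypothetical helper for illustration.

```python
def strategy_costs(n_classes, n_samples):
    """Return (ovr_models, ovo_models, ovr_rows, ovo_rows), assuming balanced classes."""
    ovr_models = n_classes                          # one model per class
    ovo_models = n_classes * (n_classes - 1) // 2   # one model per pair of classes
    ovr_rows = ovr_models * n_samples               # every OvR model sees all rows
    # each OvO model sees 2 * N / n rows (its two classes); exact for balanced data
    ovo_rows = ovo_models * 2 * n_samples // n_classes
    return ovr_models, ovo_models, ovr_rows, ovo_rows


for n in (3, 10, 100):
    print(n, strategy_costs(n, 100_000))
```

Under these assumptions one-vs-one trains many more models but on smaller subsets, so the total rows processed come to (n-1)·N versus n·N for one-vs-rest. Actual training and prediction time also depends on the solver and the number of support vectors, so this is not a timing claim.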
@zeyiwen @Kurt-Liuhf
Sorry, I forgot to include the data-loading code. I use the test data from the ThunderSVM repository:
!git clone https://github.com/Xtra-Computing/thundersvm.git
from sklearn.datasets import load_svmlight_file
x, y = load_svmlight_file("../dataset/test_dataset.txt")
x2, y2 = load_svmlight_file("../dataset/test_dataset.txt")