nfmcclure / tensorflow_cookbook

Code for Tensorflow Machine Learning Cookbook

Home Page: https://www.packtpub.com/big-data-and-business-intelligence/tensorflow-machine-learning-cookbook-second-edition


Kernel SVM implementation is wrong (?)

tatnguyennguyen opened this issue · comments

Take a look at the first formula on page 103 of your book. This is the objective function for the (batch) kernel SVM. The parameter is the vector b, and each b_i is tied to exactly one pair (x_i, y_i). So it does not make sense to reuse the same vector b (of size equal to the batch size) across all mini-batches, as your code does, because the x and y in each mini-batch are different.
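For reference, the dual objective in question looks roughly like this (writing b for the dual variables as the book does; the exact constants on page 103 may differ):

$$\max_{b}\;\sum_{i=1}^{n} b_i \;-\; \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} b_i\, b_j\, y_i\, y_j\, K(x_i, x_j)$$

Each b_i multiplies kernel terms involving one specific training pair (x_i, y_i), which is why b cannot be shared across different mini-batches.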

Take a look at this: http://cs229.stanford.edu/extra-notes/representer-function.pdf. I think the (naive) proper way to implement a kernel SVM is to use a parameter vector b whose size equals the size of the whole training set, and in each epoch update only the b_i corresponding to the training examples chosen in that epoch.
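Here is a minimal sketch of that idea, in the TF 1.x style of the book's scripts (names such as `n_train`, `batch_idx`, and the gamma value are illustrative, not taken from the original code):

```python
import tensorflow as tf  # TF 1.x, as in the book's code

n_train = 350      # size of the whole training set (hypothetical)
batch_size = 50
gamma = tf.constant(-50.0)

x_data = tf.placeholder(shape=[batch_size, 2], dtype=tf.float32)
y_target = tf.placeholder(shape=[batch_size, 1], dtype=tf.float32)
batch_idx = tf.placeholder(shape=[batch_size], dtype=tf.int32)

# One dual variable per *training point*, not per batch slot.
b_all = tf.Variable(tf.zeros(shape=[n_train]))
b = tf.reshape(tf.gather(b_all, batch_idx), [1, batch_size])

# Gaussian (RBF) kernel on the sampled batch.
dist = tf.reshape(tf.reduce_sum(tf.square(x_data), 1), [-1, 1])
sq_dists = dist - 2.0 * tf.matmul(x_data, tf.transpose(x_data)) + tf.transpose(dist)
my_kernel = tf.exp(tf.multiply(gamma, tf.abs(sq_dists)))

# Dual objective restricted to the sampled coordinates.
first_term = tf.reduce_sum(b)
b_vec_cross = tf.matmul(tf.transpose(b), b)
y_target_cross = tf.matmul(y_target, tf.transpose(y_target))
second_term = tf.reduce_sum(my_kernel * b_vec_cross * y_target_cross)
loss = tf.negative(tf.subtract(first_term, second_term))

# Because b is built with tf.gather, its gradient is an IndexedSlices that
# touches only the gathered entries of b_all, so each step updates exactly
# the b_i whose (x_i, y_i) were sampled.
train_step = tf.train.GradientDescentOptimizer(0.002).minimize(loss)
```

At each step you would then feed the sampled indices together with the matching rows of the training data, so only those b_i move.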

Why does your code give a good result anyway? Because vector b is reused for random examples across epochs, the only stable value of b is one in which all elements are equal (you can verify this by printing b after training for many more epochs). In the prediction formula for the kernel SVM, when all b_i are equal, the formula becomes a kind of k-nearest-neighbors rule (with k equal to the batch size, and the weights given by the kernel function), as the formula below makes explicit.
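Concretely, writing the usual kernel-SVM decision function (notation may differ slightly from the book's):

$$f(x) = \operatorname{sign}\!\Big(\sum_{i=1}^{n} b_i\, y_i\, K(x_i, x)\Big) \quad\xrightarrow{\;b_i \,\equiv\, c \,>\, 0\;}\quad \operatorname{sign}\!\Big(\sum_{i=1}^{n} y_i\, K(x_i, x)\Big)$$

which is just a kernel-weighted majority vote over the n = batch_size reference points.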

You can get a good result without any training: just initialize vector b to all ones (line 40)
b = tf.Variable(tf.ones(shape=[1,batch_size]))
and comment out the training step (line 89)
# sess.run(train_step, feed_dict={x_data: rand_x, y_target: rand_y})
Then run the script; you will get accuracy above 0.97, and the contour map looks decent.

commented

I also found this problem.

commented

Hi @tatnguyennguyen, this is super interesting! Thanks for finding it. I'm just now triaging and going through the issues in preparation for a book/code v2.

When I get to the SVM (chapter 4), I will investigate this. I see your point, and you are probably right; I think the fix will be to increase the batch size to the data size, although I'll first see if I can make it work with smaller batches.

commented

Yes, I found the fix to be making batch_size equal to the size of the training dataset.

I think the long-run fix would be, as you suggest, to track which indices are selected for training and update only those entries of the b vector. But for now, making batch_size equal to the dataset size is sufficient; see the sketch below.
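For the record, the stopgap amounts to something like this (a sketch assuming the chapter 4 nonlinear-SVM script; the make_circles data and sizes are illustrative, not copied from the script):

```python
import numpy as np
import tensorflow as tf  # TF 1.x, matching the book's scripts
from sklearn import datasets

# Full training set, roughly as in the chapter 4 recipe (sizes illustrative).
(x_vals, y_vals) = datasets.make_circles(n_samples=350, factor=0.5, noise=0.1)
y_vals = np.array([1.0 if y == 1 else -1.0 for y in y_vals])

# Stopgap fix: the "batch" is the whole dataset, so each b_i is paired with
# the same (x_i, y_i) on every training step.
batch_size = len(x_vals)
b = tf.Variable(tf.random_normal(shape=[1, batch_size]))

# The rest of the graph (kernel, loss, train_step) is unchanged; just feed
# all of x_vals and y_vals at every step instead of a random subset.
```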