Bug in Non-Linear SVM Kernel Calculation
fbcotter opened this issue · comments
I think there's a bug in the calculation of the Gaussian Kernel for the multiclass SVM. Your code is as follows:
# Gaussian (RBF) kernel
gamma = tf.constant(-10.0)
dist = tf.reduce_sum(tf.square(x_data), 1)
dist = tf.reshape(dist, [-1,1])
sq_dists = tf.multiply(2., tf.matmul(x_data, tf.transpose(x_data)))
my_kernel = tf.exp(tf.multiply(gamma, tf.abs(sq_dists)))
Initially it bothered me that the variable dist
was calculated but never used. I then tried to figure out what you were doing in this code, and I think it is supposed to be doing something like this (dist is K in the below example)
In which case, the code should read something more like:
# Gaussian (RBF) kernel
gamma = tf.constant(-10.0)
K = tf.reduce_sum(tf.square(x_data), 1)
K_cols = tf.expand_dims(K, 1)
K_rows = tf.expand_dims(K, 0)
D = K_cols + K_rows - 2*tf.matmul(x_data, tf.transpose(x_data))
my_kernel = tf.exp(tf.multiply(gamma, D))
(The sq_dists calculation uses broadcasting so we don't need to create the J
matrix as in the math above)
HI @fbcotter , Thanks for this! I'm just getting around to triaging issues in preparation for a v2-update of the code.
I suspect you are correct. When I get around to chapter 4, I'll investigate and most likely accept your proposal. Expect more in the next month or so. Thanks again!
Yes, thanks. I just looked into that and found you are correct. I've implemented a very similar fix and kept the 'dist' variable terminology in both the *.py and the jupyter notebook. Thanks again!