rasbt / python-machine-learning-book

The "Python Machine Learning (1st edition)" book code repository and info resource

Kernel PCA [projecting new data points]

VedAustin opened this issue

So you have this:

X, y = make_moons(n_samples=100, random_state=123)
alphas, lambdas = rbf_kernel_pca(X, gamma=15, n_components=1)

Then you take a sample from X:
x_new = X[25]

And then find the projection for the new sample from:

x_reproj = project_x(x_new, X,
                     gamma=15, alphas=alphas, lambdas=lambdas)

But x_new was already part of the data X used to compute alphas and lambdas. In other words, X already contained x_new when rbf_kernel_pca was applied. So should I be surprised that the projected value of x_new coincides exactly in the plots? I would have thought it better to exclude x_new when deriving alphas and lambdas and then apply project_x. Thoughts?

So the idea was basically to implement the equation for computing the projection of new points. E.g., think of deriving the kernel PCA parameters from a training set (in a classification pipeline), and then applying them to an independent test set (or any future data). Sure, I could have used a point that was not in the training dataset.

So should I be surprised that the projected value of x_new coincides exactly in the plots?

My intention was to use one of the training points as a positive control for checking whether we implemented the equation correctly, i.e., whether there was a bug in the computation. Hope that makes sense; what I mean is that if I'd chosen some new point, e.g., [100, 100], I couldn't say whether the reprojection is right or wrong because I wouldn't have a reference value.
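To make the positive-control idea concrete, here is a minimal NumPy sketch of it. It is not the book's code: the function names `rbf_kernel_pca` and `project_x` and the settings (`gamma=15`, `X[25]`) come from this thread, but the bodies below are my own reconstruction, and the helper `rbf_kernel` is a hypothetical name introduced only for this example. One deliberate assumption: `project_x` here centers the new kernel vector against the training kernel, so that the projection of a training point should match its entry in `alphas` up to floating-point error. The book's version may handle this step differently.

```python
import numpy as np
from sklearn.datasets import make_moons


def rbf_kernel(A, B, gamma):
    """RBF kernel values between every row of A and every row of B."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)


def rbf_kernel_pca(X, gamma, n_components):
    """Top eigenvectors (alphas) and eigenvalues (lambdas) of the
    centered RBF kernel matrix of X."""
    K = rbf_kernel(X, X, gamma)
    n = K.shape[0]
    one_n = np.ones((n, n)) / n
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(K_centered)  # ascending eigenvalues
    alphas = eigvecs[:, ::-1][:, :n_components]    # largest first
    lambdas = eigvals[::-1][:n_components]
    return alphas, lambdas


def project_x(x_new, X, gamma, alphas, lambdas):
    """Project a single point onto the kernel principal components."""
    K = rbf_kernel(X, X, gamma)
    k = rbf_kernel(x_new[None, :], X, gamma).ravel()
    # Assumption of this sketch: center k the same way K was centered.
    k_centered = k - k.mean() - K.mean(axis=0) + K.mean()
    return k_centered @ (alphas / lambdas)


X, y = make_moons(n_samples=100, random_state=123)
alphas, lambdas = rbf_kernel_pca(X, gamma=15, n_components=1)

x_new = X[25]  # a training point, used as the positive control
x_reproj = project_x(x_new, X, gamma=15, alphas=alphas, lambdas=lambdas)

# If the projection is implemented correctly, these two should agree.
print("training projection:", alphas[25])
print("reprojection:       ", x_reproj)
```

With a point such as [100, 100] there would be nothing to compare `x_reproj` against, which is the reason for picking a training point here.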

Great! Thanks for the reply and lucid explanation. It all makes sense. This could be more of an exercise for myself, but would it be possible to create two PCA models, one with the test point included and one without, and then compare the projections?
PS: Love your style of writing code - learned a lot from this book - am going to keep your book for all future reference (especially copying code for plotting - my major weakness)

You are very welcome! Yeah, I would only expect a tiny difference if it's only one point -- still an interesting exercise :).
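For anyone who wants to try that exercise, a rough sketch could look like the following. It fits RBF kernel PCA twice, once on all 100 points and once with `X[25]` left out, and projects `X[25]` onto the first component in both cases. The function `first_component_projection` is a hypothetical helper written for this example, not something from the book, and it centers the new kernel vector against the training kernel, which is an assumption of this sketch.

```python
import numpy as np
from sklearn.datasets import make_moons


def first_component_projection(X_train, x_new, gamma):
    """Fit RBF kernel PCA on X_train and project x_new onto the
    first kernel principal component (hypothetical helper)."""
    sq_dists = np.sum((X_train[:, None, :] - X_train[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq_dists)
    # Center the training kernel matrix.
    K_centered = K - K.mean(axis=0) - K.mean(axis=1)[:, None] + K.mean()
    eigvals, eigvecs = np.linalg.eigh(K_centered)  # ascending eigenvalues
    alpha, lam = eigvecs[:, -1], eigvals[-1]       # leading component
    # Kernel vector of the new point, centered against the training kernel.
    k = np.exp(-gamma * np.sum((X_train - x_new) ** 2, axis=1))
    k_centered = k - k.mean() - K.mean(axis=0) + K.mean()
    return k_centered @ alpha / lam


X, y = make_moons(n_samples=100, random_state=123)
x_new = X[25]
X_without = np.delete(X, 25, axis=0)  # training set with x_new held out

proj_with = first_component_projection(X, x_new, gamma=15)
proj_without = first_component_projection(X_without, x_new, gamma=15)

# The sign of an eigenvector is arbitrary, so compare magnitudes.
print("with x_new in training set:   ", abs(proj_with))
print("without x_new in training set:", abs(proj_without))
print("absolute difference:          ", abs(abs(proj_with) - abs(proj_without)))
```

Two caveats when reading the numbers: the eigenvector sign can flip between the two fits, hence the absolute values, and the eigenvectors are normalized to unit length, so the scale of the projection depends slightly on the number of training points.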

And thanks a lot for the nice words, I am really glad to hear that the book turns out to be so useful to you! It's really nice and motivating to hear that :)