Blealtan / efficient-kan

An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).

Are the "efficient-kan" and "official-kan" equivalent in terms of algorithms?

yuedajiong opened this issue · comments

As per the title.

As far as I know they are almost the same; the official version just looks to have an additional bias after each layer. I am also not sure whether the initialization is the same. In addition, the regularization loss is changed because of the optimizations.
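For context, here is a minimal sketch of what a weight-based regularization term looks like, in the spirit of the efficient implementation's optimization (the L1/entropy terms computed on the spline coefficients rather than on activations evaluated over input samples, as the official code does). The function name and exact form are illustrative, not copied from either repo.

```python
import torch

def weight_based_reg_loss(spline_weight, lam_l1=1.0, lam_entropy=1.0):
    """Illustrative regularization on spline coefficients.

    spline_weight: tensor of shape (out_features, in_features, n_coeffs).
    """
    # L1 term approximated on the coefficients themselves, not on
    # activations |phi(x)| averaged over a batch as in the official KAN.
    l1_per_edge = spline_weight.abs().mean(-1)      # (out_features, in_features)
    l1_total = l1_per_edge.sum()
    # Entropy of the normalized per-edge magnitudes, encouraging sparsity.
    p = l1_per_edge / (l1_total + 1e-8)
    entropy = -(p * (p + 1e-8).log()).sum()
    return lam_l1 * l1_total + lam_entropy * entropy
```

Because the two losses are computed on different quantities, the gradients they produce will generally differ even when the forward passes agree.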

@Indoxer Thanks, you are so kind.

No, I'm not quite sure.
I tried the official tutorial at the following link: Tutorial

*Including the use of the official LBFGS training strategy
The results showed that after completing all of the training in one go, the model was almost identical to the official one.
But if training is conducted in phases, it cannot be fitted perfectly (the model is still effective, just slightly underperforming).
[image: official KAN]
[image: Eff-KAN]

I think this is acceptable; after all, the model is very efficient, and some loss is to be expected. It would be strange if there were no loss at all. While it effectively retains the characteristics of the official model, it also incorporates training optimizations.
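For reference, this is roughly what running the efficient implementation under an LBFGS loop looks like; the import path, layer sizes, optimizer settings, and toy data below are all assumptions, not the tutorial's actual configuration.

```python
import torch
from efficient_kan import KAN  # import path assumed from the repo layout

# Toy regression data standing in for the tutorial's dataset.
x_train = torch.rand(1000, 2) * 2 - 1
y_train = torch.sin(torch.pi * x_train[:, :1]) * x_train[:, 1:]

model = KAN([2, 5, 1])
optimizer = torch.optim.LBFGS(
    model.parameters(), lr=1.0, max_iter=20,
    history_size=10, line_search_fn="strong_wolfe",
)

def closure():
    # LBFGS re-evaluates the objective several times per step,
    # so loss and gradients are computed inside a closure.
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x_train), y_train)
    loss.backward()
    return loss

for _ in range(20):
    optimizer.step(closure)
```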

@WhatMelonGua, are you sure that you didn't train spline_scaler and base_weights? Also, did you have the same parameters in the LBFGS optimizer (number of steps, etc.)?

[image: spline_scaler not trained, base_weights not trained]
[image: spline_scaler trained, base_weights trained]

(I am using my modified version (but the same algorithm as efficient kan), so I am not sure)

@WhatMelonGua, are you sure that you didn't train spline_scaler and base_weights? Also, did you have the same parameters in the LBFGS optimizer (number of steps, etc.)?

Oh yes, forgive me for forgetting.
There are no such parameters here, so for the reg_ variable (I don't know what it is) I simply took a default value of 1 and fixed many errors (perhaps I was fixing it blindly, just making it work).
And the upshot was that the official "LBFGS" training strategy cannot be migrated here directly.

[image: spline_scaler not trained, base_weights not trained]
[image: spline_scaler trained, base_weights trained]

(I am using my modified version (but the same algorithm as efficient kan), so I am not sure)

It looks like our approaches are similar.
What a coincidence! 🤗

@WhatMelonGua, are you sure that you didn't train spline_scaler and base_weights? Also, did you have the same parameters in the LBFGS optimizer (number of steps, etc.)?

Oh yes, forgive me for forgetting. There are no such parameters here, so for the reg_ variable (I don't know what it is) I simply took a default value of 1 and fixed many errors (perhaps I was fixing it blindly, just making it work). And the upshot was that the official "LBFGS" training strategy cannot be migrated here directly.

reg_ is the regularization loss: loss = train_loss + lamb * reg_. For continual learning lamb = 0.0, so loss = train_loss.
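In code, that composition would look something like the sketch below; model.regularization_loss() is how the efficient implementation appears to expose reg_, so treat the call (and the import path) as an assumption.

```python
import torch
from efficient_kan import KAN  # import path assumed

model = KAN([2, 5, 1])
lamb = 0.0  # continual-learning setting from the comment above: reg term switched off

def total_loss(pred, target):
    train_loss = torch.nn.functional.mse_loss(pred, target)
    reg_ = model.regularization_loss()   # regularization term summed over layers
    return train_loss + lamb * reg_      # with lamb = 0.0 this is just train_loss
```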

Here are my results and code, so you can compare

AFAIK the only difference is that the "efficient" regularization loss differs from the official one. But I'm not sure whether the parallel associativity will introduce numerical error large enough to break some important features.

Just found that I missed the bias term after each layer. Will update that soon.
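For anyone following along, adding that bias could be as simple as the hypothetical wrapper below; KANLinearWithBias is my own name for illustration, not something in either repo.

```python
import torch

class KANLinearWithBias(torch.nn.Module):
    """Hypothetical wrapper adding the per-layer bias mentioned above."""

    def __init__(self, kan_linear, out_features):
        super().__init__()
        self.kan_linear = kan_linear                        # an existing KANLinear-style layer
        self.bias = torch.nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        return self.kan_linear(x) + self.bias               # bias applied after the layer
```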

I scanned over this long thread a few days ago and totally missed the comment by @Indoxer lol