Blealtan / efficient-kan

An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).

Are the "efficient-kan" and "official-kan" equivalent in terms of algorithms?

yuedajiong opened this issue · comments

As per the title.

As far as I know they are almost the same; the official version just looks to have an additional bias after each layer. I am also not sure whether the initialization is the same. In addition, the regularization loss is changed because of the optimizations.
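For context, here is a minimal sketch of what a weight-based regularization term looks like, in the spirit of the efficient implementation's optimization (the L1/entropy terms computed on the spline coefficients rather than on activations evaluated over input samples, as the official code does). The function name and exact form are illustrative, not copied from either repo.

```python
import torch

def weight_based_reg_loss(spline_weight, lam_l1=1.0, lam_entropy=1.0):
    """Illustrative regularization on spline coefficients.

    spline_weight: tensor of shape (out_features, in_features, n_coeffs).
    """
    # L1 term approximated on the coefficients themselves, not on
    # activations |phi(x)| averaged over a batch as in the official KAN.
    l1_per_edge = spline_weight.abs().mean(-1)      # (out_features, in_features)
    l1_total = l1_per_edge.sum()
    # Entropy of the normalized per-edge magnitudes, encouraging sparsity.
    p = l1_per_edge / (l1_total + 1e-8)
    entropy = -(p * (p + 1e-8).log()).sum()
    return lam_l1 * l1_total + lam_entropy * entropy
```

Because the two losses are computed on different quantities, the gradients they produce will generally differ even when the forward passes agree.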

@Indoxer Thanks, you are so kind.

No, I'm not quite sure.
I tried the official tutorial at the following link: Tutorial

*Including the use of the official LBFGS training strategy
The results showed that after completing all of the training in one go, the model was almost identical to the official one.
But if training is conducted in phases, it cannot be fitted perfectly (the model is still effective, just slightly underperforming).
[image: official KAN]
[image: Eff-KAN]

I think this is acceptable; after all, the model is very efficient, and some loss is to be expected. It would be strange if there were no loss at all. While it effectively retains the characteristics of the official model, it also incorporates training optimizations.
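For reference, this is roughly what running the efficient implementation under an LBFGS loop looks like; the import path, layer sizes, optimizer settings, and toy data below are all assumptions, not the tutorial's actual configuration.

```python
import torch
from efficient_kan import KAN  # import path assumed from the repo layout

# Toy regression data standing in for the tutorial's dataset.
x_train = torch.rand(1000, 2) * 2 - 1
y_train = torch.sin(torch.pi * x_train[:, :1]) * x_train[:, 1:]

model = KAN([2, 5, 1])
optimizer = torch.optim.LBFGS(
    model.parameters(), lr=1.0, max_iter=20,
    history_size=10, line_search_fn="strong_wolfe",
)

def closure():
    # LBFGS re-evaluates the objective several times per step,
    # so loss and gradients are computed inside a closure.
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x_train), y_train)
    loss.backward()
    return loss

for _ in range(20):
    optimizer.step(closure)
```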

@WhatMelonGua, are you sure that you didn't train spline_scaler and base_weights? Also, did you have the same parameters in the LBFGS optimizer (number of steps, etc.)?

[image: spline_scaler not trained, base_weights not trained]
[image: spline_scaler trained, base_weights trained]

(I am using my modified version (but the same algorithm as efficient kan), so I am not sure)

@WhatMelonGua, are you sure that you didn't train spline_scaler and base_weights? Also, did you have the same parameters in the LBFGS optimizer (number of steps, etc.)?

Oh yes, forgive me for forgetting.
There are no such parameters here, so for the reg_ variable (I don't know what it is) I simply took a default value of 1 and fixed many errors (perhaps I was fixing it blindly, just making it work).
And the upshot was that the official "LBFGS" training strategy cannot be migrated here directly.

[image: spline_scaler not trained, base_weights not trained]
[image: spline_scaler trained, base_weights trained]

(I am using my modified version (but the same algorithm as efficient kan), so I am not sure)

It looks like our approaches are similar.
What a coincidence! 🤗

@WhatMelonGua, are you sure that you didn't train spline_scaler and base_weights? Also, did you have the same parameters in the LBFGS optimizer (number of steps, etc.)?

Oh yes, forgive me for forgetting. There are no such parameters here, so for the reg_ variable (I don't know what it is) I simply took a default value of 1 and fixed many errors (perhaps I was fixing it blindly, just making it work). And the upshot was that the official "LBFGS" training strategy cannot be migrated here directly.

reg_ is the regularization loss: loss = train_loss + lamb * reg_. For continual learning lamb = 0.0, so loss = train_loss.
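In code, that composition would look something like the sketch below; model.regularization_loss() is how the efficient implementation appears to expose reg_, so treat the call (and the import path) as an assumption.

```python
import torch
from efficient_kan import KAN  # import path assumed

model = KAN([2, 5, 1])
lamb = 0.0  # continual-learning setting from the comment above: reg term switched off

def total_loss(pred, target):
    train_loss = torch.nn.functional.mse_loss(pred, target)
    reg_ = model.regularization_loss()   # regularization term summed over layers
    return train_loss + lamb * reg_      # with lamb = 0.0 this is just train_loss
```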

Here are my results and code, so you can compare

AFAIK the only difference is that the "efficient" regularization loss differs from the official one. But I'm not sure whether the parallel associativity will introduce numerical error large enough to break some important features.

Just found that I missed the bias term after each layer. Will update that soon.
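For anyone following along, adding that bias could be as simple as the hypothetical wrapper below; KANLinearWithBias is my own name for illustration, not something in either repo.

```python
import torch

class KANLinearWithBias(torch.nn.Module):
    """Hypothetical wrapper adding the per-layer bias mentioned above."""

    def __init__(self, kan_linear, out_features):
        super().__init__()
        self.kan_linear = kan_linear                        # an existing KANLinear-style layer
        self.bias = torch.nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        return self.kan_linear(x) + self.bias               # bias applied after the layer
```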

I scanned over this long thread a few days ago and totally missed the comment by @Indoxer lol