SekiroRong / KAN-AutoEncoder

KAE : KAN-based AutoEncoder (AE, VAE, VQ-VAE, RVQ, etc.)

KAE: KAN-AutoEncoder

This repo is heavily based on Blealtan's implementation of KAN. The original implementation of KAN is available here.

Motivation

Intuitively, KAN seems like a natural fit for representing signals, especially audio signals, which can be decomposed into sinusoidal components.

This repo was therefore created to investigate how well KAN can represent sinusoidal signals, as well as more complicated ones.

How to use

I provide two Jupyter notebooks, one for the KAN-based AutoEncoder and another for the MLP-based AutoEncoder.

My toy example shows that KAN represents sinusoidal signals much better than an MLP does, which may indicate that KAN has great potential as a new baseline for AutoEncoders.

List of supported KAN-based Autoencoders

Results & Interesting Findings

First, KAE is able to compress an unseen 128-dimensional sinusoidal signal into 5 dimensions and reconstruct it back to 128 dimensions almost losslessly:

recon_signal.jpg
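The toy setup described above can be sketched as follows. This is only an illustration, not the notebook's code: the helper names (`make_sinusoid`, `random_sinusoid`, `mse`) and the amplitude, frequency, and phase ranges are assumptions. The autoencoder itself is left out; the sketch only shows how a 128-sample test signal could be generated and how a reconstruction could be scored.

```python
import math
import random


def make_sinusoid(n=128, amp=1.0, freq=3.0, phase=0.0):
    """Return n samples of amp * sin(2*pi*freq*t + phase) for t in [0, 1)."""
    return [amp * math.sin(2 * math.pi * freq * i / n + phase) for i in range(n)]


def random_sinusoid(rng, n=128):
    """Draw one sinusoid with random parameters (ranges are hypothetical)."""
    return make_sinusoid(
        n,
        amp=rng.uniform(0.5, 1.5),
        freq=rng.uniform(1.0, 10.0),
        phase=rng.uniform(0.0, 2 * math.pi),
    )


def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
signal = random_sinusoid(rng)  # an "unseen" 128-dimensional test signal
# A trained model would give: recon = decoder(encoder(signal))
# and the reconstruction quality would be: mse(signal, recon)
```

A single sinusoid of this kind is fully determined by three numbers (amplitude, frequency, phase), which is one reason a 5-dimensional bottleneck can plausibly hold enough information to reconstruct it.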

Another interesting finding is that KAE can be used as a mixer for two different signals:

mix_signal.jpg
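The text does not say how the mixing is done. One common way to mix two signals with an autoencoder is to blend their latent codes and decode the result; the sketch below assumes that approach, and `mix_latents`, `encoder`, and `decoder` are hypothetical names rather than functions from this repo.

```python
def mix_latents(z_a, z_b, alpha=0.5):
    """Linearly interpolate two latent codes.

    alpha=0 returns z_a, alpha=1 returns z_b, and values in between blend them.
    """
    if len(z_a) != len(z_b):
        raise ValueError("latent codes must have the same dimension")
    return [(1 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]


# Hypothetical usage with a trained model:
#   z_mix = mix_latents(encoder(signal_a), encoder(signal_b), alpha=0.5)
#   mixed_signal = decoder(z_mix)
```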

Next, I scaled up the experiments, using real music sequences instead of toy signals. The dataset I use can be found here.

It turns out that KAN is able to reconstruct real, noisy, complicated music sequences with few parameters:

recon_music.jpg

It can also still be used as a mixer for different audio signals:

mix_music.jpg

Here is a KAN-based VAE model, a KAN-based VQ-VAE model, and a KAN-based RVQ model.

The RVQ model seems to converge much more slowly than the vanilla VQ-VAE. Maybe the averaging strategy used to update the codebook causes this slow convergence?
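To make the terms above concrete, here is a minimal sketch of residual vector quantization together with a simplified moving-average codebook update. It assumes that the "averaging strategy" refers to an exponential-moving-average (EMA) style update, which the text does not confirm. All names and the `decay` value are hypothetical, and full EMA implementations usually also track per-code cluster sizes, which is omitted here.

```python
def nearest_index(codebook, x):
    """Index of the codebook vector closest to x (squared Euclidean distance)."""
    def sq_dist(code):
        return sum((c - v) ** 2 for c, v in zip(code, x))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def rvq_encode(codebooks, x):
    """Quantize x stage by stage; each stage encodes the residual left so far."""
    residual = list(x)
    quantized = [0.0] * len(x)
    indices = []
    for codebook in codebooks:
        i = nearest_index(codebook, residual)
        code = codebook[i]
        indices.append(i)
        quantized = [q + c for q, c in zip(quantized, code)]
        residual = [r - c for r, c in zip(residual, code)]
    return indices, quantized


def ema_update(code, assigned, decay=0.99):
    """Move one code vector toward the mean of the inputs assigned to it."""
    if not assigned:
        return list(code)  # unused codes stay where they are
    dim = len(code)
    mean = [sum(v[d] for v in assigned) / len(assigned) for d in range(dim)]
    return [decay * c + (1 - decay) * m for c, m in zip(code, mean)]


# Two-stage example: a coarse codebook, then a finer one for the residual.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25]],
]
indices, quantized = rvq_encode(codebooks, [1.2, 0.8])
```

With `decay` close to 1, each update moves a code only a small fraction of the way toward its assigned inputs. In RVQ the later stages also see residuals that keep shifting while earlier codebooks are still moving, so this could be one contributor to slow convergence, though the text leaves the cause open.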

License: MIT License


Languages

Jupyter Notebook 88.7%, Python 11.3%