ae audio audio-processing autoencoder deep-learning kan pytorch pytorch-vae rvq torchaudio vae vq-vae

KAE: KAN-AutoEncoder

This repo is heavily based on Blealtan's implementation of KAN. The original implementation of KAN is available here.

Motivation

Intuitively, KAN seems a natural fit for representing signals, especially audio signals, which can be decomposed into sinusoidal components.

This repo was therefore created to investigate how well KAN can represent sinusoidal signals and more complicated signals.

How to use

I provide two Jupyter notebooks, one for the KAN-based AutoEncoder and another for the MLP-based AutoEncoder.

In my toy example, KAN represents sinusoidal signals far better than the MLP does, which suggests that KAN has the potential to become a new baseline for autoencoders.

List of supported KAN-based Autoencoders

Results & Interesting Findings

First, KAE can compress an unseen 128-dimensional sinusoidal signal into 5 dimensions and reconstruct it back to 128 dimensions almost losslessly:
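To make the 128-to-5-to-128 setup concrete, here is a minimal sketch of the toy experiment. It is not the code from the notebooks: it uses plain `nn.Linear` layers (closer to the MLP baseline) so that it runs with PyTorch alone, and the names `make_sinusoids` and `BottleneckAE`, the hidden width, and the amplitude, frequency, and phase ranges are all assumptions made for illustration. A KAN layer stack could be dropped in where the linear layers are.

```python
import math

import torch
from torch import nn


def make_sinusoids(n_signals, n_samples=128, seed=0):
    """Random sinusoids; the parameter ranges are illustrative assumptions."""
    g = torch.Generator().manual_seed(seed)
    t = torch.linspace(0.0, 1.0, n_samples)                    # (n_samples,)
    amp = 0.5 + torch.rand(n_signals, 1, generator=g)          # amplitude in [0.5, 1.5)
    freq = 1.0 + 4.0 * torch.rand(n_signals, 1, generator=g)   # 1 to 5 cycles per window
    phase = 2 * math.pi * torch.rand(n_signals, 1, generator=g)
    return amp * torch.sin(2 * math.pi * freq * t + phase)     # (n_signals, n_samples)


class BottleneckAE(nn.Module):
    """Hypothetical stand-in autoencoder: 128 -> 5 -> 128."""

    def __init__(self, in_dim=128, hidden=64, latent=5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.SiLU(), nn.Linear(hidden, latent)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent, hidden), nn.SiLU(), nn.Linear(hidden, in_dim)
        )

    def forward(self, x):
        z = self.encoder(x)          # compressed 5-dimensional code
        return self.decoder(z), z    # reconstruction and code


torch.manual_seed(0)
train_x = make_sinusoids(256, seed=0)
test_x = make_sinusoids(32, seed=1)  # signals not seen during training

model = BottleneckAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):
    recon, _ = model(train_x)
    loss = nn.functional.mse_loss(recon, train_x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Evaluate reconstruction quality on the unseen signals.
with torch.no_grad():
    test_recon, test_z = model(test_x)
    test_mse = nn.functional.mse_loss(test_recon, test_x).item()
print(test_z.shape, test_recon.shape, test_mse)
```

Comparing `test_mse` between this stand-in and a KAN-based encoder/decoder of similar size is one way to reproduce the KAN versus MLP comparison.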

Another interesting finding is that KAE can be used as a mixer for two different signals:

I then scaled up the experiments, using real music sequences instead of toy sinusoidal signals. The dataset I use can be found here.

It turns out that KAN can reconstruct real, noisy, complicated music sequences with few parameters:

It can also still be used as a mixer for different audio signals.

Here are a KAN-based VAE model, a KAN-based VQ-VAE model, and a KAN-based RVQ model.

The RVQ model seems to converge much more slowly than the vanilla VQ-VAE. Maybe the averaging strategy used to update the codebook causes this slow convergence?
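For readers who want to probe that question, below is a minimal sketch of residual vector quantization with a running-average (EMA) codebook update. It assumes that "average strategy" refers to an EMA-style update; the repo's actual RVQ code may differ. `EMACodebook`, `rvq_step`, the `decay` value, and the codebook sizes are hypothetical names and settings chosen for illustration.

```python
import torch


class EMACodebook:
    """One quantizer stage; codes follow a running average of assigned vectors."""

    def __init__(self, num_codes, dim, decay=0.99, seed=0):
        g = torch.Generator().manual_seed(seed)
        self.codes = torch.randn(num_codes, dim, generator=g)
        self.cluster_size = torch.ones(num_codes)   # running assignment counts
        self.code_sum = self.codes.clone()          # running sum of assigned vectors
        self.decay = decay

    def quantize(self, x):
        idx = torch.cdist(x, self.codes).argmin(dim=1)  # nearest code per row
        return self.codes[idx], idx

    def update(self, x, idx):
        one_hot = torch.nn.functional.one_hot(idx, self.codes.shape[0]).to(x.dtype)
        d = self.decay
        self.cluster_size = d * self.cluster_size + (1 - d) * one_hot.sum(0)
        self.code_sum = d * self.code_sum + (1 - d) * (one_hot.t() @ x)
        self.codes = self.code_sum / self.cluster_size.clamp_min(1e-5).unsqueeze(1)


def rvq_step(x, stages, train=True):
    """Quantize x stage by stage; each stage encodes the previous residual."""
    residual = x
    quantized = torch.zeros_like(x)
    for stage in stages:
        q, idx = stage.quantize(residual)
        if train:
            stage.update(residual, idx)
        quantized = quantized + q
        residual = residual - q
    return quantized, residual


torch.manual_seed(0)
x = torch.randn(512, 5)  # stand-in for a batch of 5-dimensional latent codes
stages = [EMACodebook(num_codes=16, dim=5, seed=s) for s in range(3)]

for _ in range(300):
    rvq_step(x, stages)

quantized, residual = rvq_step(x, stages, train=False)
print(residual.pow(2).mean().item())  # remaining quantization error
```

In this sketch each later stage quantizes a residual that keeps shifting while the earlier codebooks are still moving, and a `decay` close to 1 makes every codebook move slowly. Varying `decay` or the number of stages is one way to test whether the averaging update is what slows convergence.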

About

KAE: KAN-based AutoEncoder (AE, VAE, VQ-VAE, RVQ, etc.)


MIT License

Languages

Jupyter Notebook 88.7%, Python 11.3%