mikesol / fouriax

A port of Christian Steinmetz's auraloss for JAX.

fouriax


Documentation: https://mikesol.github.io/fouriax

Source Code: https://github.com/mikesol/fouriax

PyPI: https://pypi.org/project/fouriax/


A JAX port of auraloss.

Installation

pip install fouriax

Usage

import jax
import fouriax.stft as stft
from jax.nn.initializers import lecun_normal

key = jax.random.PRNGKey(0)
key1, key2 = jax.random.split(key)
shape = (4, 4098, 1)

# Initialize the input and target tensors from a LeCun normal distribution
input = lecun_normal()(key1, shape)
target = lecun_normal()(key2, shape)

# One STFT configuration (FFT size, hop size, window length) per resolution
fft_sizes = [1024, 2048, 512]
hop_sizes = [120, 240, 50]
win_lengths = [600, 1200, 240]
params = [
    stft.init_stft_params(x, y, z)
    for x, y, z in zip(fft_sizes, hop_sizes, win_lengths)
]

# Assumes multi_resolution_stft_loss is exposed by fouriax.stft;
# adjust the module path if your installed version differs.
loss = stft.multi_resolution_stft_loss(params, input, target)
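
For intuition about what a multi-resolution STFT loss measures, here is a minimal sketch in plain JAX. It is not fouriax's implementation: the function names, the Hann window, the `eps` constant, the `(batch, time)` input layout, and the equal weighting of resolutions are all assumptions made for illustration. The two terms it combines, spectral convergence and an L1 distance on log magnitudes, are the ones named in the Loss functions section.

```python
import jax
import jax.numpy as jnp


def stft_magnitude_sketch(x, fft_size, hop_size, win_length):
    """Magnitude STFT of signals shaped (batch, time). Illustrative helper."""
    n_frames = 1 + (x.shape[-1] - win_length) // hop_size
    # Index matrix (n_frames, win_length) that selects each analysis frame.
    idx = jnp.arange(win_length)[None, :] + hop_size * jnp.arange(n_frames)[:, None]
    frames = x[..., idx] * jnp.hanning(win_length)
    # rfft zero-pads each frame from win_length up to fft_size.
    return jnp.abs(jnp.fft.rfft(frames, n=fft_size, axis=-1))


def stft_loss_sketch(x, y, fft_size, hop_size, win_length, eps=1e-7):
    """Spectral convergence plus L1 log-magnitude distance at one resolution."""
    x_mag = stft_magnitude_sketch(x, fft_size, hop_size, win_length)
    y_mag = stft_magnitude_sketch(y, fft_size, hop_size, win_length)
    # Spectral convergence: per-item norm of the error relative to the target norm.
    err = jnp.sqrt(jnp.sum((y_mag - x_mag) ** 2, axis=(-2, -1)))
    ref = jnp.sqrt(jnp.sum(y_mag ** 2, axis=(-2, -1)))
    spectral_convergence = jnp.mean(err / (ref + eps))
    # Log-magnitude term: mean absolute difference of the log spectra.
    log_magnitude = jnp.mean(jnp.abs(jnp.log(y_mag + eps) - jnp.log(x_mag + eps)))
    return spectral_convergence + log_magnitude


def multi_resolution_stft_loss_sketch(x, y, resolutions):
    """Average the single-resolution loss over (fft, hop, window) triples."""
    losses = [stft_loss_sketch(x, y, *r) for r in resolutions]
    return sum(losses) / len(losses)


key1, key2 = jax.random.split(jax.random.PRNGKey(0))
x = jax.random.normal(key1, (4, 4096))
y = jax.random.normal(key2, (4, 4096))
resolutions = [(1024, 120, 600), (2048, 240, 1200), (512, 50, 240)]
loss = multi_resolution_stft_loss_sketch(x, y, resolutions)
```

Using several window lengths trades time resolution against frequency resolution, so an error that is hard to see at one resolution can still be penalized at another.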

Loss functions

We categorize the loss functions as either time-domain or frequency-domain approaches. Additionally, we include perceptual transforms.

| Loss function | Interface | Reference |
| --- | --- | --- |
| Time domain | | |
| Error-to-signal ratio (ESR) | fouriax.time.esr_loss() | Wright & Välimäki, 2019 |
| DC error (DC) | auraloss.time.DCLoss() | Wright & Välimäki, 2019 |
| Log hyperbolic cosine (Log-cosh) | fouriax.time.log_cosh_loss() | Chen et al., 2019 |
| Frequency domain | | |
| Aggregate STFT | fouriax.freq.stft_loss() | Arik et al., 2018 |
| Multi-resolution STFT | fouriax.freq.multi_resolution_stft_loss() | Yamamoto et al., 2019* |
| Perceptual transforms | | |
| FIR pre-emphasis filters | fouriax.perceptual.fir_filter() | Wright & Välimäki, 2019 |

* Wang et al., 2019 also propose a multi-resolution spectral loss (that Engel et al., 2020 follow), but they do not include both the log magnitude (L1 distance) and spectral convergence terms, introduced in Arik et al., 2018, and then extended for the multi-resolution case in Yamamoto et al., 2019.
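
The time-domain entries and the pre-emphasis filter listed above can be sketched in a few lines of plain JAX. These are textbook-style definitions written for illustration, not the fouriax functions themselves: the names ending in `_sketch`, the `eps` constants, and the default filter coefficient of 0.85 are assumptions, and the library's own signatures and reductions may differ.

```python
import jax.numpy as jnp


def esr_sketch(pred, target, eps=1e-8):
    """Error-to-signal ratio: error energy divided by target energy."""
    return jnp.sum((target - pred) ** 2) / (jnp.sum(target ** 2) + eps)


def dc_sketch(pred, target, eps=1e-8):
    """DC error: squared mean offset divided by mean target energy."""
    return jnp.mean(target - pred) ** 2 / (jnp.mean(target ** 2) + eps)


def log_cosh_sketch(pred, target):
    """Mean log-cosh of the error, computed stably as logaddexp(d, -d) - log 2."""
    d = pred - target
    return jnp.mean(jnp.logaddexp(d, -d) - jnp.log(2.0))


def pre_emphasis_sketch(x, coeff=0.85):
    """First-order FIR high-pass y[n] = x[n] - coeff * x[n-1] on the last axis."""
    # Shift the signal right by one sample, filling the first sample with zero.
    pad_width = [(0, 0)] * (x.ndim - 1) + [(1, 0)]
    delayed = jnp.pad(x, pad_width)[..., :-1]
    return x - coeff * delayed


target = jnp.array([1.0, 2.0, 3.0, 4.0])
pred = jnp.array([1.1, 1.9, 3.2, 3.8])

# A perceptual transform is applied to both signals before the loss.
loss = esr_sketch(pre_emphasis_sketch(pred), pre_emphasis_sketch(target))
```

Filtering both signals before computing the loss weights the error toward the frequency range the filter emphasizes, which is the role of the perceptual transforms in the table.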

PVC

A partial port of the core routines in Paul Koonce's PVC can be found in pvc.py. This includes a novel FFT algorithm called fkt (Fast Koonce Transform) that, combined with convert, produces amplitude/frequency pairs for a given signal. These pairs are often more useful in loss functions than the output of a garden-variety FFT, because each bin carries an estimated frequency rather than only a magnitude and phase at a fixed bin-center frequency.
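
To illustrate what amplitude/frequency pairs look like, here is a generic phase-vocoder analysis sketch in plain JAX. It does not use fkt or convert and makes no claim about how pvc.py computes its output: the function name, the Hann window, and the default frame sizes are assumptions. It estimates each bin's frequency from the phase advance between consecutive frames.

```python
import jax.numpy as jnp


def amp_freq_sketch(x, fft_size=1024, hop_size=256, sample_rate=44100.0):
    """Per-frame (amplitude, frequency in Hz) pairs for a 1-D signal."""
    n_frames = 1 + (x.shape[0] - fft_size) // hop_size
    # Index matrix (n_frames, fft_size) that selects each analysis frame.
    idx = jnp.arange(fft_size)[None, :] + hop_size * jnp.arange(n_frames)[:, None]
    spec = jnp.fft.rfft(x[idx] * jnp.hanning(fft_size), axis=-1)
    amp = jnp.abs(spec)
    phase = jnp.angle(spec)

    k = jnp.arange(spec.shape[-1])
    # Phase advance per hop that a sinusoid exactly at each bin center would show.
    expected = 2.0 * jnp.pi * k * hop_size / fft_size
    # Measured advance minus the expected one, wrapped to [-pi, pi).
    dphi = phase[1:] - phase[:-1] - expected
    dphi = (dphi + jnp.pi) % (2.0 * jnp.pi) - jnp.pi
    # Bin-center frequency plus the deviation implied by the wrapped phase.
    freq = (k / fft_size + dphi / (2.0 * jnp.pi * hop_size)) * sample_rate
    # Drop the first frame so amplitudes align with the frequency estimates.
    return amp[1:], freq


sample_rate = 44100.0
t = jnp.arange(8192) / sample_rate
signal = jnp.sin(2.0 * jnp.pi * 1000.0 * t)
amp, freq = amp_freq_sketch(signal, sample_rate=sample_rate)
peak_bin = jnp.argmax(amp[0])
estimated_hz = freq[0, peak_bin]
```

For this test tone the bin spacing is about 43 Hz, so a magnitude-only FFT can locate the sinusoid only to the nearest bin, while the phase-based estimate at the peak bin should land much closer to the true frequency.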

There is also a noscbank method that allows for resynthesis. This can be used as a simple recurrent layer at the end of a network to do waveform synthesis.
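
The idea of resynthesizing a waveform from amplitude/frequency pairs can be sketched as a bank of sinusoidal oscillators. The code below is a generic additive-synthesis illustration in plain JAX and not the noscbank routine: the function name, the sample-and-hold upsampling, and the default hop size and sample rate are assumptions. The running phase is a recurrence (each sample's phase is the previous phase plus an increment), written here in closed form with a cumulative sum.

```python
import jax.numpy as jnp


def oscillator_bank_sketch(amp, freq, hop_size=256, sample_rate=44100.0):
    """Sum of sinusoids driven by per-frame amplitude/frequency tracks.

    amp, freq: arrays shaped (n_frames, n_oscillators); freq is in Hz.
    Returns a waveform of length n_frames * hop_size.
    """
    # Hold each frame's values for hop_size samples (no interpolation).
    amp_s = jnp.repeat(amp, hop_size, axis=0)
    freq_s = jnp.repeat(freq, hop_size, axis=0)
    # Phase is the running sum of the per-sample phase increments.
    phase = jnp.cumsum(2.0 * jnp.pi * freq_s / sample_rate, axis=0)
    return jnp.sum(amp_s * jnp.sin(phase), axis=-1)


# One oscillator at 441 Hz: exactly 100 samples per period at 44.1 kHz.
amp = jnp.ones((10, 1))
freq = jnp.full((10, 1), 441.0)
wave = oscillator_bank_sketch(amp, freq, hop_size=100, sample_rate=44100.0)
```

Because every operation here is differentiable, a network that predicts `amp` and `freq` can be trained through the synthesis step with any of the losses above.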

Development

  • Clone this repository
  • Requirements:
      • Poetry
  • Create a virtual environment and install the dependencies
      poetry install
  • Activate the virtual environment
      poetry shell

Testing

pytest

Documentation

The documentation is automatically generated from the content of the docs directory and from the docstrings of the public signatures in the source code. It is updated and published as a GitHub project page automatically as part of each release.

Releasing

Trigger the Draft release workflow (press Run workflow). This updates the changelog and version and creates a GitHub release in Draft state.

Find the draft release among the GitHub releases and publish it. Publishing a release triggers the release workflow, which creates the PyPI release and deploys the updated documentation.

Pre-commit

Pre-commit hooks run all the auto-formatters (e.g. black, isort), linters (e.g. mypy, flake8), and other quality checks to make sure the changeset is in good shape before a commit/push happens.

You can install the hooks with (runs for each commit):

pre-commit install

Or if you want them to run only for each push:

pre-commit install -t pre-push

Or if you want to run all checks manually for all files:

pre-commit run --all-files

This project was generated using the wolt-python-package-cookiecutter template.

License: MIT License

