kavorite / s4d

This repository contains a minimal Haiku implementation of Structured State Space sequence models (S4), specifically S4D, a variant that uses a diagonal state matrix and "recovers S4's performance on most tasks." These modules are useful for sequence transduction and are competitive on language-modeling tasks when they replace self-attention in an architecture such as an encoder-only transformer stack. Their parametrization also makes them more compact than attention-based architectures: the space and compute they require scale linearly, rather than quadratically, with sequence length. This opens up application domains where transformer architectures were previously intractable, such as end-to-end conditional music generation on raw audio samples.

These models take a principled approach to long-range dependencies and are theoretically capable of representing such dependencies over arbitrary sequence lengths. Empirically, they perform well on sequences of up to several thousand steps across a variety of discrete timescales. They also perform autoregressive inference much faster: rather than requiring a forward pass over the entire sequence at every step, they carry a fixed-size recurrent state (a scratchpad) from one step to the next, which makes them much cheaper at inference time for generative tasks. See s4dbert.py for a worked example of how to instantiate a simple S4 transformer using the API.
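To make the two modes of computation concrete, here is a minimal sketch of a single-channel diagonal state space layer in plain NumPy. It is not this repository's Haiku API: every function name below is hypothetical, and the initialization (the "S4D-Lin" scheme, with state matrix entries -1/2 + iπn) and the zero-order-hold discretization are assumptions chosen for illustration. The sketch shows that the convolutional view, which suits training on whole sequences, and the recurrent view, which suits step-by-step generation, produce the same output.

```python
import numpy as np

def s4d_lin_init(n_state, seed=0):
    """Hypothetical helper: diagonal state matrix A plus B and C vectors.

    Assumes the S4D-Lin initialization, A_n = -1/2 + i*pi*n.
    """
    n = np.arange(n_state)
    A = -0.5 + 1j * np.pi * n                # diagonal of the state matrix
    B = np.ones(n_state, dtype=complex)      # input projection
    rng = np.random.default_rng(seed)
    C = rng.standard_normal(n_state) + 1j * rng.standard_normal(n_state)
    return A, B, C

def discretize_zoh(A, B, dt):
    """Zero-order-hold discretization of a diagonal continuous-time SSM."""
    A_bar = np.exp(dt * A)
    B_bar = (A_bar - 1.0) / A * B
    return A_bar, B_bar

def ssm_kernel(A_bar, B_bar, C, length):
    """Convolution kernel K[l] = Re(sum_n C_n * A_bar_n**l * B_bar_n)."""
    powers = A_bar[None, :] ** np.arange(length)[:, None]   # shape (L, N)
    return (powers * (B_bar * C)[None, :]).sum(axis=1).real

def apply_convolution(u, K):
    """Training-style pass: causal convolution of the whole input via FFT."""
    L = len(u)
    n_fft = 2 * L                            # zero-pad to avoid wrap-around
    y = np.fft.irfft(np.fft.rfft(u, n_fft) * np.fft.rfft(K, n_fft), n_fft)
    return y[:L]

def apply_recurrence(u, A_bar, B_bar, C):
    """Inference-style pass: one fixed-size state update per input step."""
    x = np.zeros_like(A_bar)                 # the recurrent state
    ys = []
    for u_k in u:
        x = A_bar * x + B_bar * u_k          # elementwise, since A is diagonal
        ys.append((C * x).sum().real)
    return np.array(ys)

# Usage: both views of the same layer on a toy input.
A, B, C = s4d_lin_init(n_state=8)
A_bar, B_bar = discretize_zoh(A, B, dt=0.01)
u = np.sin(0.1 * np.arange(64))
y_conv = apply_convolution(u, ssm_kernel(A_bar, B_bar, C, len(u)))
y_rec = apply_recurrence(u, A_bar, B_bar, C)
print(np.allclose(y_conv, y_rec))
```

Because the state matrix is diagonal, each recurrent step costs work proportional to the state size, independent of how many steps came before. A real layer would learn A, B, C and the step size, and would run many such channels in parallel.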

i came this 🤌 close to renaming the repository decepticon while proofreading the README because of how hard it dunks on baseline transformers. i have to be reconciled to that information and now so do you. thank you warren for the suggestion.

About

License:MIT License


Languages

Language:Python 100.0%