This repository contains the code for our ASR project on self-supervised representation learning for raw audio.
We hypothesize that problem-agnostic features learnt from raw audio can benefit downstream tasks if they capture two important aspects of the underlying audio signal: context and order. We propose two relatively simple tasks that enforce these constraints on the learnt features, borrowing ideas from prior work in other domains (images, videos, text).