How to convert fbank features back to audio?

Question

How to convert fbank features back to audio?

linmou opened this issue 2 years ago

The fbank features reconstructed by SSAST are not straightforward to interpret directly. How can I transform them back into raw audio for further analysis?

Yuan Gong · Answer 1 · Thu Aug 18 2022 01:02:33 GMT+0800 (China Standard Time)

Hi there,

The goal of the reconstruction loss here is just to force the model to learn a good audio representation. We didn't mean to make the model a strong reconstructor. But if you want to convert a spectrogram back to a waveform, you will need a vocoder (not included in this repo).

-Yuan

linmou · Answer 2 · Fri Aug 19 2022 15:57:17 GMT+0800 (China Standard Time)

Thanks for your warm reply.
Is there a vocoder you would recommend? I want to invert fbank features back to audio.

Yuan Gong · Answer 3 · Sat Aug 20 2022 05:15:12 GMT+0800 (China Standard Time)

Hi there,

I am not familiar with vocoders. You can check the GitHub topic list: https://github.com/topics/vocoder. Note that most of these are for TTS (speech) rather than general audio.

-Yuan
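
For a rough listen without a neural vocoder, one classical alternative (not the vocoder route suggested above, and not part of this repo) is to approximately invert the mel filterbank with a pseudo-inverse and then estimate the missing phase with Griffin-Lim. The NumPy-only sketch below illustrates that idea. Everything in it is an assumption to check against your own pipeline: the function names (`mel_filterbank`, `stft`, `istft`, `griffin_lim`, `fbank_to_audio`) are hypothetical, the defaults assume Kaldi-style fbank settings (16 kHz audio, 25 ms Hann window, 10 ms shift, 512-point FFT, natural-log mel power), and the optional de-normalization assumes the features were scaled as (x - mean) / (2 * std). It also ignores pre-emphasis, DC removal, and the exact Kaldi filter shapes, so expect audible artifacts.

```python
import numpy as np


def mel_filterbank(n_mels, n_fft, sr):
    """Approximate triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i:i + 3]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def stft(x, n_fft, hop, win):
    """Windowed frames (no padding) -> complex spectrogram (n_fft // 2 + 1, n_frames)."""
    n_frames = 1 + (len(x) - len(win)) // hop
    frames = np.stack([x[t * hop:t * hop + len(win)] * win for t in range(n_frames)])
    return np.fft.rfft(frames, n=n_fft, axis=1).T


def istft(spec, hop, win):
    """Weighted overlap-add inverse of stft()."""
    n_frames = spec.shape[1]
    frames = np.fft.irfft(spec.T, axis=1)[:, :len(win)] * win
    out = np.zeros(hop * (n_frames - 1) + len(win))
    norm = np.zeros_like(out)
    for t in range(n_frames):
        out[t * hop:t * hop + len(win)] += frames[t]
        norm[t * hop:t * hop + len(win)] += win ** 2
    # Floor the normalizer to limit amplification where the window sum is
    # nearly zero (the first and last few samples of the signal).
    return out / np.maximum(norm, 0.1 * norm.max())


def griffin_lim(mag, n_fft, hop, win, n_iter=32, seed=0):
    """Estimate a phase for a magnitude spectrogram by iterated projection."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        spec = stft(istft(mag * phase, hop, win), n_fft, hop, win)
        phase = spec / np.maximum(np.abs(spec), 1e-8)
    return istft(mag * phase, hop, win)


def fbank_to_audio(fbank, sr=16000, n_fft=512, win_len=400, hop=160,
                   norm_mean=None, norm_std=None, n_iter=32):
    """fbank: array (n_frames, n_mels) of natural-log mel power (assumed)."""
    fbank = np.asarray(fbank, dtype=np.float64)
    if norm_mean is not None and norm_std is not None:
        # Assumed normalization: x_norm = (x - mean) / (2 * std). Verify this.
        fbank = fbank * (2.0 * norm_std) + norm_mean
    mel_power = np.exp(fbank).T                      # (n_mels, n_frames)
    fb = mel_filterbank(fbank.shape[1], n_fft, sr)
    # Minimum-norm estimate of the linear power spectrum; clip negatives.
    lin_power = np.maximum(np.linalg.pinv(fb) @ mel_power, 0.0)
    win = np.hanning(win_len)
    return griffin_lim(np.sqrt(lin_power), n_fft, hop, win, n_iter)
```

Calling `fbank_to_audio(features, norm_mean=..., norm_std=...)` with the dataset statistics used during feature extraction returns a 1-D float waveform that can be written to disk with any audio library. Because the mel projection discards fine spectral detail and the phase is only estimated, the result is usually good enough to judge what a reconstruction roughly sounds like, but not for quality-sensitive analysis. A trained vocoder for general audio would be the better choice there.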