Feeding the audio waveform into every layer
SatyamKumarr opened this issue
As per issue #83, it was discussed that the input is provided as a raw waveform in the 1st layer, using a single-channel floating-point tensor.
- Why can't we extend this to multiple layers? Might that improve accuracy?
- What benefit does feeding the raw audio waveform into the 1st layer provide?
- Can this idea be extended to other waveforms (music, noisy speech data) instead of focusing on text-to-speech synthesis?
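For reference, here is a minimal sketch of what I understand the first-layer input to look like: the raw waveform kept as a single-channel floating-point tensor of shape `(1, num_samples)`. This is my own illustration, not code from this repo; the helper name `waveform_to_input` and the normalization step are assumptions on my part.

```python
import numpy as np

def waveform_to_input(samples):
    """Hypothetical helper: turn a 1-D raw waveform into a (1, N)
    float32 tensor in [-1, 1], i.e. the single-channel floating-point
    input described in issue #83."""
    x = np.asarray(samples, dtype=np.float32)
    peak = np.max(np.abs(x))
    if peak > 0:                 # normalize only if the signal is non-silent
        x = x / peak
    return x[np.newaxis, :]      # add the single channel dimension

# Example: a 1-second 440 Hz sine at 16 kHz as a stand-in for real audio.
sr = 16000
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440 * t)
inp = waveform_to_input(wave)
print(inp.shape, inp.dtype)  # (1, 16000) float32
```

Feeding this same tensor into later layers (e.g. concatenated with each layer's hidden activations) is presumably what the first question above is asking about.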