deep-learning-with-pytorch / dlwpt-code

Code for the book Deep Learning with PyTorch by Eli Stevens, Luca Antiga, and Thomas Viehmann.

Home Page: https://www.manning.com/books/deep-learning-with-pytorch

Do sound and audio files have to be the same length?

waszee opened this issue · comments

I looked at the audio chirp exercise posted as an extra example in chapter 4. It, together with the discussion of stacking sounds into an array, seems to imply that the sounds should all have the same time length or be handled using embedding. I am interested in visual patterns of Morse code sounds and wrote some code exercises that build the patterns as NumPy arrays and tensors, but I don't think they will work as deep learning input, so I need help. See the waszee repo SignalStudies_RF if interested.

The problem I am wrestling with is how to standardize the audio patterns to feed the learning exercise. For example, the letter "e" is just a single dit long, but the letter "b" is dah dit dit dit long, and common patterns include strings like "CQ CQ de ....". The NumPy files I built hold a variable-length string of sound pulses whose length depends on the pulse pattern and the sending speed. The audio chirp example seems to say we can make the sounds the same length. Does anybody have suggestions for handling the variable lengths so that all of the sound patterns needed for decoding fit into arrays of a common shape? I suspect the natural language guys have dealt with this issue, since spoken words have different lengths, but I did not see any references here or in the text on where to look. Please offer suggestions on how to handle the embedding. I am also still learning how to tag targets to the patterns. I am very new to PyTorch.

An example pattern for CQ ABC is shown as an image capture. As with the audio chirp, each letter can be saved as a separate pattern, but the same pattern will change in length with the sending speed and in oscillation with the beat tones.

CQ_ABC_pattern

I should say that the real pattern is more complex, as shown in this clip of real code from a spectrogram created with Audacity:
image

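The typical thing to do is padding: extend the shorter sequences to the length of the longest one in the batch. For this, you'd want to group items of similar length so that each batch needs little padding.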
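
To make the padding suggestion concrete, here is a minimal sketch, not a definitive recipe. It assumes each Morse pattern is already a 1-D float tensor (for example an on/off envelope); the helper name `make_padded_batches` and the toy `patterns` list are made up for illustration. The only library call used for the padding itself is `torch.nn.utils.rnn.pad_sequence`.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def make_padded_batches(sequences, batch_size):
    """Group 1-D tensors of similar length, then zero-pad each group.

    Hypothetical helper for illustration; returns a list of
    (padded, lengths, indices) tuples, one per batch.
    """
    # Sort the indices by sequence length so neighbors have similar lengths.
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
    batches = []
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        group = [sequences[i] for i in idx]
        # Keep the true lengths so the padded tail can be ignored later.
        lengths = torch.tensor([len(s) for s in group])
        # Zero-pad every sequence to the longest one in this group.
        padded = pad_sequence(group, batch_first=True, padding_value=0.0)
        batches.append((padded, lengths, idx))
    return batches


# Toy stand-ins for real patterns (assumed data, lengths chosen arbitrarily).
patterns = [
    torch.ones(3),   # a short pattern, e.g. a single dit
    torch.ones(11),  # a long pattern
    torch.ones(4),
    torch.ones(9),
]

batches = make_padded_batches(patterns, batch_size=2)
for padded, lengths, idx in batches:
    print(padded.shape, lengths.tolist(), idx)
```

Each batch is padded only up to its own longest member, which is why grouping by length helps. The returned `lengths` let you mask out the padded tail, or feed `torch.nn.utils.rnn.pack_padded_sequence` if you end up using a recurrent model, and `idx` lets you look up the matching targets.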