Do sound and audio files have to be the same length?
waszee opened this issue
I looked at the audio chirp exercise posted as an extra example in chapter 4. It, and the discussion of making an array stack of sounds, seems to imply that the sounds should all have the same length, or else be handled with an embedding. I am interested in visual patterns of Morse code sounds and wrote some code exercises to make the patterns as NumPy and tensor arrays, but I don't think they are going to work as deep-learning input, so I need help (see my repo SignalStudies_RF if interested).

The problem I am wrestling with is how to standardize the audio patterns to feed into the learning exercise. For example, the letter "e" is just a single dit long, but the letter "b" is dah dit dit dit, and common patterns include strings like "CQ CQ de ....". The NumPy files I built hold variable-length strings of sound pulses whose duration depends on the pulse pattern and the sending speed.

The audio chirp example seems to say we can make the sounds the same length. I wonder if anybody has suggestions on how to handle the variable lengths so that all of the sound patterns needed to decode the sounds share a common array shape. I suspect the natural-language people have dealt with this issue, since spoken words have different lengths, but I did not see any references here or in the text pointing to where to look. Please offer suggestions on how to handle the embedding. I am still learning how to tag targets to the patterns, too. I am very new to PyTorch.
The typical thing to do is padding: pad every sequence in a batch with zeros (or another fill value) up to the length of the longest one. To avoid wasting computation on padding, you'd also want to group items of similar length into the same batch (often called bucketing).
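A minimal sketch of the padding idea, assuming each Morse signal is a 1-D waveform tensor (the lengths and signals here are made up for illustration): `torch.nn.utils.rnn.pad_sequence` stacks variable-length tensors into one fixed-shape batch, and a boolean mask records which positions are real samples so the model or loss can ignore the padded tail.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three hypothetical Morse-code waveforms of different lengths,
# e.g. "e" (one dit) vs. "b" (dah dit dit dit).
signals = [torch.randn(100), torch.randn(340), torch.randn(220)]

# Sorting by length before batching (bucketing) keeps similar-length
# items together, so less padding is wasted per batch.
signals = sorted(signals, key=lambda s: s.numel())

# Stack into one (batch, max_len) tensor, zero-filling the shorter ones.
batch = pad_sequence(signals, batch_first=True, padding_value=0.0)

# Boolean mask: True where a position holds a real sample, False where
# it is padding, so downstream code can mask out the padded tail.
lengths = torch.tensor([s.numel() for s in signals])
mask = torch.arange(batch.size(1))[None, :] < lengths[:, None]
```

After this, `batch` has shape `(3, 340)` and every row shorter than 340 samples is zero-padded; the mask lets you exclude those zeros from the loss.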