Do sound and audio files have to be the same length?
waszee opened this issue
I looked at the audio chirp exercise posted as an extra example in chapter 4. It, and the discussion of making an array stack of sounds, seems to imply that the sounds should all have the same length, or else be handled with an embedding. I am interested in visual patterns of Morse code sounds and wrote some code exercises to make the patterns as NumPy and tensor arrays, but I don't think they are going to work as deep-learning input, so I need help (see my repo SignalStudies_RF if interested).

The problem I am wrestling with is how to standardize the audio patterns to feed into the learning exercise. For example, the letter "e" is just a single dit long, but the letter "b" is dah dit dit dit, and common patterns include strings like "CQ CQ de ....". The NumPy files I built hold variable-length strings of sound pulses whose duration depends on the pulse pattern and the sending speed.

The audio chirp example seems to say we can make the sounds the same length. I wonder if anybody has suggestions on how to handle the variable lengths so that all of the sound patterns needed to decode the sounds share a common array shape. I suspect the natural-language people have dealt with this issue, since spoken words have different lengths, but I did not see any references here or in the text pointing to where to look. Please offer suggestions on how to handle the embedding. I am still learning how to tag targets to the patterns, too. I am very new to PyTorch.
The typical thing to do is padding: pad every sequence in a batch with zeros (or another fill value) up to the length of the longest one. To avoid wasting computation on padding, you'd also want to group items of similar length into the same batch (often called bucketing).
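A minimal sketch of the padding idea, assuming each Morse signal is a 1-D waveform tensor (the lengths and signals here are made up for illustration): `torch.nn.utils.rnn.pad_sequence` stacks variable-length tensors into one fixed-shape batch, and a boolean mask records which positions are real samples so the model or loss can ignore the padded tail.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three hypothetical Morse-code waveforms of different lengths,
# e.g. "e" (one dit) vs. "b" (dah dit dit dit).
signals = [torch.randn(100), torch.randn(340), torch.randn(220)]

# Sorting by length before batching (bucketing) keeps similar-length
# items together, so less padding is wasted per batch.
signals = sorted(signals, key=lambda s: s.numel())

# Stack into one (batch, max_len) tensor, zero-filling the shorter ones.
batch = pad_sequence(signals, batch_first=True, padding_value=0.0)

# Boolean mask: True where a position holds a real sample, False where
# it is padding, so downstream code can mask out the padded tail.
lengths = torch.tensor([s.numel() for s in signals])
mask = torch.arange(batch.size(1))[None, :] < lengths[:, None]
```

After this, `batch` has shape `(3, 340)` and every row shorter than 340 samples is zero-padded; the mask lets you exclude those zeros from the loss.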