audio_levels.compute returning -inf values
Afr0king opened this issue · comments
Hello, I love the idea of being able to program data analysis etc. in Python while using GPU_FFT in C. Great work implementing the Cython bridge! It makes showcases with the Raspberry Pi much easier!
My problem:
I am currently trying to FFT some single-channel .wav data (48000 Hz sample rate, 0.2 s length) with rpi-audio-levels, analyzing only frequencies between 500 Hz and 3 kHz. When I do a spectrum analysis of the file with Audacity I get the following spectrum:
However, when I analyse the same file with audio_levels.compute using 250 bands of 10 Hz each, it returns:
This for a sample size of 1024:
This for a sample size of 2048:
This for a sample size of 4096:
These spectra clearly do not look like one another.
There should also not be any empty fields. However, the levels vector contains random values, or sometimes -inf, for its spectrum. Am I misunderstanding something?
The code snippets:
Main function:
from rpi_audio_levels import AudioLevels  # GPU_FFT - Cython bridge by colin guyon
import CustomFunctions as sf
import numpy as np

# 2**DATA_SIZE equals the sample size
DATA_SIZE = 10
# Number of frequency bands
BANDS_COUNT = 250
# Init
audio_levels = AudioLevels(DATA_SIZE, BANDS_COUNT)

# Frequencies
FREQ_START = 500   # minimum of the frequency range in Hz
FREQ_STOP = 3000   # maximum of the frequency range in Hz
# Path to the test file (definition not shown in the original snippet)
TESTFILE = "test.wav"

# Generate band indexes:
indexes = sf.calculate_bands(FREQ_START, FREQ_STOP, BANDS_COUNT)
# Load wav file into a numpy array
data = sf.get_c0_data(TESTFILE)
data = np.float32(data)
# FFT the data
levels, means, stds = audio_levels.compute(data, indexes)
sf.calculate_bands:
def calculate_bands(minFrequency, maxFrequency, bandCount):
    # Calculate band width
    band_width = (maxFrequency - minFrequency) / bandCount
    # Generate band indexes:
    band_current = minFrequency
    band_next = band_current + band_width
    indexes = []
    for ctr in range(bandCount):
        # print(ctr)
        buffer = (band_current, band_next)
        indexes.append(buffer)
        band_current = band_next
        band_next = band_next + band_width
    return indexes
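Note that calculate_bands returns (start, stop) tuples in Hz, while compute appears to index directly into the FFT result (see the realization later in the thread), so the tuples would first have to be converted to bin indexes. A minimal sketch of that conversion, with a hypothetical helper name:

```python
def hz_bands_to_bin_indexes(hz_bands, sample_rate, fft_size):
    """Convert (f_lo, f_hi) tuples in Hz into FFT bin index tuples.

    Bin k of an N-point FFT sits at k * sample_rate / N Hz, so a
    frequency f maps to bin int(f * N / sample_rate).
    """
    return [(int(lo * fft_size / sample_rate),
             int(hi * fft_size / sample_rate))
            for lo, hi in hz_bands]

# 48 kHz audio, DATA_SIZE = 10 -> 1024-point FFT:
print(hz_bands_to_bin_indexes([(500, 510)], 48000, 1024))   # [(10, 10)]
# DATA_SIZE = 12 -> 4096 points, full 500-3000 Hz range:
print(hz_bands_to_bin_indexes([(500, 3000)], 48000, 4096))  # [(42, 256)]
```

At 1024 points one bin is about 46.9 Hz wide, so a 10 Hz band starts and stops at the same index, i.e. it selects no bins at all.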
sf.get_c0_data:
import numpy as np
import wave

def get_c0_data(filename):
    '''
    Read data of the first channel from the provided wave file
    '''
    # Read data
    wav = wave.open(filename)
    nch = wav.getnchannels()
    depth = wav.getsampwidth()
    wav.setpos(0)
    sdata = wav.readframes(wav.getnframes())
    wav.close()
    # Extract channel data (24-bit data not supported).
    # PCM samples wider than 8 bits are signed, so use int16/int32
    # rather than unsigned types, or the waveform gets distorted.
    typ = {1: np.uint8, 2: np.int16, 4: np.int32}.get(depth)
    if not typ:
        raise ValueError("sample width {} not supported".format(depth))
    # np.fromstring is deprecated; frombuffer reads the same bytes
    data = np.frombuffer(sdata, dtype=typ)
    ch_data = data[0::nch]
    return ch_data
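To rule out reading errors independently of the library, a quick round-trip check can help: write a known 16-bit mono sine and read it back. Since 16-bit PCM is signed, the raw bytes must be interpreted as np.int16; an unsigned dtype would wrap the negative half-waves to large positive values and wreck any spectrum. A sketch (make_test_wav is a hypothetical helper, not part of the project):

```python
import tempfile
import wave

import numpy as np

def make_test_wav(path, freq=1000.0, sample_rate=48000, seconds=0.2):
    """Write a 16-bit mono PCM sine wave, similar to the test file above."""
    t = np.arange(int(sample_rate * seconds)) / sample_rate
    samples = (0.5 * 32767 * np.sin(2 * np.pi * freq * t)).astype(np.int16)
    with wave.open(path, 'wb') as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes -> 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(samples.tobytes())
    return samples

with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as f:
    path = f.name
written = make_test_wav(path)
with wave.open(path) as wav:
    raw = wav.readframes(wav.getnframes())
# interpret the raw bytes as signed 16-bit samples
read_back = np.frombuffer(raw, dtype=np.int16)
assert np.array_equal(written, read_back)
```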
Further information:
So I varied every parameter that audio_levels.compute allows: different frequencies, different audio files, and I always get levels and means arrays with rubbish as content. Has anyone got any idea what may cause this? I have spent over 15 hours iterating and retrying with no luck at all. Did I critically misunderstand what kind of data has to be fed into the FFT?
Hi, thank you and sorry for the delay.
My usage is a bit different as I use it to make light beats while a music file is being played, and I don't have to display a full power spectrum.
The main difference that I can quickly see is that I only need a few bands (6 at maximum), far fewer than 250.
Maybe you could try with fewer bands, if not already done, just to check whether what you get is more consistent with the spectrum computed by Audacity?
I don't recall having -inf values.
I could try with one of your .wav files to see if I get the same result... but I don't have a Raspberry Pi where I am currently.
Meanwhile, here is my current code using rpi_audio_levels, maybe it could be useful for you to compare things with your code:
(don't pay attention to the imports of wake_pi_up, this is my alarm clock project)
# -*- coding: utf-8 -*-
"""
Parts of the code are inspired from the lightshowpi project:
https://bitbucket.org/togiles/lightshowpi/
Third party dependencies:
alsaaudio: for audio input/output
http://pyalsaaudio.sourceforge.net/
decoder.py: decoding mp3, ogg, wma, ...
https://pypi.python.org/pypi/decoder.py/1.5XB
numpy: for FFT processing
http://www.numpy.org/
GPU FFT: for GPU FFT processing
http://www.aholme.co.uk/GPU_FFT/Main.htm
"""
from __future__ import absolute_import
import wave
import alsaaudio as aa
import numpy as np
rfft = np.fft.rfft
log10 = np.log10
frombuffer = np.frombuffer
hanning = np.hanning
np_sum = np.sum
np_multiply = np.multiply
np_abs = np.abs
np_delete = np.delete
int16 = np.int16
float32 = np.float32
# Use a multiple of 8
# 4096 uses less cpu than 2048, but light beats are less accurate
CHUNK_SIZE = 2048
CHANNEL_LENGTH = 6
USE_GPU = True # optimize computing using the GPU FFT library and cython/c
def calculate_channel_frequency(min_frequency, max_frequency):
    """Calculate frequency values for each channel"""
    print("Calculating frequencies for %d channels." % CHANNEL_LENGTH)
    octaves = (np.log(max_frequency / min_frequency)) / np.log(2)
    print("octaves in selected frequency range ... %s" % octaves)
    octaves_per_channel = octaves / CHANNEL_LENGTH
    frequency_limits = []
    frequency_store = []
    frequency_limits.append(min_frequency)
    for i in xrange(1, CHANNEL_LENGTH + 1):
        frequency_limits.append(frequency_limits[-1]
                                * 10 ** (3 / (10 * (1 / octaves_per_channel))))
    for i in xrange(CHANNEL_LENGTH):
        frequency_store.append((frequency_limits[i], frequency_limits[i + 1]))
        print("channel %d is %6.2f to %6.2f " % (i, frequency_limits[i],
                                                 frequency_limits[i + 1]))
    return frequency_store
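For reference, the loop above spaces the channel limits logarithmically: since 10**(3/10) is almost exactly 2, each limit is the previous one multiplied by roughly 2**octaves_per_channel, i.e. a constant fraction of an octave per channel. A quick sketch checking this with the defaults used in the script:

```python
import numpy as np

min_frequency, max_frequency, channels = 20.0, 19500.0, 6
octaves = np.log(max_frequency / min_frequency) / np.log(2)
octaves_per_channel = octaves / channels
factor = 10 ** (3 / (10 * (1 / octaves_per_channel)))

# 10**(3/10) ~ 1.995, so the multiplicative step per channel is
# close to one octave fraction, i.e. 2**octaves_per_channel
assert abs(factor - 2 ** octaves_per_channel) / factor < 0.01

limits = [min_frequency]
for _ in range(channels):
    limits.append(limits[-1] * factor)
print(["%.0f Hz" % f for f in limits])
```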
def piff(val, sample_rate):
    """Return the power array index corresponding to a particular frequency."""
    return int(CHUNK_SIZE * val / sample_rate)
range_channels = range(CHANNEL_LENGTH)
min_frequency = 20
max_frequency = 19500
frequency_limits = calculate_channel_frequency(min_frequency, max_frequency)
freqs_left = [CHUNK_SIZE * frequency_limits[i][0] for i in range_channels]
freqs_right = [CHUNK_SIZE * frequency_limits[i][1] for i in range_channels]
# will store the frequency bands indexes
bands_indexes_cache = {}
hanning_cache = np.array(hanning(CHUNK_SIZE), dtype=float32)
if USE_GPU:
    # Use the GPU FFT lib, with cython/c
    gpu_audio_levels = None

    def prepare():
        global gpu_audio_levels
        if gpu_audio_levels is not None:
            import wake_pi_up
            wake_pi_up.log.error("gpu_audio_levels already initialized!")
        from rpi_audio_levels import AudioLevels
        size = 11
        assert 2 ** size == CHUNK_SIZE
        gpu_audio_levels = AudioLevels(size, CHANNEL_LENGTH)

    def release():
        global gpu_audio_levels
        if gpu_audio_levels is None:
            import wake_pi_up
            wake_pi_up.log.error("gpu_audio_levels not initialized!")
        # deallocation of the object must release underlying resources
        gpu_audio_levels = None

    prepare()
else:
    # else we use only Numpy
    def prepare():
        pass

    def release():
        pass
data_float = np.empty(CHUNK_SIZE, dtype=float32)
# @profile
def calculate_levels(data, buffer_data, sample_rate, bands=None):
    '''Calculate frequency response for each channel

    Initial FFT code inspired from the code posted here:
    http://www.raspberrypi.org/phpBB3/viewtopic.php?t=35838&p=454041
    Optimizations from work by Scott Driscoll:
    http://www.instructables.com/id/Raspberry-Pi-Spectrum-Analyzer-with-RGB-LED-Strip-/

    :param bands: list allowing to choose which bands to process
    :type bands: `list` of `bool`
    '''
    if len(data) != 2 * CHUNK_SIZE:
        print("len(data) != 2 * CHUNK_SIZE : %d != 2 * %d" % (len(data),
                                                              CHUNK_SIZE))
        # can be the case at the last audio chunk, let's ignore it
        levels = [0 for i in range_channels]
        return levels, levels, levels
    # create a numpy array from the data buffer
    # buffer_data = frombuffer(data, dtype=int16)
    # data has one channel and 2 bytes per channel
    # np.empty(len(data) / 2, dtype=float32)
    # data_float[:] = buffer_data[:]
    # data = buffer_data
    # if you take an FFT of a chunk of audio, the edges will look like
    # super high frequency cutoffs. Applying a window tapers the edges
    # of each end of the chunk down to zero.
    np_multiply(buffer_data, hanning_cache, out=data_float)
    try:
        bands_indexes = bands_indexes_cache[sample_rate]
    except KeyError:
        bands_indexes = bands_indexes_cache[sample_rate] = \
            [(int(freqs_left[i] / sample_rate),
              int(freqs_right[i] / sample_rate)) for i in range_channels]
    # Apply FFT - real data
    if USE_GPU:
        # all is done in C using the GPU_FFT lib, it's 7 times faster
        levels, means, stds = gpu_audio_levels.compute(data_float, bands_indexes)
        # TODO: use optional bands to avoid computing some levels for nothing
        return levels, means, stds
    else:
        fourier = rfft(data_float)
        # Remove last element in array to make it the same size as CHUNK_SIZE
        # np_delete(fourier, len(fourier) - 1)
        fourier = fourier[:-1]
        # Calculate the power spectrum
        power = np_abs(fourier) ** 2
        # take the log10 of the resulting sum to approximate how human
        # ears perceive sound levels
        if bands is None:
            # calculate for all frequency bands
            levels = [log10(np_sum(power[bands_indexes[i][0]:bands_indexes[i][1]]))
                      for i in range_channels]
        else:
            # some frequency band indexes are specified, we don't need all bands
            levels = [log10(np_sum(power[bands_indexes[i][0]:bands_indexes[i][1]]))
                      if needed else None
                      for i, needed in enumerate(bands)]
    return levels
if __name__ == "__main__":
    # @profile
    def test():
        import sys
        path = sys.argv[1]
        if path.endswith('.wav'):
            musicfile = wave.open(path, 'r')
        else:
            import decoder
            musicfile = decoder.open(path, force_mono=True)
        sample_rate = musicfile.getframerate()
        print("params: %s" % (musicfile.getparams(),))
        total_seconds = musicfile.getnframes() / musicfile.getframerate()
        total_minutes = total_seconds // 60
        print("duration: %s:%s" % (total_minutes, total_seconds % 60))
        output = aa.PCM(aa.PCM_PLAYBACK, aa.PCM_NORMAL)
        output.setchannels(1)  # mono
        output.setrate(sample_rate)
        output.setformat(aa.PCM_FORMAT_S16_LE)
        output.setperiodsize(CHUNK_SIZE)
        # Output a bit about what we're about to play
        print("Playing: " + path + " ("
              + str(musicfile.getnframes() / sample_rate) + " sec)")
        # read the first chunk of audio data
        data = musicfile.readframes(CHUNK_SIZE)
        while data != '':
            # play the chunk of music
            output.write(data)
            # read the next chunk of audio data
            data = musicfile.readframes(CHUNK_SIZE)
            # get a numpy array from the raw audio buffer data
            buffer_data = frombuffer(data, dtype=int16)
            # Compute FFT in this chunk
            levels = calculate_levels(data, buffer_data, sample_rate,
                                      bands=None)

    test()
(also please have a look at #4 if not already done as it may be useful too)
Thank you for the response. Yes, I drew a lot of inspiration from the other issue and think what I want to do should be possible (since he used 4096 bands and I use only 250).
I changed my plot command to use bars and retried with the same file and only 10 bands. This is the result:
So I think this is not the issue.
The sample file for the current test is a 0.4 s long .wav file of a whistle:
test.wav - Dropbox
I'm setting up a fresh installation on my Raspberry Pi Zero W to rule out any external issues.
Edit: I set up a fresh installation of Raspbian and still get the exact same issues. At least the code is consistent...
So I worked on it a bit and found out why the -inf values are appearing: I'm kinda dumb... The maximum frequency index for audio_levels is 2^DATA_SIZE. So when I try to run an FFT with a 2^10 sample size on bands from 500 to 3000 Hz, it won't work. Being able to read helps...
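This realization can also be checked numerically: with indexes interpreted as FFT bins, a band can only be resolved if one bin is at most as wide as the band. One bin spans sample_rate / 2**DATA_SIZE Hz, so at 48 kHz, 10 Hz-wide bands need DATA_SIZE >= 13. A sketch (assuming the 48 kHz sample rate from the original post):

```python
sample_rate = 48000
band_width = 10.0  # Hz per band, as in the original snippet

for data_size in (10, 12, 13):
    fft_size = 2 ** data_size
    resolution = sample_rate / fft_size  # Hz covered by a single FFT bin
    status = "ok" if resolution <= band_width else "bands narrower than a bin"
    print("DATA_SIZE=%2d -> %6.2f Hz/bin (%s)" % (data_size, resolution, status))
```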
However, this does not fix the rubbish-value issue.
I verified my data vector with np.fft.rfftfreq to see if maybe my representation in Python is wrong.
The same sample analysed with numpy.fft.rfftfreq (sorry for not formatting the axis):
The same sample run through audio_levels (2^12 samples):
The same sample run through audio_levels (2^13 samples):
All plots were fed with data that was multiplied by a Hanning window. So I'm still trying to find out why the FFT is returning such rubbish.
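One concrete way the -inf values can arise (an observation based on the numpy fallback in the code above, where a level is log10 of the summed band power): if a band's start and stop frequencies fall into the same FFT bin, the slice is empty, its sum is 0, and log10(0) is -inf. A sketch:

```python
import numpy as np

sample_rate = 48000
fft_size = 1024
signal = np.sin(2 * np.pi * 1000.0 * np.arange(fft_size) / sample_rate)
power = np.abs(np.fft.rfft(signal * np.hanning(fft_size))) ** 2

# a 10 Hz band is narrower than one bin (48000/1024 ~ 46.9 Hz),
# so start and stop map to the same index and the slice is empty
start = int(500 * fft_size / sample_rate)  # -> 10
stop = int(510 * fft_size / sample_rate)   # -> 10
with np.errstate(divide='ignore'):
    level = np.log10(np.sum(power[start:stop]))
print(level)  # -> -inf
```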