colin-guyon / rpi-audio-levels

Python binding to retrieve audio levels by frequency bands, given audio samples (a power spectrum, in fact), on a Raspberry Pi, using GPU FFT


audio_levels.compute returning -inf values

Afr0king opened this issue · comments

Hello, I love the idea of being able to program data analysis etc. in Python while using GPU_FFT in C. Great work implementing the Cython bridge! It makes showcases with the Raspberry Pi much easier!

My problem:

I am currently trying to FFT some single-channel .wav data (48000 Hz sample rate, 0.2 s length) with rpi-audio-levels, analyzing only the frequencies between 500 Hz and 3 kHz. When I do a spectrum analysis of the file with Audacity, I get the following spectrum:
[image: Audacity spectrum]
However, when I analyze the same file with audio_levels.compute using 250 bands of 10 Hz each, it returns the following.
For a sample size of 1024:
[image: audio_levels output, 1024 samples]
For a sample size of 2048:
[image: audio_levels output, 2048 samples]
For a sample size of 4096:
[image: audio_levels output, 4096 samples]

These spectra clearly do not resemble one another.
There should also not be any empty fields; yet the levels vector contains random values, and sometimes the value -inf, across its spectrum. Am I misunderstanding something?

The code snippets:

Main function:

from rpi_audio_levels import AudioLevels  # GPU FFT - Cython bridge by colin-guyon
import CustomFunctions as sf
import numpy as np

# 2**DATA_SIZE equals the sample size
DATA_SIZE = 10
# Number of frequency bands
BANDS_COUNT = 250

# Init
audio_levels = AudioLevels(DATA_SIZE, BANDS_COUNT)

# Frequency range
FREQ_START = 500   # minimum of the frequency range, in Hz
FREQ_STOP = 3000   # maximum of the frequency range, in Hz

# Generate band indexes
indexes = sf.calculate_bands(FREQ_START, FREQ_STOP, BANDS_COUNT)

# Load the wav file into a numpy array
TESTFILE = "test.wav"  # path to the wav file under test
data = sf.get_c0_data(TESTFILE)
data = np.float32(data)

# FFT through the data
levels, means, stds = audio_levels.compute(data, indexes)

sf.calculate_bands:

def calculate_bands(minFrequency, maxFrequency, bandCount):
    # Calculate the band width
    band_width = (maxFrequency - minFrequency) / bandCount

    # Generate the band boundaries
    # (note: these are frequencies in Hz, not FFT bin indexes)
    band_current = minFrequency
    band_next = band_current + band_width

    indexes = []
    for ctr in range(bandCount):
        indexes.append((band_current, band_next))
        band_current = band_next
        band_next = band_next + band_width
    return indexes

sf.get_c0_data:

import numpy as np
import wave

def get_c0_data(filename):
    '''
    Read the data of the first channel from the provided wave file.
    '''
    # Read data
    wav = wave.open(filename)
    nch = wav.getnchannels()
    depth = wav.getsampwidth()
    wav.setpos(0)
    sdata = wav.readframes(wav.getnframes())

    # Extract channel data (24-bit data not supported).
    # Note: 16- and 32-bit PCM wav samples are signed; using unsigned
    # dtypes here would mangle every negative sample.
    typ = {1: np.uint8, 2: np.int16, 4: np.int32}.get(depth)
    if not typ:
        raise ValueError("sample width {} not supported".format(depth))
    data = np.frombuffer(sdata, dtype=typ)  # np.fromstring is deprecated
    ch_data = data[0::nch]
    return ch_data
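A minimal sketch of how the channel data might be prepared before handing it to compute(), assuming 16-bit PCM and the Hanning windowing mentioned at the end of this thread (prepare_chunk is a hypothetical helper, not part of the library):

import numpy as np

def prepare_chunk(samples, fft_size):
    # Truncate to the FFT size, convert to float32 and taper the chunk
    # edges with a Hanning window to avoid spurious high frequencies.
    chunk = np.asarray(samples[:fft_size], dtype=np.float32)
    return chunk * np.hanning(fft_size).astype(np.float32)

# e.g. data = prepare_chunk(sf.get_c0_data(TESTFILE), 2 ** DATA_SIZE)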

Further information

So I varied every parameter that audio_levels.compute allows for: different frequencies, different audio files, and I always get levels and means arrays with rubbish as content. Has anyone got any idea what may cause this? I have now spent over 15 hours iterating and retrying with no luck at all. Did I critically misunderstand what kind of data has to be fed into the FFT?

Hi, thank you, and sorry for the delay.
My usage is a bit different, as I use it to make light beats while a music file is being played, and I don't have to display a full power spectrum.
The main difference that I can quickly see is that I only need a few bands (6 at maximum), far fewer than 250.
Maybe you could try with fewer bands, if not already done, just to check whether what you get is more consistent with the spectrum computed by Audacity?
I don't recall getting -inf values.
I could try with one of your .wav files to see if I get the same result... but I don't have a Raspberry Pi where I am currently.

Meanwhile, here is my current code using rpi_audio_levels; maybe it is useful for you to compare it with your code:
(don't pay attention to the wake_pi_up imports, that's my alarm clock project)

# -*- coding: utf-8 -*-
"""
Parts of the code are inspired from the lightshowpi project:
    https://bitbucket.org/togiles/lightshowpi/

Third party dependencies:

alsaaudio: for audio input/output
    http://pyalsaaudio.sourceforge.net/

decoder.py: decoding mp3, ogg, wma, ...
    https://pypi.python.org/pypi/decoder.py/1.5XB

numpy: for FFT processing
    http://www.numpy.org/

GPU FFT: for GPU FFT processing
     http://www.aholme.co.uk/GPU_FFT/Main.htm
"""
from __future__ import absolute_import
import wave
import alsaaudio as aa
import numpy as np

rfft = np.fft.rfft
log10 = np.log10
frombuffer = np.frombuffer
hanning = np.hanning
np_sum = np.sum
np_multiply = np.multiply
np_abs = np.abs
np_delete = np.delete
int16 = np.int16
float32 = np.float32

# Use a multiple of 8
# 4096 uses less cpu than 2048, but light beats are less accurate
CHUNK_SIZE = 2048

CHANNEL_LENGTH = 6

USE_GPU = True  # optimize computing using the GPU FFT library and cython/c


def calculate_channel_frequency(min_frequency, max_frequency):
    """Calculate frequency values for each channel"""

    print("Calculating frequencies for %d channels." % CHANNEL_LENGTH)
    octaves = (np.log(max_frequency / min_frequency)) / np.log(2)
    print("octaves in selected frequency range ... %s" % octaves)
    octaves_per_channel = octaves / CHANNEL_LENGTH
    frequency_limits = []
    frequency_store = []

    frequency_limits.append(min_frequency)

    for i in xrange(1, CHANNEL_LENGTH + 1):
        frequency_limits.append(frequency_limits[-1]
                                * 10 ** (3 / (10 * (1 / octaves_per_channel))))
    for i in xrange(CHANNEL_LENGTH):
        frequency_store.append((frequency_limits[i], frequency_limits[i + 1]))
        print("channel %d is %6.2f to %6.2f " % (i, frequency_limits[i],
                                                 frequency_limits[i + 1]))
    return frequency_store


def piff(val, sample_rate):
    """Return the power array index corresponding to a particular frequency."""
    return int(CHUNK_SIZE * val / sample_rate)


range_channels = range(CHANNEL_LENGTH)
min_frequency = 20
max_frequency = 19500
frequency_limits = calculate_channel_frequency(min_frequency, max_frequency)
freqs_left = [CHUNK_SIZE * frequency_limits[i][0] for i in range_channels]
freqs_right = [CHUNK_SIZE * frequency_limits[i][1] for i in range_channels]

# will store the frequency bands indexes
bands_indexes_cache = {}

hanning_cache = np.array(hanning(CHUNK_SIZE), dtype=float32)

if USE_GPU:
    # Use the GPU FFT lib, with cython/c
    gpu_audio_levels = None

    def prepare():
        global gpu_audio_levels
        if gpu_audio_levels is not None:
            import wake_pi_up
            wake_pi_up.log.error("gpu_audio_levels already initialized!")
        from rpi_audio_levels import AudioLevels
        size = 11
        assert 2 ** size == CHUNK_SIZE
        gpu_audio_levels = AudioLevels(size, CHANNEL_LENGTH)

    def release():
        global gpu_audio_levels
        if gpu_audio_levels is None:
            import wake_pi_up
            wake_pi_up.log.error("gpu_audio_levels not initialized!")
        gpu_audio_levels = None  # deallocation of the object must
                                 # release underlying resources
    prepare()
else:
    # else we use only Numpy
    def prepare():
        pass
    def release():
        pass

data_float = np.empty(CHUNK_SIZE, dtype=float32)


# @profile
def calculate_levels(data, buffer_data, sample_rate, bands=None):
    '''Calculate frequency response for each channel

    Initial FFT code inspired from the code posted here:
    http://www.raspberrypi.org/phpBB3/viewtopic.php?t=35838&p=454041

    Optimizations from work by Scott Driscoll:
    http://www.instructables.com/id/Raspberry-Pi-Spectrum-Analyzer-with-RGB-LED-Strip-/

    :param bands: list allowing to choose which bands to process
    :type bands: `list` of `bool`
    '''
    if len(data) != 2 * CHUNK_SIZE:
        print("len(data) != 2 * CHUNK_SIZE : %d != 2 * %d" % (len(data),
                                                              CHUNK_SIZE))
        # can be the case at the last audio chunk, let's ignore it
        levels = [0 for i in range_channels]
        return levels, levels, levels

    # create a numpy array from the data buffer
    # buffer_data = frombuffer(data, dtype=int16)

    # data has one channel and 2 bytes per channel
    # np.empty(len(data) / 2, dtype=float32)
    # data_float[:] = buffer_data[:]
    # data = buffer_data

    # if you take an FFT of a chunk of audio, the edges will look like
    # super high frequency cutoffs. Applying a window tapers the edges
    # of each end of the chunk down to zero.
    np_multiply(buffer_data, hanning_cache, out=data_float)

    try:
        bands_indexes = bands_indexes_cache[sample_rate]
    except KeyError:
        bands_indexes = bands_indexes_cache[sample_rate] = \
            [(int(freqs_left[i] / sample_rate),
              int(freqs_right[i] / sample_rate)) for i in range_channels]

    # Apply FFT - real data
    if USE_GPU:
        # all is done in C using the GPU_FFT lib, it's 7 times faster
        levels, means, stds = gpu_audio_levels.compute(data_float, bands_indexes)
        # TODO: use optional bands to avoid computing some levels for nothing
        return levels, means, stds
    else:
        fourier = rfft(data_float)
        # Remove last element in array to make it the same size as CHUNK_SIZE
        # np_delete(fourier, len(fourier) - 1)
        fourier = fourier[:-1]

        # Calculate the power spectrum
        power = np_abs(fourier) ** 2

        # take the log10 of the resulting sum to approximate how human
        # ears perceive sound levels
        if bands is None:
            # calculate for all frequency bands
            levels = [log10(np_sum(power[bands_indexes[i][0]:bands_indexes[i][1]]))
                      for i in range_channels]
        else:
            # some frequency band indexes are specified, we don't need all bands
            levels = [log10(np_sum(power[bands_indexes[i][0]:bands_indexes[i][1]]))
                      if needed else None
                      for i, needed in enumerate(bands)]
        return levels


if __name__ == "__main__":
    # @profile
    def test():
        import sys
        path = sys.argv[1]

        if path.endswith('.wav'):
            musicfile = wave.open(path, 'r')
        else:
            import decoder
            musicfile = decoder.open(path, force_mono=True)

        sample_rate = musicfile.getframerate()
        print("params: %s" % (musicfile.getparams(),))
        total_seconds = musicfile.getnframes() / musicfile.getframerate()
        total_minutes = total_seconds // 60
        print("duration: %s:%s" % (total_minutes, total_seconds % 60))
        output = aa.PCM(aa.PCM_PLAYBACK, aa.PCM_NORMAL)

        output.setchannels(1)  # mono
        output.setrate(sample_rate)
        output.setformat(aa.PCM_FORMAT_S16_LE)
        output.setperiodsize(CHUNK_SIZE)

        # Output a bit about what we're about to play
        print("Playing: " + path + " ("
              + str(musicfile.getnframes() / sample_rate) + " sec)")

        # read the first chunk of audio data
        data = musicfile.readframes(CHUNK_SIZE)

        while data != '':
            # play the chunk of music
            output.write(data)

            # read the first chunk of audio data
            data = musicfile.readframes(CHUNK_SIZE)

            # get a numpy array from the raw audio buffer data
            buffer_data = frombuffer(data, dtype=int16)

            # Compute FFT in this chunk
            levels = calculate_levels(data, buffer_data, sample_rate,
                                      bands=None)

    test()
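As an aside, the channel-limit formula in calculate_channel_frequency may look cryptic: since 10 ** (3 / 10) ≈ 2, each limit is multiplied by roughly 2 ** octaves_per_channel, i.e. the channels are spaced by equal fractions of an octave. A quick sanity check (the variable names here are just for illustration):

import numpy as np

min_freq, max_freq, channels = 20, 19500.0, 6
octaves = np.log(max_freq / min_freq) / np.log(2)    # ~9.93 octaves
per_channel = octaves / channels                     # ~1.65 octaves each
ratio = 10 ** (3 / (10 * (1 / per_channel)))         # formula from above
print(ratio, 2 ** per_channel)                       # nearly equal, ~3.14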

(also, please have a look at #4 if you haven't already, as it may be useful too)

Thank you for the response. Yes, I took a lot of inspiration from the other issue and think what I want to do should be possible (since he used 4096 bands and I use only 250).

I changed my plot command to use bars and retried with the same file and only 10 bands. This is the result:
[image: bar plot, 10 bands]

So I think this is not the issue.
The sample file for the current test is a 0.4 s long wav file of a whistle:
test.wav - Dropbox

I'm setting up a fresh installation on my Raspberry Pi Zero W to rule out any external issues.

Edit: I set up a fresh installation of Raspbian and still get the exact same issues. At least the code is consistent...

So I worked on it a bit and figured out why the -inf values appear: I'm kinda dumb... The maximum "frequency" (band index) for audio_levels is 2^DATA_SIZE. So when I try to run an FFT with a 2^10 sample size using bands from 500 to 3000, it can't work. Being able to read helps...
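For reference, the maintainer's piff() helper above already shows the Hz-to-bin conversion: index = freq * fft_size / sample_rate. A sketch of calculate_bands rewritten to return bin indexes instead of Hz values (the added sample_rate and fft_size parameters are assumptions for illustration):

def calculate_band_indexes(min_freq, max_freq, band_count,
                           sample_rate, fft_size):
    # Return (start, end) FFT bin indexes for band_count equal-width
    # bands between min_freq and max_freq (both in Hz).
    band_width = (max_freq - min_freq) / float(band_count)
    indexes = []
    for i in range(band_count):
        lo = min_freq + i * band_width
        hi = lo + band_width
        # same conversion as piff(): index = freq * fft_size / sample_rate
        indexes.append((int(lo * fft_size / sample_rate),
                        int(hi * fft_size / sample_rate)))
    return indexes

# With sample_rate=48000 and fft_size=2**10, the 500-3000 Hz range maps
# to bins ~10..64, well within the spectrum.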

However, this does not fix the rubbish-value issue...
I verified my data vector with np.fft.rfftfreq to check whether my representation in Python is wrong.
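A minimal sketch of such a numpy cross-check, following the rfft/power-spectrum path from the maintainer's code above (the helper name is mine):

import numpy as np

def numpy_power_spectrum(chunk, sample_rate):
    # Reference power spectrum to compare against audio_levels.compute().
    fourier = np.fft.rfft(chunk)
    power = np.abs(fourier) ** 2
    freqs = np.fft.rfftfreq(len(chunk), d=1.0 / sample_rate)
    return freqs, power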

Sample analysed with Audacity:
[image: Audacity spectrum]

The same sample analysed with numpy.fft.rfftfreq (sorry for not formatting the axis):
[image: numpy spectrum]

The same sample run through audio_levels (2^12 samples):
[image: audio_levels output, 2^12 samples]

The same sample run through audio_levels (2^13 samples):
[image: audio_levels output, 2^13 samples]

All plots were fed with data that had been multiplied by a Hanning window. So I'm still trying to find out why the FFT returns such rubbish.