colin-guyon / rpi-audio-levels

Python binding to retrieve audio levels by frequency band (in fact the power spectrum) from audio samples, on a Raspberry Pi, using GPU_FFT

Not an issue, just an implementation question

pguillem opened this issue · comments

Thanks for pointing me to this fork, Colin.
As I wrote earlier, I'm currently using the Pi and numpy.fft() to compute audio for a school project.

This is what I'm using to get the FFT out of a 44100 Hz mic sample.
The current data length is 8192 np.int16 samples (about 0.2 seconds of audio):

import numpy as np

def getFFT(data, rate):
    data = data * np.hamming(len(data))      # apply a Hamming window
    fft = np.abs(np.fft.fft(data))           # magnitude spectrum
    fft = 10 * np.log10(fft)                 # convert to decibels
    freq = np.fft.fftfreq(len(fft), 1.0 / rate)
    # return the X axis (frequencies) and the power bands (positive half only)
    return freq[:len(freq) // 2], fft[:len(fft) // 2]

This code (taken from the SWHear.py lib by Scott Harden) renders some 22k FFT bands.
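To sanity-check that function, here is a small self-contained sketch (the 1 kHz test tone is my own, not from SWHear.py) that feeds it a synthetic signal and locates the spectral peak:

```python
import numpy as np

def getFFT(data, rate):
    data = data * np.hamming(len(data))
    fft = np.abs(np.fft.fft(data))
    fft = 10 * np.log10(fft)
    freq = np.fft.fftfreq(len(fft), 1.0 / rate)
    return freq[:len(freq) // 2], fft[:len(fft) // 2]

rate = 44100
t = np.arange(8192) / rate
# synthetic 1 kHz tone, int16 like a real mic capture
data = (1000 * np.sin(2 * np.pi * 1000.0 * t)).astype(np.int16)
freq, power = getFFT(data, rate)
peak_freq = freq[np.argmax(power)]
```

Note that 8192 real samples yield 4096 usable frequency bins (spaced about 5.4 Hz apart), so both returned arrays have length 4096.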

I kind of understand how to do the same in your GPU_FFT implementation by sending the right data type, but I fail to understand what the "bands_indexes" parameter is telling the function.

How would you implement audio_levels.compute() in order to yield 22k bands ?

Hello, there is indeed a lack of a real usage example here, which I don't have yet, but I can try to explain how to use the lib:

from rpi_audio_levels import AudioLevels
# the length of your audio samples is 8192 = 2**13
DATA_SIZE = 13
# how many frequency bands do you want to retrieve ?
BANDS_COUNT = 3  # (just an example)

audio_levels = AudioLevels(DATA_SIZE, BANDS_COUNT)

The raw computed FFT will have a length of 2**(DATA_SIZE - 1).
With BANDS_COUNT = 3 I ask to retrieve only 3 computed values from this FFT data.
bands_indexes is a way to specify which parts of this array of 4096 FFT values I want to take for the resulting 3 bands, instead of splitting the FFT into 3 equal parts (which is not what I want if, for example, the only sound frequencies that interest me are between 100 and 10000 Hz).
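Conceptually, each [start, end) pair selects a slice of that 4096-value power array, which is reduced to a single level. A pure-NumPy sketch of the idea (the mean reduction here is only illustrative; the lib's actual reduction may differ):

```python
import numpy as np

def bands_from_fft(power, bands_indexes):
    # reduce each [start, end) slice of the power spectrum to one value
    return [float(np.mean(power[start:end])) for start, end in bands_indexes]

# stand-in for the 2**12 = 4096 FFT power values
power = np.arange(4096, dtype=np.float32)
bands_indexes = [[500, 1000], [1000, 2000], [2000, 4000]]
levels = bands_from_fft(power, bands_indexes)  # one level per band
```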

# Each band can have a different width, and you will have to compute the
# indexes depending on the frequencies that interest you for each band
bands_indexes = [
        [500, 1000],  # start and end idx for the 1st band (min start idx is 0)
        [1000, 2000], # start and end idx for the 2nd band
        [2000, 4000]] # start and end idx for the 3rd band (max end idx is 4096)

Here the indexes come from nowhere, but you need some function to convert the frequency of each band into indexes within this 4096-value FFT data.
Such a function is not integrated into this lib, but it may be a good thing to have in the future.
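A minimal sketch of such a helper, assuming a 44100 Hz sample rate and 8192-sample input (the name freq_to_index is hypothetical, not part of the lib):

```python
def freq_to_index(freq, sample_rate=44100, data_size=8192):
    # the FFT bin spacing is sample_rate / data_size Hz,
    # so frequency f lands near index f * data_size / sample_rate
    return int(freq * data_size / sample_rate)

# e.g. a single band covering 100 Hz to 10000 Hz:
bands_indexes = [[freq_to_index(100), freq_to_index(10000)]]
```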

For example:

import numpy as np

def calculate_bands_frequency(min_frequency, max_frequency):
    '''Calculate frequency values for each band'''
    print("Calculating frequencies for %d bands." % BANDS_COUNT)
    octaves = np.log(max_frequency / min_frequency) / np.log(2)
    print("octaves in selected frequency range ... %s" % octaves)
    octaves_per_channel = octaves / BANDS_COUNT
    frequency_limits = []
    frequency_store = []

    frequency_limits.append(min_frequency)

    for i in range(1, BANDS_COUNT + 1):
        frequency_limits.append(frequency_limits[-1]
                                * 10 ** (3 / (10 * (1 / octaves_per_channel))))
    for i in range(BANDS_COUNT):
        frequency_store.append((frequency_limits[i], frequency_limits[i + 1]))
        print("channel %d is %6.2f to %6.2f " % (i, frequency_limits[i],
                                                 frequency_limits[i + 1]))
    return frequency_store

sample_rate = 44100
min_frequency = 20
max_frequency = 20000
frequency_limits = calculate_bands_frequency(min_frequency, max_frequency)
freqs_left = [8192 * frequency_limits[i][0] for i in range(BANDS_COUNT)]
freqs_right = [8192 * frequency_limits[i][1] for i in range(BANDS_COUNT)]
bands_indexes = [(int(freqs_left[i] / sample_rate), int(freqs_right[i] / sample_rate))
                 for i in range(BANDS_COUNT)]

# Then retrieve audio levels each time you have new data
levels = audio_levels.compute(data, bands_indexes)[0]
# data must be a 1-dimensional numpy array of 8192 real values with float dtype (np.float32)

I hope this is a little clearer.
You can check the lightshowpi project, which has a real usage example of the rpi-audio-levels lib in its https://bitbucket.org/togiles/lightshowpi/src/master/py/fft.py file.

I'm not sure I understand when you say "22k bands" — do you mean 22000 frequency bands?

Thanks for taking the time to share the details and the spirit of the function. It is much clearer to me now. It makes perfect sense for all sorts of applications that will dig into specific bands. (By the way, great work, you are one hell of a programmer).

I'm going to keep testing your code because the speed gain is a must for this project! I tried it yesterday, and no matter what I give it as float32, the compute() function returns a tuple with 3 elements... maybe I'm doing something wrong?

Ideally I would expect to get a DATA_SIZE/2-length array with the power values for all bands from 1 to DATA_SIZE/2, which I understand should be the first element of the tuple returned by compute(), right?

My implementation is an acoustic "pure-tone cancelling device" for a final semester project. Regarding the 22000 bands, they are required to identify the frequency of the pure tone in space down to 1 decimal of precision (20000 Hz / 22000).

In order to cancel a pure tone with a "counter-tone", it must match exactly the same frequency, 180 degrees out of phase with the source. Of course, in practical terms 16384 bands should suffice.

The Raspberry has a C-Media USB card with a microphone to gather ambient sound across all audible frequencies. When it detects a spike in any band above 60 dB, it identifies the frequency, generates a sine wave (counter-tone), and tests every possible phase in order to create acoustic interference with the source.

Finally it plays the tone with the same amplitude, 180 degrees out of phase, getting to the point where that particular frequency is fully attenuated in a specific portion of space.
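The core of that idea can be sketched in NumPy (the 440 Hz frequency and buffer length are arbitrary stand-ins):

```python
import numpy as np

rate = 44100
t = np.arange(4096) / rate
tone = np.sin(2 * np.pi * 440.0 * t)             # the detected pure tone
counter = np.sin(2 * np.pi * 440.0 * t + np.pi)  # same tone, 180 deg out of phase
residual = tone + counter                        # sums to (near) silence
```

In practice the hard part is matching phase and amplitude at the listener's position, as described above.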

I'll pop the code onto GitHub soon, in case you want to check it out.

You're welcome. I've been inspired a lot by what I found on the internet ;)
Your project seems interesting!

You are right: the first element of the tuple returned by compute() is the DATA_SIZE/2-length array with the power values for all bands.

With an input data length of 2^13, you can only retrieve a maximum of 2^12 FFT values, and if you want as many bands as possible, bands_indexes should be:

bands_indexes = [[0, 1], [1, 2], [2, 3], ...., [4095, 4096]]
# or
bands_indexes = [(i, i + 1) for i in range(2**(13 - 1))]

You would then retrieve 4096 bands. (Did you try with this bands_indexes value?)

According to http://electronics.stackexchange.com/questions/12407/what-is-the-relation-between-fft-length-and-frequency-resolution , your frequency resolution is 44100 / 2 / 4096 = 5.38... Hz.
And you would like a precision below 1 Hz, if I understood well? So I'm not sure you can get such a frequency resolution with a sample rate of 44100 Hz and only 8192 samples of data.
I'm not really an FFT expert, and my memory would need refreshing, so please correct me if I'm wrong.
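One note on that point: the fs/N bin spacing does limit the ability to separate two nearby tones, but the frequency of a single pure tone can be estimated well below the bin spacing, for example by zero-padding the FFT. A quick NumPy sketch (the 1000.3 Hz test tone is an assumed example):

```python
import numpy as np

rate, n = 44100, 8192
true_freq = 1000.3                    # a tone between two native bins
t = np.arange(n) / rate
data = np.sin(2 * np.pi * true_freq * t) * np.hamming(n)

padded = 8 * n                        # zero-pad to interpolate the spectrum
spectrum = np.abs(np.fft.rfft(data, padded))
# refined bin spacing: 44100 / 65536 ~ 0.67 Hz
estimate = np.argmax(spectrum) * rate / padded
```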

Thanks for sharing the code of your project, I may have a look into it when I have more time.

Great, I just managed to get 4096 bars REALLY fast.

The program kept reaching a segmentation fault on the second FFT execution, but I realized it was due to my function re-creating the AudioLevels object over and over again.

As long as the code creates the AudioLevels instance ONLY ONCE, it works when calling compute() again and again.

THANKS!!

cool ;)