atharvas / audio-visualizer

CS 225 Fall 2020 Embedded Systems Honors: Guide to making a Raspberry Pi based Bluetooth audio visualizer.


Notes:

All the tasks here are geared toward the final goal of making a sound visualizer.

Changes will roll out every week. Be sure to check this webpage often for updates!

Task 1 : Setup

The main idea behind dedicating an entire section to setup is to get you comfortable interacting with your Raspberry Pi. For many of you, this will be the first time you've had to work with a headless computer. A couple of questions to keep in mind while doing this task:

  • What makes a Raspberry Pi different from an Arduino?
  • What makes a Raspberry Pi similar to an Arduino?
  • How can a Raspberry Pi be used in embedded system applications?

Task 1.1: Downloading and installing Raspbian

Online References:

  1. Download Raspbian: I'd recommend downloading and installing the full Raspbian desktop environment (as opposed to the Lite installation) so that you can connect a display in case the network fails.

  2. Flash it onto an SD Card: The easiest way to do this is to use the Raspberry Pi Imager.

  3. Boot up: Follow the on-screen steps. Make sure to enable SSH, VNC, and Remote GPIO.

  4. Wi-fi Setup: If your Pi has wi-fi, you'll need to follow some additional steps.

    1. Set Wi-fi country: This can be done by running

      $ sudo raspi-config

      and navigating into Localization Options > Change Wi-fi country. Reboot your Pi for changes to take effect.

    2. Register your Pi on IllinoisNet_Guest: To whitelist your Pi, go to the IllinoisNet_Guest management portal and click on Add Device (this may take ~10-15 minutes to take effect). You might need to find your MAC address(es). This can be done using:

      $ ifconfig -a

    3. How to connect to IllinoisNet: IllinoisNet is harder to connect to because of the security standard it uses, WPA Enterprise.

      • Install network manager packages

        • You need network access to download packages, so connect to Ethernet/Wi-Fi before moving on.
        $ sudo apt install network-manager network-manager-gnome
      • Disable dhcpcd

        $ sudo systemctl disable dhcpcd
        $ sudo systemctl stop dhcpcd
      • Enable NetworkManager

        edit /etc/NetworkManager/NetworkManager.conf and set (under the [ifupdown] section):

        managed=true
        
      • Add wifi profile for IllinoisNet

        • Create a wifi connection

          $ nm-connection-editor

          Click the + button and fill in the details for IllinoisNet.

      • Activate IllinoisNet connection

        $ nmtui

Task 1.2: Connecting to Pi over SSH

Online References:

  1. Enable SSH: You can do this by going into raspi-config and navigating to Interfacing Options > SSH.

  2. Connect your laptop to IllinoisNet_Guest: You need to be on the same network as the Pi to login via SSH.

  3. Find your Pi's IP address: This can be done using ifconfig and noting down the inet field under wlan0.

  4. Connect to your Pi!: From your laptop, type:

    $ ssh pi@<IP ADDRESS UNDER IFCONFIG>

Task 1.3 (Extra): Running VSCode on the Raspberry Pi

Online References:

Task 1.4 (Extra): VNC

Online References:

Task 1.5: CS225 on your Raspberry Pi!

Git comes pre-installed with Raspbian! Clone your CS225 Repo in any folder and try compiling some code!
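For example (the repository URL and assignment folder here are placeholders; substitute your own):

$ git clone <your-CS225-repo-URL> ~/cs225
$ cd ~/cs225/<assignment-folder>
$ make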

Task 1.6 (Extra): Add a CRON job to pull your CS225 Repo every x minutes!

Online References:
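Here is a minimal sketch (it assumes your repo was cloned to /home/pi/cs225 and pulls every 10 minutes; adjust the path and interval to taste):

$ crontab -e
# then add a line like this to your crontab:
*/10 * * * * cd /home/pi/cs225 && git pull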

Task 2: Setting up the Audio Input

The main goal of this section is to be able to read audio from a device over Bluetooth and do some basic processing with it. There are 3 steps in this section:

  1. Setting up the bluetooth connection
  2. Routing audio from the bluetooth device to an audio sink.
  3. "Listening" to the audio sink for our audio stream.

Note: We will be using Python for most of our computations. The main reason for this (sudden) shift to Python is that it lets us use existing libraries to abstract away the specifics of streaming audio input.

Task 2.1: Installing dependencies and setting up bluetooth.

  1. Dependencies: We first need to install a bunch of dependencies.

    $ sudo rpi-update
    $ sudo apt update
    $ sudo apt install bluez pulseaudio-module-bluetooth python-gobject python-gobject-2 bluez-tools udev portaudio19-dev python-pyaudio python3-pyaudio
    # python3 dependencies
    $ pip3 install -U numpy scipy setuptools
    $ pip3 install pyaudio

    You may encounter errors specific to your Raspberry Pi model. In that case, Google the specific error message (don't forget to add your Pi model to the search query!). There is a lot of online discussion to assist with most of these errors.

  2. Connect to Bluetooth: The straightforward way to go about this is to plug your Raspberry Pi into a display (or use VNC!) and pair the Pi with your phone using the GUI. However, we can use the terminal to do the same thing!

    $ bluetoothctl
    [bluetooth] list
    # An output should appear representing your bluetooth dongle or the bluetooth module on the Pi 3
    [bluetooth] agent on
    [bluetooth] default-agent
    [bluetooth] discoverable on
    [bluetooth] scan on
    # The MAC address of your device that you want to pair might be listed. If so, note down the MAC address that is associated with the name of the device you want to pair
    [bluetooth] pair XX:XX:XX:XX:XX:XX
    [bluetooth] trust XX:XX:XX:XX:XX:XX

Task 2.2: Setting up the audio playback

Online References:

  1. Add user to group: The first step is to add your current user to the lp group (this grants access to the Bluetooth interface). You can get your current user using $ whoami

    $ sudo usermod -a -G lp pi
  2. Add audio configuration file: Create audio.conf at the location /etc/bluetooth/ and paste this into the file. (You'll need sudo access to make changes at this location):

    [General]
    Enable=Source,Sink,Media,Socket
  3. Start pulseaudio: This can be done using this command:

    $ pulseaudio -D
  4. Check playback: Connect a speaker (or a pair of headphones) to the audio jack and try playing something on your phone (or use the HDMI port if your monitor has speakers). If you can't hear anything, you might need to force the audio through the audio jack. This can be done by:

    $ sudo raspi-config
    # Advanced Options -> Audio -> Force 3.5mm  
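
    If you'd rather not open raspi-config, the same switch can usually be flipped with amixer (on older Raspbian images, numid 3 selects the output: 1 = analog 3.5mm jack, 2 = HDMI):

    $ amixer cset numid=3 1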

Task 2.3: Reading in audio data

Online References:

  1. Pulseaudio under the hood

  2. PyAudio documentation

  3. Testing input and output stream: Before we proceed, let's make sure our input and output streams work.

    $ pactl info

    gives us basic information about our PulseAudio server instance, including the Default Sink and the Default Source. To see a list of potential sources/sinks, we can use the command $ pactl list sinks short or $ pactl list sources short. If we want to add Bluetooth streams, we need to execute $ pacmd load-module module-bluetooth-discover first. (If you're getting a pa_context_connect() failed error, reboot your Pi.)

    Now, we can use parecord to record the source stream. If paplay plays the input back, we are good to go.

    # make sure your bluetooth device is playing something
    $ parecord -v /tmp/test.wav
    $ paplay -v /tmp/test.wav
    # if the playback matches the input stream, we're good to go
  4. Reading in from the source stream:

    Reading from a sound stream in PyAudio consists of 3 steps:

    1. Open a new stream in the controller
    2. Read in data from the stream
    3. Close the stream when done.

    Your goal for this activity is to figure out how to read a sound input using PyAudio. You should be able to complete this activity using only the functions given below (but it's okay to use other functions as well!).

    Documentation:

    1. PyAudio()
    2. PyAudio.open()
    3. Stream.read()
    4. Stream.stop_stream()

    The code provided below should help you get started.

    import pyaudio
    import numpy as np
    from numpy.linalg import norm

    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 44100
    FRAMES_PER_BUFFER = 4096

    controller = pyaudio.PyAudio()

    # Solution: Step (1) open stream
    stream = controller.open(format=FORMAT,
                             channels=CHANNELS,
                             rate=RATE,
                             input=True,
                             frames_per_buffer=FRAMES_PER_BUFFER)

    while True:
        try:
            # Solution: Step (2) read from stream
            stream_data = stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False)
            data = np.frombuffer(stream_data, dtype=np.int16).astype(np.float32)
            print(norm(data))
        except KeyboardInterrupt:
            break

    print('\nShutting down')
    # Solution: Step (3) close the stream
    stream.stop_stream()
    stream.close()
    controller.terminate()

    To run this code, save it as activity_23.py and run:

    $ python3 activity_23.py

Task 2.4: Outputting audio data!

Your goal for this activity is to figure out how to output sound using PyAudio. You should be able to complete this activity using only the functions given below (but it's okay to use other functions as well!).

  1. PyAudio()
  2. PyAudio.open()
  3. Stream.write()
  4. Stream.stop_stream()

This code should help you get started.

import pyaudio
import numpy as np

RATE = 44100

def get_bit_stream(rate=RATE, freq_hz=940.0, duration=3):
    '''
    Returns a byte stream that plays one tone at 5 increasing volumes.
    '''
    n_frames = int(rate * duration)
    samples = bytearray()
    for x in range(n_frames):
        level = 1 + int(x * 5 / n_frames)  # volume step: 1, 2, ..., 5
        # 8-bit unsigned samples centered at 128
        samples.append(int(level * 25 * np.sin(2 * np.pi * freq_hz * x / rate)) + 128)
    return bytes(samples)


bit_stream = get_bit_stream()
# Solution
controller = pyaudio.PyAudio()
stream = controller.open(format=controller.get_format_from_width(1),
                         channels=1,
                         rate=RATE,
                         output=True)
stream.write(bit_stream)
stream.stop_stream()
stream.close()
controller.terminate()

To run this code, save it as activity_24.py and run:

$ python3 activity_24.py

Task 3: Basic Audio Processing

Now that we know how to read audio data, the next step is to process it. There will be 3 main parts to this step.

  1. Filtering: Apply 1 or more of these filters to a signal:
    1. Hamming Filter
    2. Low pass filter
    3. High pass filter
  2. Sampling: Discrete Fast Fourier Transformation
  3. Normalizing: Logarithmic binning, applying a Mel-filterbank

Note: This section will be less hands-on than the other sections. I suggest experimenting with these steps on your laptop/desktop first (try using a Jupyter notebook! Getting started, VSCode, Online) and then moving your code to the Raspberry Pi (use scp or git for this!).

Task 3.1: Filtering

Part 1: Activity

Online References (Resources I used to prepare this material):

  1. CS 434 FFT Foundations (Page 26)
  2. Hamming Filter (Window Function) Wiki
  3. Hamming Filter Motivation
  4. numpy.hamming
  5. High Pass Filter and Low Pass Filter Wiki
  6. High pass and low pass filter Motivation
  7. Butterworth filter

Your first task is to generate a band-pass filter (the combination of a low pass filter and a high pass filter). The main goal is to get an intuitive understanding of what a filter does. You should be able to complete this activity using only the functions given below (but it's okay to use other functions as well!).

Documentation:

  1. scipy.signal.butter
  2. scipy.signal.filtfilt
  3. Any other functions in scipy.signal

The following code generates a signal that is the summation of 3 sinusoidal waves. Your goal is to filter out the 5 Hz and the 100 Hz sinusoids so that only the 50 Hz signal remains.

import numpy as np
from scipy import signal
import matplotlib.pyplot as plt

def plot_function(v, title="No Title"):
    plt.figure(figsize=(10,5))
    plt.plot(v)
    plt.title(title)
    plt.show()
    
def generate_sin_wave(freq_hz, n_pnts=1000):
    return np.sin(np.linspace(0, 1, (n_pnts)) * 2 * np.pi * freq_hz)

sum_of_3_sin = generate_sin_wave(5) + generate_sin_wave(50) + generate_sin_wave(100)

plot_function(sum_of_3_sin, "3 sin waves with freq: 5, 50, 100")

def filter_low_and_high(x, order=2):
    # @TODO: implement this. (The parameter is named x so it doesn't
    # shadow the scipy.signal module imported above.)
    pass

filtered_50_hz_sin = filter_low_and_high(sum_of_3_sin)  
plot_function(filtered_50_hz_sin, "Desired Output")

(Figures: the generated sum of three sine waves, and the desired 50 Hz output.)
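
If you get stuck, here is a minimal sketch of one possible solution (the 40 Hz and 60 Hz band edges are my choice, and fs=1000 reflects that the generated signal places 1000 points over one second):

from scipy import signal

def filter_low_and_high(x, order=2, fs=1000):
    # Butterworth band-pass around 50 Hz: attenuates the 5 Hz and 100 Hz components
    b, a = signal.butter(order, [40, 60], btype='band', fs=fs)
    # filtfilt runs the filter forward and then backward, so the result has no phase shift
    return signal.filtfilt(b, a, x)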

Part 2: Filtering Audio data

A big problem that arises from using a Fourier transform to get the frequencies of a real signal is something called spectral leakage. So, before we take the FFT of a signal, we need to apply a hamming filter to it (read more about how hamming filters help with spectral leakage here and here).

Your goal for this part is to apply a hamming filter to the data that you read in from the PulseAudio stream you made in Task 2 (called input data from now on...). This consists of 3 steps (a minimal sketch follows the list):

  1. Reading in the input data.
  2. Constructing a hamming window (see the documentation of numpy.hamming).
  3. Multiplying each value in the hamming window with the input data (point-wise).
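
A minimal sketch of steps 2 and 3 (the random buffer is just a stand-in for one buffer of input data):

import numpy as np

# stand-in for one 4096-frame buffer of input data from Task 2.3
data = np.random.randn(4096).astype(np.float32)

window = np.hamming(len(data))  # tapers the buffer's edges toward zero
windowed = data * window        # point-wise multiplication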

Task 3.2: DFT

Online Resources:

  1. CS 434 FFT Foundations
  2. 3blue1brown Youtube video
  3. numpy.fft
  4. Relation between FFT length and frequency resolution

A discrete Fourier transform takes an input signal and separates it into its discrete frequency components.

Now that we have prepared our filtered input data, take a 1-dimensional discrete Fourier transform of it (look at the numpy.fft documentation) and plot the result using the plot_function defined in the code for Task 3.1. Play around with this! See if you can come up with answers to the following:

  1. How does the DFT change if we add a very high frequency sinusoid to the data?
  2. What would happen if we didn't apply the hamming filter in Task 3.1.2?

The DFT gives us the amplitude and phase of each frequency present in the signal. From this, we can compute an estimate of the power spectral density, which can be calculated using this formula: $$ P(f_k) = \frac{1}{N} |DFT(f_{k})|^{2} $$
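
A sketch of both steps on a synthetic buffer (the 440 Hz tone is a stand-in for real input data):

import numpy as np

RATE = 44100
# toy windowed buffer: a 440 Hz tone through a hamming window (stand-in for the Task 3.1 output)
t = np.arange(4096) / RATE
windowed = np.sin(2 * np.pi * 440 * t) * np.hamming(4096)

spectrum = np.fft.rfft(windowed)                    # one-sided DFT of a real signal
freqs = np.fft.rfftfreq(len(windowed), d=1 / RATE)  # bin centers in Hz
psd = np.abs(spectrum) ** 2 / len(windowed)         # P(f_k) = |DFT(f_k)|^2 / N
print(freqs[np.argmax(psd)])                        # peaks near 440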

Task 3.3: Normalizing the DFT

Yay! Now that we have the frequency decomposition, we can start identifying the components that we might want to visualize. Our final goal (for this task) is to bin the frequencies so that we can map colors to certain frequency ranges (red for high-pitched vocals, violet for bass, etc...).

Frequency Range   Frequency Values
Sub-bass          20 to 60 Hz
Bass              60 to 250 Hz
Low midrange      250 to 500 Hz
Midrange          500 Hz to 2 kHz
Upper midrange    2 to 4 kHz
Presence          4 to 6 kHz
Brilliance        6 to 20 kHz

(table from here)

I'll leave the actual implementation of how to bin the data up to you. I'll use this section to explain two (out of many) ways in which you can go about this.

  1. Logarithmic binning: One of the things that we notice when we take the log-log plot of the power spectrum output (shown for some sample data in the figure below) is that the interesting frequency content is distributed logarithmically: each decade (10-100 Hz, 100-1000 Hz, and so on) occupies the same width on a log axis. Hence, we can group the frequencies into logarithmically increasing bins. One way of going about this is to split the array on the indices that are the geometric means of adjacent log-scale edges. In code (psd is the power spectrum array from Task 3.2):

    n_bins = 24  # number of visualization bins (your choice)
    log_scale = np.logspace(0, np.log10(len(psd)), num=n_bins + 1)
    # split on the geometric mean of each pair of adjacent log-spaced edges
    indices = np.sqrt(log_scale[:-1] * log_scale[1:]).astype(int)
    binned = np.split(psd, indices)

    (Figure: log-log plot of the power spectrum for some sample data.)

  2. Filter using a Mel-frequency filterbank:

    Some resources to learn this:

    The basic intuition behind this is that it's easier for humans to differentiate between two signals oscillating at 500 Hz and 1000 Hz than between two signals oscillating at 8000 Hz and 8500 Hz. Hence, we should bin our frequencies so that frequencies that are close together on the mel scale end up in the same bin. All in all, it boils down to taking a dot product of our current power spectrum approximation and the mel_matrix (see the one-liner after the code below). The following code will construct the mel_matrix for you:

    # implementation adapted from https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
    def hz_to_mel(hz):
        return 2595 * np.log10(1 + hz / 700)

    def mel_to_hz(mel):
        return 700 * (10**(mel / 2595) - 1)

    def get_mel_filtermatrix(n_filters=24, fft_size=512, low_hz=0, high_hz=16000, rate=44100):
        # filter centers are spaced evenly on the mel scale, not in Hz
        points_in_mel = np.linspace(hz_to_mel(low_hz), hz_to_mel(high_hz), n_filters + 2)
        points_in_hz = mel_to_hz(points_in_mel)
        center_freq = np.floor((fft_size + 1) * points_in_hz / rate).astype(int)

        freq_to_mel_matrix = np.zeros((n_filters, int(np.floor(fft_size / 2 + 1))))

        for i in range(1, len(center_freq) - 1):
            low_f = int(center_freq[i - 1])
            center_f = int(center_freq[i])
            high_f = int(center_freq[i + 1])

            inc_slope_idx = np.arange(low_f, center_f)  # rising edge of the triangular filter
            dec_slope_idx = np.arange(center_f, high_f)  # falling edge
            freq_to_mel_matrix[i - 1, inc_slope_idx] = (inc_slope_idx - low_f) / (center_f - low_f)
            freq_to_mel_matrix[i - 1, dec_slope_idx] = (high_f - dec_slope_idx) / (high_f - center_f)

        return freq_to_mel_matrix

    mel_matrix = get_mel_filtermatrix(fft_size=N_fft_bins)
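
    For instance, applying the filterbank is then a single dot product (psd is the one-sided power spectrum from Task 3.2, of length fft_size // 2 + 1):

    mel_energies = mel_matrix.dot(psd)  # one energy value per mel bin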

Task 4: Visualization

At this point, we are done with all the heavy lifting. You should make sure that your processing code runs in well under a second so that we can visualize our audio in real time. Our original goal was to output the bins we created in the last step to an LED strip using the Pi's GPIO pins. However, due to COVID-19, we shall instead use a GUI for the visualization phase. You are welcome to use any software/language/framework in this task, as long as you provide the necessary citations, your code/logic is interpretable, and your choice of software can render the GUI in real time. (You may even print the FFT bins to the terminal.) I shall be using matplotlib.animation for this. The reason is two-fold:

  1. It has enough documentation that all basic questions can be searched online.
  2. It runs on my Pi 3B+ without too much lag.

If you're controlling your Pi via SSH, you can pass the -X flag to ssh (X11 forwarding) to view the plot/animation generated.

Task 4.1 Bass Visualization (Activity)

Online References

  1. Animations in matplotlib
  2. Lifecycle of a plot in matplotlib
  3. matplotlib.animation
  4. matplotlib.patches.Circle
  5. Relation between FFT length and frequency resolution

Your goal in this subtask is to map the frequencies that correspond to the "bass" tones onto the radius of a circle. The table from Task 3.3 above tells us that bass tones usually lie within the 60 to 250 Hz range. If we extract the frequencies corresponding to the bass tones, then the energy in these frequencies (crudely) represents the presence of a beat in the signal. We can extract the frequencies in two ways:

  1. We can use a lowpass filter to filter out any frequency above 250 Hz.
  2. We can take the $n$-point discrete Fourier transform of the audio (after applying a hamming filter) and take the slice of the array that corresponds to the "bass" frequencies. (Make sure that $n$ = sampling frequency in the call to np.fft.rfft to use the 60 Hz and 250 Hz numbers directly. Also, instead of energy, you need to calculate the spectral energy for the FFT.) A sketch of this approach follows the list.
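
A minimal sketch of the second approach (the 110 Hz tone is a stand-in for real input data; with $n$ = RATE, each rfft bin is exactly 1 Hz wide):

import numpy as np

RATE = 44100
t = np.arange(RATE) / RATE  # one second of audio, so n = sampling frequency
buf = np.sin(2 * np.pi * 110 * t) * np.hamming(RATE)

psd = np.abs(np.fft.rfft(buf)) ** 2 / len(buf)
freqs = np.fft.rfftfreq(len(buf), d=1 / RATE)

bass_band = (freqs >= 60) & (freqs <= 250)  # "bass" range from the Task 3.3 table
bass_energy = psd[bass_band].sum()          # rudimentary beat indicator
print(bass_energy)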

Here is some code to plot a circle in matplotlib to get you started.

import numpy as np
from matplotlib import pyplot as plt
from matplotlib import animation

fig = plt.figure(figsize=(5,5))
ax = plt.axes(xlim=(0, 2), ylim=(0, 2))

base = plt.Circle((1, 1), 0.2, fc='b')

def init():
    ax.add_patch(base)
    return base,  # with blit=True, FuncAnimation expects an iterable of artists

def loop(i):
    sample = microphone.get_sample() # <- @TODO replace this line with code to read a sample from the microphone.
    if sum(sample) > 0:
        calc_base = 0.05 * np.random.rand() # <- @TODO replace this line with the processed signal.
        base.set_radius(calc_base)
    return base,


anim = animation.FuncAnimation(fig, loop, 
                               init_func=init, 
                               frames=10, 
                               interval=10,
                               blit=True)

plt.show()

Task 4.2 Music Visualization

Online References:

  1. A Perceptually Meaningful Audio Visualizer
  2. An audio visualizer for Razer products

Get creative! We've already implemented (95% of) the tools that we need to make a rudimentary music visualizer. In this sub-task, your goal is to piece together all the different things we have touched on over the past (few) weeks and make something cool! However, there are 2 limitations:

  1. Your visualization must be "real time" (the lag between audio and visualization shouldn't exceed 1.5 seconds).
  2. The audio processing must be done in real time (you shouldn't be reading a file with the Fourier transform of every sample of the mp3 precomputed).

A couple of pointers if you're running into speed issues:

  1. Try reducing FRAMES_PER_BUFFER. FRAMES_PER_BUFFER := SAMPLING_FREQ / FPS, so to increase the FPS, we need to reduce the frames per buffer (e.g., at 44100 Hz, a 4096-frame buffer caps you at roughly 10 updates per second, while 1470 frames allows about 30).
  2. Try lowering the value of frames and/or interval in the call to animation.FuncAnimation.
  3. Try rendering fewer objects!

Have fun!
