joshdevins / dsp-class

DSP class

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

DSP Class

Based on a compressed and higher level version of the Coursera MOOC Audio Signal Processing for Music Applications. We attended 3-days in Barcelona with Prof. Serra and 2-days in Berlin to finish things off.

Lectures

Lecture slides are in sms-tools.

Getting Started

Use the local Vagrantfile to build a VM with sms-tools and essentia installed.

$ vagrant up
$ vagrant ssh

Outline

  • Day 1, June 22
    • basic mathematics
    • Fourier transform (FT)
    • discrete Fourier transform (DFT)
  • Day 2, June 23
    • Fourier transform properties
    • Short-time Fourier transform (STFT)
      • segmentation
      • onset detection
  • Day 3, June 24
    • sinusoidal modeling
    • harmonic models
      • detecting fundamental frequencies
      • leading to mid and high-level features
  • Day 4, August 06
    • review of previous days
    • sinusoidal plus residual modeling
    • stochastic model
    • sound/music description
      • spectral-based audio features
      • description of sound/music events and collections
  • Day 5, August 07
    • cont. of Day 4

Day 1

  • Lectures 1, 2
  • mathematics
  • Fourier transform (FT)
    • actually asking, "does this frequency f exist and if so, what is the amplitude and phase"
    • sample-based frequency range (define range and steps), will find out amplitude and phase if the frequency exists
    • mathematical, theoretical transform but requires infite time and sinsusoidal wave
  • Discrete Fourier transform (DFT)
    • projection of the input signal onto a series of basis functions
    • assumes that the samples you choose are a stable representation of all time
    • discretized FT, requires to choose number of samples in time domain and the series of basis functions, output is the projection into the frequency domain of both amplitude and phase (samples are collapsed and axis is now frequency response)
    • requires exact signal periodicity in the window, avoid this by applying a window function to reduce the impact of cutting a window and making the signal non-periodic (more on windowing tomorrow)
    • is an identity function/transformation - independently of the number of frequency samples taken for the DFT
    • is reversable, with inverse DFT
  • Fast Fourier transform (FFT)
    • an algorithm to compute the DFT and it's inverse
    • complexity is O(N log N)
    • improves over a naive DFT implementation which is O(N^2)
  • Short-time Fourier transform (STFT)
    • like DFT but does not assume that the window or samples chosen are representative of the entire signal
    • more tomorrow
    • STFT calls FFT (implementation of DFT) on a window
  • Nyquist frequency, sampling theory

Day 2

  • Lectures 3, 4
  • properties of the DFT for real signals (not complex)
  • linearity
    • amplitude increase in input signal
    • phase spectrum stays the same
    • magnitude spectrum reflects no changes in frequencies just amplitude
  • shift
    • minor shift in phase (phi)
    • no change in magnitude spectrum
    • very large change in phase spectrum
    • phase spectrum is very sensitive to phase shift
  • symmetry
    • ring buffer is used
    • split the window signal in half after windowing
    • push latter half into first half of new window such that at x=0, y=0
    • symmetric around 0
    • zero phase windowing
  • convolution
    • for frequency filtering in the time domain, we use convolution
    • for frequency filtering in the frequency domain, we don't need convolution, just multiplication
    • tend to do filtering in the frequency domain as it's cheaper
    • filtering, resonance, dampening, reverberation, etc. are all real world examples
    • related to impulse response
    • models of environmental impulse response is a convolution
    • every sample of input gets convulved with the entire impulse response
    • put the impulse response into every point of the input
    • potentially affects both magnitude and phase, or one or the other only
    • convolution in the time domain is the same as multiplication in the frequency domain and vice versa, convolution in the frequency domain is the same as multiplication in the time domain
  • zero-padding
    • for interpolation and smoothing in the frequency domain
    • for small, instantaneous signals/events, zero padding at the end will result in more resolution in the frequency domain without affecting the accuracy of the DFT
    • if we just use more samples, we can "blur" the DFT by including more signals that are not relevant to the event we are interested in analyzing
    • effectively this increases the number of basis functions, and thus the frequency resolution we can obtain
  • Fast Fourier transform
    • O(N^2) to O(N log N)
    • most frequently, the Cooley-Tukey algorithm
    • relies on repetition and reuse of basis functions
  • Short-time Fourier transform (STFT)
    • introduces a notion of a segment or fragment of time of the entire sound
    • adapts from DFT in that it does not assume the window in the DFT is representative of the entire, signal ad infinitum
    • parameters
      • w: analysis window
      • l: frame number
      • H: hop size
    • loop outside DFT to perform the window hopping (with some window overlapping)
    • window overlapping is to ensure we capture the complete sequence of sounds, also considering that windowing is applied to each chunk/window
    • windowing algorithm/signal depends on the application and signal you are dealing with
    • e.g. rectangular window is fine for things like "pure" signals that are single sinusoids, like radar; not good for general audio as there is much "cutoff" noise around the main lobe
    • "FFT size" (window after zero-padding) must always be a power of 2
    • window is applied before zero-padding, which is before STFT
  • properties and parameters of STFT
  • window size is the most important and has the most effect
  • window types/algorithms each have well defined main lobe widths, independent of window size
  • window size must be odd in order to ensure that the phase is flat for each main lobe in the frequency domain
  • DFT always has the tradeoff of window size/number of samples/frequency resolution and time resolution -- frequency and time resolution are always at odds
  • less time resolution means you cannot capture things like instrument attack (e.g. very start of striking a drum membrane)

Day 3

  • Lectures 5, 6
  • sinusoidal model
  • spectrum sinewave
  • sinewaves as spectral peaks
  • for finding relevant sinusoids in a signal
  • just STFT is often not enough
  • relies on instantaneous measurement of amplitude and frequency (time indexed)
  • real signals are time-variant, periodic signals
  • ? use the sinusoidal model after DFT, to allow identifying arbitrary fundamental frequencies
  • DFT is a lossless transformation, sinusoidal model is not
  • with DFT, you need to represent a whole spectrum of frequencies, even when the input signal is not that complex
  • sinusoidal model captures the main perceptual elements of the input signal
  • we lose mathematical identity since we lose information
  • to get a good frequency resolution, you need zero-padding plus parabolic interpolation to find the "more true" component frequencies
  • harmonic model is next level
  • sinusoids-partials-harmonics
  • poluphonic vs monophonic signals
  • harmonic detection
  • harmonic model system
  • harmonic model is very close to sinusoidal model, but frequencies are not an arbitrary frequencies (as in sinusoidal model) but are integer multiples of the fundamental frequency
  • harmonic model needs monophonic signal or separation/segmentation of polyphonic signal into parts
  • finding fundamental frequency
  • in a monophonic signal
    • in the time domain, the period (where signal repeats) is the fundamental frequency
    • in the frequency domain, the fundamental frequency is the main lobe that explains the others (lowest such that most other lobes are a multiple of it)
    • use autocorrelation in the time domain (signal over time) to find period
  • in a polyphonic signal
    • in the time domain, not possible
    • in the frequency domain, it's feasible but needs multiple frames
  • current most common/popular for finding fundamental frequency (in the time domain primarily) is the YIN algorithm
  • Melody Extraction from Polyphonic Music Signals using Pitch Contour Characteristics. Salamon and Gómez, 2012

Day 4

  • Lectures 7, 8
  • review of the last few days
  • frequency is more stable, phase is more affected by things like noise
  • stochastic model
    • stochastic signals
    • some signals are best described with probabilistic/statistical models
    • autocorrelation, power spectral density
    • approximation: LPC, envelope approximation
  • sinusoidal/harmonic plus residual models (SpR, HpR)
  • residual subtraction
  • harmonic plus residual system
  • sinusoidal/harmonic plus stochastic model (SpS, HpS)
  • stochastic model of residual
  • harmonic plus stochastic system
  • transformations of audio based on models we now know
  • to determine the frame size given a signal and window type: (sampling freq / distance between freq resolution needed) * window type lobe width

Day 5

  • Lectures 9, 10
  • spectral-based audio features
  • introduction to audio features
  • single-frame spectral features
  • multi-frame spectral features
  • usually always contains ZCR and loudness features
  • spectral centroid == spectral weighted arithmetic mean
  • use spectral centroid and loudness together to make sense of silence
  • silence (noise) as high spectral centroid so needs to be combined with loudness
  • MFCC 0 is roughly the same as loudness feature
  • features from MFCC should always include derivatives between frames as MFCC is instantaneous and derivative captures the change between frames
  • music information plane
  • description of sounds and music, recordings and collections

About

DSP class