kar7hik / mfcc

Common-lisp implementation of MFCC

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Implementation of Mel Frequency Cepstrum Coefficient

MFCC are the most often used features in Automatic Speech Recognition (ASR). This project is a pure common-lisp implementation of MFCC. For more details and to understand the steps involved in the computation of MFCC, refer the doc folder.

Block diagram of MFCC

./data/mfcc.png

Features

  • Read/Write WAV Files
  • Play WAV Files
  • Generate signal using functions
  • Filter bank features
  • MFCC features
  • Normalized MFCC features
  • Delta features
  • Energy

Usage:

Loading:

Clone this repo or extract the downloaded zip file into the quicklisp local-project directory. This project also depends on the cl-dct.

(ql:quickload :mfcc)
(in-package :mfcc)

Record audio:

(record-audio :record-time 2
              :save-to-file t
              :audio-filename "new-mic.wav"
              :plot-data nil
              :plot-filename "record-plot.png")

Play WAV file:

(play-wav-audio "new-mic.wav")

Loading WAV File:

(defparameter *wav-file* "./data/music-mono-1.wav")
(defparameter *wav* (load-wav-file *wav-file*))

Generate WAV file using a function:

(defparameter *wave* (make-instance 'wave-from-function
                                    :duration 1.0 
                                    :frequency 5
                                    :num-channels *num-channels*))
(create-wave *wave*
             :plot-data nil
             :plot-filename "sine-example.png"
             :save-to-file t
             :filename "low-freq.wav")

Generated Signal

./data/sine-example.png

To cut the audio-signal for the specified time (seconds)

  • __*chunk-audio-data*__ returns the snippet of audio signal with the specified time seconds.
;; (defparameter *snip* (chunk-audio-data (load-wav-file *wav-file*) 0.5))
;; (defparameter *sig* (make-signal-processing *snip*
;;                                             (num-channels *wav*)))

Create a signal processing object for pre-processing:

(defparameter *sig* (make-signal-processing (audio-data *wav*)
                                            (num-channels *wav*)))

Creating a MFCC object:

(defparameter *mfcc-obj* (make-mfcc *sig*))

Triangular Windows (Mel-Scale)

./data/mfcc-filter-scale.png

Obtaining Features Values:

(defparameter *lift* (apply-liftering-to-dct *mfcc-obj*))
(defparameter *log-mean* (get-mean-normalization (log-mel-features *mfcc-obj*)))
(defparameter *mfcc-mean* (get-mean-normalization *lift*))
(defparameter *final* (delta *lift* 2))
(defparameter *energy* (find-energy (power-spectrum *sig*)))

Results:

MFCC feature (only the first feature vector):

(aops:sub *lift* 0)
=> #(84.32953643798828d0 2.320774516956553d0 -35.99634746398786d0 
     5.854659999539559d0 -9.431935471432663d0 9.418806709697838d0 
     5.991028051621737d0 12.55219582362723d0 -8.993686128896886d0 
     -29.59149764341289d0 -18.555555851435216d0 1.50505131483078d0 22.858585376021495d0)

Delta features (only the first feature vector):

(aops:sub *final* 0)
=> #(-0.2552299499511719d0 0.6977165031679372d0 -0.09201786919631445d0 
     2.545779231948922d0 -2.7268834033920015d0 2.06742170401215d0 
     -3.5643606642114314d0 4.3841496471528565d0 -1.2499408321360161d0 
     3.7898237565504505d0 -2.156757703371111d0 1.671492236852646d0 -3.8990957484272046d0)

Log features (only the first feature vector):

(aops:sub (log-mel-features *mfcc-obj*) 0)
=> #(12.401644278013046d0 12.325183782025274d0 9.894037918469072d0
     11.87436536355766d0 12.463401597782262d0 13.376237316823566d0
     13.757091339832925d0 11.96858341994982d0 11.816220889334238d0
     11.227166376406652d0 10.129551382254183d0 17.615552923859074d0
     18.078857938546864d0 14.815736271901748d0 14.427964849640757d0
     12.84195386880488d0 13.704143199789389d0 16.635016462483716d0
     14.089993157373893d0 14.462835384938893d0 14.876040837236925d0
     15.794063628793452d0 14.653002044886977d0 15.082592624698782d0
     15.532243769161365d0 14.737328080790803d0 13.913581207693381d0
     13.802102444960566d0 13.896096925050701d0 13.747262284968054d0
     14.433355726798574d0 13.389115891141108d0 12.872882908056825d0
     12.06174263434067d0 12.350199690286363d0 11.429605881258349d0
     11.241366028260343d0 10.990932261487938d0 10.338422068953724d0
     10.299368768592627d0)

Energy

*energy*
=> #(1.7626966062871602d8 1.2568503323757899d8 1.2285623717743166d8
     1.5382597390317196d8 1.4441726597019002d8 1.0788927786367057d8
     8.794667780189571d7 6.599442155411122d7 8.265703456594153d7
     8.253446641971989d7 6.721173258605671d7 5.924728484297371d7 6.6088076672885d7
     3.4179601389461124d8 2.4278710799246743d8 3.044851557071605d8
     1.5618989207849228d8 1.402639283258388d8 1.3936738053985456d8
     1.3918754164092052d8 1.1820257680081546d8 1.149867674831372d8
     7.672032056714347d7 8.300047239509366d7 9.93502845424874d7
     1.0606695789620537d8 1.4963090322287357d8 1.468349587508788d8
     1.5360883554883546d8 1.2264282677580306d8 1.0834977673600417d8
     1.0810930330808245d8 1.2295828897839631d8 1.1636402252604866d8
     1.1088661136503445d8 9.196340631778611d7 8.298800670111749d7
     9.45682838559255d7 1.058083142809799d8 1.0547616672288223d8
     1.5409868265307036d8 1.9798171099728122d8 1.6046782882240316d8
     1.4823352391751128d8 1.4333561448720217d8 1.3090148558446108d8
     1.2771287545208755d8 1.955088602610907d8 1.723331258529756d8
     1.6966751202866793d8 1.89951382940272d8 1.73571919221871d8
     2.1078579605524036d8 2.91207367291081d8 2.925079430050084d8
     9.963834275069143d8 1.2537233543865464d9 7.722182305998327d8
     9.732940572774367d8 1.0590086048419925d9 9.205726874741478d8
     7.719013693707483d8 8.716791584635098d8 9.176891092108614d8
     6.812464057489675d8 5.638014182639217d8 6.439323393014234d8 6.066172279996d8
     6.240344612899727d8 6.094851127844714d8 5.972954042090677d8
     5.542071900532534d8 4.8818300031709176d8 4.0013903496122795d8
     4.440550639597058d8 4.521399851287995d8 5.1030665579791343d8
     1.2298385190357108d9 7.66552346256231d8 6.800212901209595d8
     4.7139651352147526d8 4.43820527628611d8 4.7229934673619246d8
     3.160677533944572d8 3.6487540801100373d8 2.7062509642829174d8
     2.4296254664459792d8 2.4963541764221156d8 2.0572954172603503d8
     1.996456413479171d8 1.7058583680374473d8 1.7355629786245582d8
     1.4626445672763875d8 1.6484551683536372d8 1.6866419891178918d8
     1.5416647842088905d8 1.6862203855037698d8 3.7134247948912156d8
     1.9424501168760452d9) 

Reference:

  1. Spoken Language Processing: A Guide to Theory, Algorithm and System Development
  2. Power and Energy of a Signal
  3. Practical cryptography blog-post
  4. Audio Processing and Speech Recognition Concepts, Techniques and Research Overviews by Soumya Sen, Anjan Dutta, Nilanjan Dey
  5. Speech-processing-for-machine-learning
  6. Archive video about MFCC

About

Common-lisp implementation of MFCC


Languages

Language:Common Lisp 100.0%