Redoing ML exercises in CL

Some time ago, I took a course on ML with examples in Octave/Matlab. This repository tries to redo those exercises in Common Lisp.

Linear algebra

I need linear combinations and matrix (including row and column vector) multiplication. I came across quite a few linear algebra (LA) libraries, some native, some using FFI, and most of them are a bit too complicated. Also, I wanted to get a feeling for the speed of simple code.

So here is a simple implementation of some LA operations. The macro with-matrixes takes a matrix algebra expression such as (* A X) and generates code that computes the resulting matrix.
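To picture the kind of code that has to run for an expression like (* A X), here is a minimal sketch of a plain triple-loop matrix product. It is not the code generated by with-matrixes; the name naive-matrix-multiply is hypothetical and used only for illustration.

```lisp
;; Hypothetical helper, not part of the repository.
;; Computes C = A * B for two-dimensional arrays with a plain triple loop.
(defun naive-matrix-multiply (a b)
  (let ((rows (array-dimension a 0))
        (across (array-dimension a 1))
        (cols (array-dimension b 1)))
    ;; The inner dimensions must agree, otherwise the product is undefined.
    (assert (= across (array-dimension b 0)) ()
            "Inner dimensions do not match.")
    (let ((c (make-array (list rows cols) :element-type 'single-float
                                          :initial-element 0f0)))
      (dotimes (i rows c)
        (dotimes (j cols)
          (let ((sum 0f0))
            ;; Dot product of row i of A and column j of B.
            (dotimes (k across)
              (incf sum (* (aref a i k) (aref b k j))))
            (setf (aref c i j) sum)))))))
```

For example, (naive-matrix-multiply #2A((1f0 2f0) (3f0 4f0)) #2A((5f0 6f0) (7f0 8f0))) returns a 2x2 array with rows (19.0 22.0) and (43.0 50.0).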

Speed test

This is a speed test on my current machine (an HP notebook) with SBCL and the safety level in the code set to 2:

 (let ((a (make-array `(,rows ,across) :element-type 'single-float
                                       :initial-element 0s0))
       (b (make-array `(,across ,cols) :element-type 'single-float
                                       :initial-element 0s0)))

   (time-in-ms-as-real (regression::with-matrixes (* a b) :declarations nil)))

The timings for the individual matrix sizes look like this on my machine:

rows	across	cols	time (ms)	rows*across*cols per ms
500	500	500	366.0	341530
100	100	100	1.0	1000000
200	200	200	15.0	533333
400	400	10	3.0	533333
400	400	20	5.0	640000
400	400	40	9.0	711111
400	400	100	25.0	640000
400	400	200	52.0	615385
400	400	400	102.0	627451
400	400	1000	264.0	606061
400	400	2000	583.0	548885

The number of multiply-add steps (rows*across*cols) processed per millisecond appears to be relatively stable across the tested sizes for single floats.
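The last column of the table can be recomputed from the other four. The following is a small sketch of that arithmetic; the function name multiply-adds-per-ms is hypothetical and does not come from the repository.

```lisp
;; Hypothetical helper, not part of the repository.
;; A rows x across matrix times an across x cols matrix needs
;; rows * across * cols multiply-add steps; divide by the elapsed time
;; and round to the nearest integer.
(defun multiply-adds-per-ms (rows across cols time-ms)
  (values (round (* rows across cols) time-ms)))
```

For the first table row, (multiply-adds-per-ms 500 500 500 366.0) gives 341530, which matches the value listed there.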

Linear regression in one variable

This is exercise 1 of the course. The gradient descent method is used to minimise the cost function for linear regression.

Function	Docstring (first line)
linear-estimate	Linear regression estimate given coefficients A and independent variables X.
linear-grad-A	Gradient of error of linear cost function.
linear-regression-iteration	Update matrix with linear regression coefficients to better match observed data.
linear-regression-iterations	Run COUNT linear iterations, optionally logging error(s).

The key function is linear-regression-iterations, which has the docstring:

Run count linear iterations, optionally logging cost function.

The cost function is 1/m Σ ‖y-yʹ‖₂ + ½αΣ‖A‖₂, where m is the batch
size, α is a regularization parameter, ‖u‖₂ is the sum of squares of the
elements of matrix u (i.e., Tr uᵀu), and σ determines the speed of
gradient descent (higher is better until it starts to oscillate).
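For the one-variable case with the estimate yʹ = q + k·x, a single gradient descent step on this cost function can be sketched as follows. This is an illustration under stated assumptions, not the repository's linear-regression-iteration: the name gradient-step and its argument list are hypothetical, the data are plain lists, and both k and q are regularized (as the cost function above regularizes all of A). Differentiating the cost gives ∂/∂k = -(2/m) Σ (y-yʹ)·x + α·k and ∂/∂q = -(2/m) Σ (y-yʹ) + α·q, and each coefficient moves by -σ times its gradient.

```lisp
;; Hypothetical sketch, not part of the repository.
;; One gradient descent step for y' = q + k*x with cost
;; 1/m * sum (y - y')^2 + 1/2 * alpha * (k^2 + q^2).
(defun gradient-step (xs ys k q &key (sigma 0.1) (alpha 0.0))
  "Return the updated K and Q as two values."
  (let ((m (length xs))
        (sum-err 0.0)      ; sum of residuals (y - y')
        (sum-err-x 0.0))   ; sum of residuals times x
    (loop for x in xs
          for y in ys
          for err = (- y (+ q (* k x)))
          do (incf sum-err err)
             (incf sum-err-x (* err x)))
    (let ((grad-k (+ (* -2 (/ sum-err-x m)) (* alpha k)))
          (grad-q (+ (* -2 (/ sum-err m)) (* alpha q))))
      ;; Move against the gradient; SIGMA is the step size.
      (values (- k (* sigma grad-k))
              (- q (* sigma grad-q))))))
```

Calling it repeatedly, feeding the returned k and q back in, approaches the best fit as long as sigma is small enough; with a sigma that is too large the values start to oscillate, as the docstring notes.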

Example 1

The goal is a linear regression of the data in the file ex1data1.txt, which is part of the exercise and can be found online.

The following Lisp source block generates the coefficients k and q of the best fit.

(with-open-file (out "ex1.txt" :direction :output :if-exists :supersede)
  (multiple-value-call #'get-coefficients #'linear-updater (read-comma-file datafile) :out out
    :sigma 0.1 :alpha 0.1))

gnuplot then uses the coefficients to draw the line against the data points.

set title "Training data with a linear fit"
set yrange [*:*]
set xrange [*:*]
set xlabel "Population (in 10 000)"
set ylabel "Profit (in 10 000 USD)"
set key box linestyle -1 left top
plot file using "%lf,%lf\n" title "Training data",\
   q+k*x title "Linear regression"

Multiple variable LR

(multiple-value-call #'get-coefficients #'linear-updater
 (read-comma-file file) :sigma 1.25e-2)

gnuplot then uses the coefficients to draw the fitted plane against the data points.

set xlabel "Size"
set ylabel "Rooms"
set zlabel "Cost"
set view 110,15
set key box linestyle -1 left top
splot file using "%lf,%lf,%lf\n" title "Training data",\
   q+k1*x+k2*y title "Linear regression"

Logistic

(with-open-file (out "lrs.txt" :direction :output :if-exists :supersede)
  (multiple-value-call #'get-coefficients #'logistic-updater (read-comma-file2 file)
    :sigma 0.99 :alpha 0.1 :out out))

gnuplot then uses the coefficients to draw the decision boundary against the data points.

set key box linestyle -1 right top
set title "Training data with decision boundary"
set xlabel "Exam 1 score"
set ylabel "Exam 2 score"
set yrange [*:*]
set xrange [*:*]
plot file using 1:($3 == 1 ? $2 : 1/0) "%lf,%lf,%lf\n" title "Admitted",\
   file using 1:($3 == 0 ? $2 : 1/0) "%lf,%lf,%lf\n" title "Not admitted", \
   (-q-k1*x-k3/x)/k2 title "Boundary"
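The boundary expression in the plot is what you get by solving q + k1·x + k2·y + k3/x = 0 for y, which is where a logistic estimate equals 0.5. The sketch below shows this arithmetic under the assumption that the model has exactly these four terms; that is inferred from the gnuplot expression only, and the names sigmoid and boundary-y are hypothetical rather than taken from the repository.

```lisp
;; Hypothetical sketch, not part of the repository.
;; The logistic function; it equals 0.5 exactly when Z is 0.
;; Note: very negative Z can overflow EXP in single-float arithmetic.
(defun sigmoid (z)
  (/ 1.0 (+ 1.0 (exp (- z)))))

;; The y for which q + k1*x + k2*y + k3/x = 0, i.e. the same formula
;; as the gnuplot expression (-q-k1*x-k3/x)/k2.
(defun boundary-y (x q k1 k2 k3)
  (/ (- (- q) (* k1 x) (/ k3 x)) k2))
```

For instance, with q = 1, k1 = 1, k2 = 2 and k3 = 4, (boundary-y 2.0 1.0 1.0 2.0 4.0) gives -2.5, and substituting back yields 1 + 2 - 5 + 2 = 0.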

Convergence graph:

set yrange [0:*]
set xrange [0:*]
set xlabel "Iteration"
set ylabel "Normalized error costs"
set title "Cost after iterations"
set key box linestyle -1 right top
plot file u 1 w lines title "Error cost", \
  file u 2 w lines title "A² cost", \
  file u 3 w lines title "Total cost"

Speed of regression

Before trying to speed up the regression, let us measure how long it takes and how much it conses.

(with-output-to-string (*trace-output*)
  (time
   (multiple-value-call #'get-coefficients #'linear-updater
     (read-comma-file2 file)
     :sigma 1s0 :alpha 0.0001)))

Testing sigma - linear

Generate a file with the regression errors for different values of sigma.

Testing sigma values

Generate a file with the regression errors for different values of sigma.

 (with-open-file (out tracefile :direction :output :if-exists :supersede)
   (dolist (sigma '(0.01 0.03 0.1 0.3))
     (format out "~3&sigma=~a~%" sigma)
     (multiple-value-call #'get-coefficients (symbol-function updater) (read-comma-file2 file)
	:sigma sigma :alpha 0.1 :out out :tracing 20)))

set yrange [0:*]
set xrange [*:*]
set xlabel "Iterations"
set ylabel "Cost value"
set key box linestyle -1 left bottom columnheader
plot for [IDX=0:4] tracefile i IDX u 1:2 w lines title columnheader(1)

set yrange [0:*]
set xrange [*:*]
set title "Total cost function value after iterations"
set xlabel "Iterations"
set ylabel "Cost value"
set key box linestyle -1 left bottom
plot for [IDX=0:4] tracefile i IDX u 1:4 w lines title columnheader(1)

BUGS/next steps

[ ] Do not regularize A_0 (why?)
[ ] Load images
[ ] Write images
[ ] l1 decay function

Emacs/Org notes

Some techniques employed in this file:

Use org-table functions to get the docstrings of the Lisp functions and to fill in the multiplication speeds.
The gnuplot technique for plotting several data blocks of one file is new to me.

set yrange [*:*]
set xrange [*:*]
set contour
set view map
set cntrparam levels discrete 0
set isosamples 9,9
splot x*x+100*y*y-900, \
      file using 1:($3 == 0 ? $2 : 1/0) "%lf,%lf,%lf\n" title "Not admitted"
unset contour
