zellerin / la-tools

Redoing ML exercises in CL

Some time ago, I took a course on ML with examples in Octave/Matlab. This repository tries to redo the stuff in Common Lisp.

Linear algebra

I need linear combinations and matrix multiplication (including row and column vectors). I came across quite a few LA libraries, some native, some using FFI, and most of them are a bit too complicated. Also, I wanted to get a feel for the speed of simple code.

So here is a simple implementation of some LA operations. The macro with-matrixes takes a matrix algebra expression such as (* A X) and generates code to prepare the resulting matrix.
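For orientation, a hand-written version of the kind of loop nest that an expression like (* A X) stands for might look like the following. This is only a sketch: the name naive-matmul is hypothetical and not part of la-tools, and the code that with-matrixes actually generates may differ.

```lisp
;; Hypothetical helper, not part of la-tools: plain triple-loop product
;; of two 2D single-float arrays.
(defun naive-matmul (a b)
  "Return the matrix product of the 2D single-float arrays A and B."
  (declare (type (simple-array single-float (* *)) a b))
  (let* ((rows (array-dimension a 0))
         (across (array-dimension a 1))
         (cols (array-dimension b 1))
         (result (make-array (list rows cols)
                             :element-type 'single-float
                             :initial-element 0f0)))
    ;; The inner dimensions must agree, otherwise the product is undefined.
    (assert (= across (array-dimension b 0)) ()
            "Inner dimensions do not match.")
    (dotimes (i rows result)
      (dotimes (j cols)
        (let ((sum 0f0))
          (declare (type single-float sum))
          ;; Dot product of row I of A with column J of B.
          (dotimes (k across)
            (incf sum (* (aref a i k) (aref b k j))))
          (setf (aref result i j) sum))))))
```

For example, multiplying the 2x2 matrices ((1 2) (3 4)) and ((5 6) (7 8)) this way gives ((19 22) (43 50)).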

Speed test

This is a speed test on my current machine (an HP notebook) with SBCL and the safety level in the code set to 2:
 (let ((a (make-array `(,rows ,across) :element-type 'single-float
                                       :initial-element 0s0))
       (b (make-array `(,across ,cols) :element-type 'single-float
                                       :initial-element 0s0)))
   (time-in-ms-as-real (regression::with-matrixes (* a b) :declarations nil)))

The speeds for individual table types and sizes are like this on my machine:

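| rows | across | cols | time (ms) |  per-ms |
|------+--------+------+-----------+---------|
|  500 |    500 |  500 |     366.0 |  341530 |
|  100 |    100 |  100 |       1.0 | 1000000 |
|  200 |    200 |  200 |      15.0 |  533333 |
|  400 |    400 |   10 |       3.0 |  533333 |
|  400 |    400 |   20 |       5.0 |  640000 |
|  400 |    400 |   40 |       9.0 |  711111 |
|  400 |    400 |  100 |      25.0 |  640000 |
|  400 |    400 |  200 |      52.0 |  615385 |
|  400 |    400 |  400 |     102.0 |  627451 |
|  400 |    400 | 1000 |     264.0 |  606061 |
|  400 |    400 | 2000 |     583.0 |  548885 |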

The per-ms column is rows*across*cols divided by the time in ms, that is, the number of inner multiplication steps per millisecond. It appears to be relatively stable in the tested ranges for floats.
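The measurement and the per-ms figure can be sketched in portable Common Lisp as follows. Both names, elapsed-ms and steps-per-ms, are hypothetical and used only for illustration; the repository's own time-in-ms-as-real may be implemented differently.

```lisp
;; Hypothetical helpers for illustration only.
(defmacro elapsed-ms (&body body)
  "Evaluate BODY and return the elapsed real time in milliseconds."
  (let ((start (gensym "START")))
    `(let ((,start (get-internal-real-time)))
       ,@body
       ;; Convert implementation-specific time units to milliseconds.
       (/ (* 1000.0 (- (get-internal-real-time) ,start))
          internal-time-units-per-second))))

(defun steps-per-ms (rows across cols time-ms)
  "Number of inner multiplication steps per millisecond, rounded."
  (round (* rows across cols) time-ms))
```

With the numbers of the first table row, (steps-per-ms 500 500 500 366) gives 341530.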

Linear regression in one variable

This is exercise 1 of the course. The gradient descent method is used to minimise the cost function for linear regression.

| Function                     | Docstring (first line)                                                              |
|------------------------------+-------------------------------------------------------------------------------------|
| linear-estimate              | Linear regression estimate given coefficients A and independent variables X.        |
| linear-grad-A                | Gradient of error of linear cost function.                                          |
| linear-regression-iteration  | Update matrix with linear regression coefficients to better match observed data.    |
| linear-regression-iterations | Run COUNT linear iterations, optionally logging error(s).                           |

The key function is linear-regression-iterations, which has the following docstring:

Run count linear iterations, optionally logging cost function.

The cost function is 1/m Σ ‖y-yʹ‖₂ + ½αΣ‖A‖₂, where =m= is the batch
size, =α= is a regularization parameter, ‖u‖₂ is the sum of squares of the
elements of matrix u (i.e., =Tr uᵀu=), and =σ= determines the speed of
gradient descent (higher is better until it starts to oscillate).
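To make the roles of σ and α concrete, here is a minimal sketch of gradient descent for a single independent variable, with the estimate yʹ = q + k·x. The names descent-step and fit-line are hypothetical, and the gradient is derived by hand from the cost formula quoted above; the repository's matrix-based linear-grad-A may scale its terms differently.

```lisp
;; Sketch only: hypothetical names, one independent variable, and a
;; gradient derived from the cost 1/m Σ(y - yʹ)² + ½ α (k² + q²).
(defun descent-step (k q xs ys sigma alpha)
  "Return updated K and Q after one gradient descent step."
  (let ((m (length xs))
        (grad-k 0d0)
        (grad-q 0d0))
    ;; Accumulate the derivative of the squared error for each sample.
    (loop for x in xs
          for y in ys
          for err = (- y (+ q (* k x)))
          do (decf grad-k (* 2 err x))
             (decf grad-q (* 2 err)))
    ;; Average over the batch, add the regularization term, step by SIGMA.
    (values (- k (* sigma (+ (/ grad-k m) (* alpha k))))
            (- q (* sigma (+ (/ grad-q m) (* alpha q)))))))

(defun fit-line (xs ys &key (sigma 0.1d0) (alpha 0d0) (count 1000))
  "Run COUNT descent steps from k = q = 0 and return K and Q."
  (let ((k 0d0) (q 0d0))
    (dotimes (i count (values k q))
      (multiple-value-setq (k q)
        (descent-step k q xs ys sigma alpha)))))
```

For the points (0,1), (1,3), (2,5), (3,7), which lie on y = 2x + 1, (fit-line '(0 1 2 3) '(1 3 5 7)) converges to k close to 2 and q close to 1. On the same data a sigma of 0.3 is already too large and the iteration diverges, which illustrates the remark about oscillation.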

Example 1

The goal is a linear regression of the data in the file ex1data1.txt, which is part of the exercise and can be found online.

The following Lisp source block generates the coefficients k and q for the best fit.

(with-open-file (out "ex1.txt" :direction :output :if-exists :supersede)
  (multiple-value-call #'get-coefficients #'linear-updater
    (read-comma-file datafile)
    :out out :sigma 0.1 :alpha 0.1))

Gnuplot then uses the coefficients to draw the fitted line against the data points.

set title "Training data with a linear fit"
set yrange [*:*]
set xrange [*:*]
set xlabel "Population (in 10 000)"
set ylabel "Profit (in 10 000 USD)"
set key box linestyle -1 left top
plot file using "%lf,%lf\n" title "Training data",\
   q+k*x title "Linear regression"

ex1data1.svg
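The example above reads its input with read-comma-file. As a rough idea of what parsing such a comma-separated data file involves, here is a minimal sketch. The names parse-comma-line and read-comma-lines are hypothetical, and the sketch returns a plain list of rows, which is an assumption; the repository's read-comma-file returns multiple values in a shape suited to get-coefficients.

```lisp
;; Hypothetical stand-in for illustration; not the repository's reader.
(defun parse-comma-line (line)
  "Parse a line such as \"1.5,2.25\" into a list of numbers."
  (let ((*read-eval* nil)               ; never evaluate #. forms from data
        (*read-default-float-format* 'single-float))
    ;; Turn commas into spaces so the Lisp reader can split the fields.
    (with-input-from-string (in (substitute #\Space #\, line))
      (loop for value = (read in nil nil)
            while value
            collect value))))

(defun read-comma-lines (path)
  "Return a list of parsed rows, one per non-empty line of the file PATH."
  (with-open-file (in path)
    (loop for line = (read-line in nil nil)
          while line
          unless (string= "" (string-trim '(#\Space #\Tab #\Return) line))
            collect (parse-comma-line line))))
```

Using the Lisp reader keeps the sketch short, but it accepts any readable token, so it is only suitable for trusted numeric data files.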

Multiple variable LR

(multiple-value-call #'get-coefficients #'linear-updater
 (read-comma-file file) :sigma 1.25e-2)

Gnuplot then uses the coefficients to draw the fitted plane against the data points.

set xlabel "Size"
set ylabel "Rooms"
set zlabel "Cost"
set view 110,15
set key box linestyle -1 left top
splot file using "%lf,%lf,%lf\n" title "Training data",\
   q+k1*x+k2*y title "Linear regression"

ex1data2.svg

Logistic

(with-open-file (out "lrs.txt" :direction :output :if-exists :supersede)
  (multiple-value-call #'get-coefficients #'logistic-updater (read-comma-file2 file)
    :sigma 0.99 :alpha 0.1 :out out))

Gnuplot then uses the coefficients to draw the decision boundary against the data points.

set key box linestyle -1 right top
set title "Training data with decision boundary"
set xlabel "Exam 1 score"
set ylabel "Exam 2 score"
set yrange [*:*]
set xrange [*:*]
plot file using 1:($3 == 1 ? $2 : 1/0) "%lf,%lf,%lf\n" title "Admitted",\
   file using 1:($3 == 0 ? $2 : 1/0) "%lf,%lf,%lf\n" title "Not admitted", \
   (-q-k1*x-k3/x)/k2 title "Boundary"

ex2data1.svg

Convergence graph:

set yrange [0:*]
set xrange [0:*]
set xlabel "Iteration"
set ylabel "Normalized error costs"
set title "Cost after iterations"
set key box linestyle -1 right top
plot file u 1 w lines title "Error cost", \
  file u 2 w lines title "A² cost", \
  file u 3 w lines title "Total cost"

lrs.svg
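For reference, the estimate behind a logistic regression and the resulting decision boundary can be sketched as follows. All three function names are hypothetical, and the sketch assumes a model with an intercept q and two coefficients k1 and k2 only; it ignores any further terms, such as the k3 term in the plot command above.

```lisp
;; Sketch with hypothetical names; assumes an intercept Q and two
;; coefficients K1 and K2, nothing else.
(defun sigmoid (z)
  "Logistic function 1 / (1 + e^-z)."
  (/ 1d0 (+ 1d0 (exp (- z)))))

(defun logistic-estimate (q k1 k2 x1 x2)
  "Estimated probability of the positive class for the point (X1, X2)."
  (sigmoid (+ q (* k1 x1) (* k2 x2))))

(defun boundary-y (q k1 k2 x)
  "Value of the second feature at which the estimate is exactly 0.5,
i.e. where q + k1*x + k2*y = 0."
  (/ (- (+ q (* k1 x))) k2))
```

Points on one side of the line given by boundary-y get an estimate above 0.5, and points on the other side get an estimate below 0.5.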

Speed of regression

Before trying to speed up the regression, let us measure how long it takes and how much it conses.

(with-output-to-string (*trace-output*)
  (time
   (multiple-value-call #'get-coefficients #'linear-updater
     (read-comma-file2 file)
     :sigma 1s0 :alpha 0.0001)))

Testing sigma - linear

Generate a file with the regression errors for different sigmas.

Testing sigma values

The following block writes the traced errors for each sigma to the trace file:
 (with-open-file (out tracefile :direction :output :if-exists :supersede)
   (dolist (sigma '(0.01 0.03 0.1 0.3))
     (format out "~3&sigma=~a~%" sigma)
     (multiple-value-call #'get-coefficients (symbol-function updater)
       (read-comma-file2 file)
       :sigma sigma :alpha 0.1 :out out :tracing 20)))

set yrange [0:*]
set xrange [*:*]
set xlabel "Iterations"
set ylabel "Cost value"
set key box linestyle -1 left bottom columnheader
plot for [IDX=0:4] tracefile i IDX u 1:2 w lines title columnheader(1)

err.svg

set yrange [0:*]
set xrange [*:*]
set title "Total cost function value after iterations"
set xlabel "Iterations"
set ylabel "Cost value"
set key box linestyle -1 left bottom
plot for [IDX=0:4] tracefile i IDX u 1:4 w lines title columnheader(1)

err-both.svg

Emacs/Org notes

Some techniques employed in this file:

  • Use org-table functions to get the docstrings of the Lisp functions and to fill in the multiplication speed
  • The gnuplot technique for plotting several data sections of one file is new to me

BUGS/next steps

  • [ ] Do not regularize A_0 (why?)
  • [ ] Load images
  • [ ] Write images
  • [ ] l1 decay function

set yrange [*:*]
set xrange [*:*]
set contour
set view map
set cntrparam levels discrete 0
set isosamples 9,9
splot x*x+100*y*y-900, \
      file using 1:($3 == 0 ? $2 : 1/0) "%lf,%lf,%lf\n" title "Not admitted"
unset contour
