hjrnunes / clj-liblinear

A Clojure wrapper for LIBLINEAR, a linear support vector machine library

Home Page: http://keminglabs.com/clj-liblinear/

This is a fork of lynaghk's clj-liblinear, with some minor changes:

  1. support for liblinear-java's bias parameter
  2. regression tests (currently managed somewhat problematically)
  3. support for an additional input format, clojure.core.matrix.impl.dataset, which is more memory-efficient for large non-sparse datasets

Notes:

  1. The regression tests currently fail for the :multi algorithm type. Its behavior changed with the most recent changes (the introduction of the IndexedValues type), which may have introduced a bug. The cause has not been identified yet.

                                    | '_ \ 
                                    | | | |
  /                             \   |_| |_| 
 /                               \        
 |      +             /          |        
 |       +     +     /           |      
 |                  /   -        |      ____  _   _         _  _  _      _  _                            
 |     +    +   +  /  -          |     / ___|| | (_)       | |(_)| |__  | |(_) _ __    ___   __ _  _ __  
 |                /        -     |    | |    | | | | _____ | || || '_ \ | || || '_ \  / _ \ / _` || '__| 
 |     +   +     / -   -         |    | |___ | | | ||_____|| || || |_) || || || | | ||  __/| (_| || |    
 |              /                |     \____||_|_/ |       |_||_||_.__/ |_||_||_| |_| \___| \__,_||_|    
 |    +        /   -    -        |             |__/                                                      
 |          + /      -           |    
 \           /   -               /    
  \                             /                                                           

This is a Clojure wrapper around Benedikt Waldvogel's Java port of LIBLINEAR, a linear classifier that can handle problems with millions of instances and features. Essentially, it is a support vector machine optimized for classes that can be separated without projecting into some fancy-pants kernel space.

Install

Add

[clj-liblinear "0.1.0"]

to the :dependencies vector in your project.clj file.
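
For orientation, a minimal project.clj might look like the sketch below. The project name `my-classifier`, its version, and the Clojure version are placeholders chosen for illustration; only the clj-liblinear coordinate comes from this README.

```clojure
;; project.clj for a hypothetical project; only the clj-liblinear entry is from this README
(defproject my-classifier "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.10.3"]   ; assumed Clojure version
                 [clj-liblinear "0.1.0"]])
```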

Examples

Clj-liblinear takes maps as instances:

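(use '[clj-liblinear.core :only [train predict]])
;; Two synthetic classes: class 0 in the positive quadrant, class 1 in the negative quadrant.
(let [train-data (concat
                  (repeatedly 300 #(hash-map :class 0 :f {:x (rand), :y (rand)}))
                  (repeatedly 300 #(hash-map :class 1 :f {:x (- (rand)), :y (- (rand))})))
      ;; train takes the feature maps and their labels as parallel sequences
      model (train
             (map :f train-data)
             (map :class train-data)
             :algorithm :l2l2)]
  
  ;; classify one unseen point from each quadrant
  [(predict model {:x (rand) :y (rand)})
   (predict model {:x (- (rand)) :y (- (rand))})])
;;=> [0 1]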
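
LIBLINEAR's native input is a sparse list of integer-indexed features, so a map-based interface implies some mapping from keys to indices. The sketch below shows one way such a mapping could work, using only clojure.core. The helpers `feature-index` and `->sparse` are hypothetical names invented for this illustration; they are not part of clj-liblinear and may differ from what the library does internally.

```clojure
;; Illustrative only: these helpers are hypothetical, not clj-liblinear's internals.

(defn feature-index
  "Assigns each distinct key seen in the instances a 1-based integer index."
  [instances]
  (zipmap (sort (distinct (mapcat keys instances)))
          (iterate inc 1)))

(defn ->sparse
  "Turns one instance map into [index value] pairs sorted by index,
  dropping keys that were never seen during indexing."
  [index instance]
  (sort-by first
           (for [[k v] instance
                 :when (contains? index k)]
             [(index k) v])))

(def idx (feature-index [{:x 0.2 :y 0.7} {:y 0.1 :z 0.9}]))
;; idx => {:x 1, :y 2, :z 3}

(->sparse idx {:z 0.5 :x 2.0 :w 9.9})
;; => ([1 2.0] [3 0.5])   ; :w was never indexed, so it is dropped
```

Dropping unseen keys at prediction time is a design choice made for this sketch: a linear model has no weight for a feature it never saw during training.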

If you are concerned only with occurrences (rather than continuous variables), you can use sets. These will be expanded into indicator variables for classification. For instance, you can easily do simple text classification based on word occurrence:

(use '[clj-liblinear.core :only [train predict]]
     '[clojure.string :only [split lower-case]])

(def facetweets [{:class 0 :text "grr i am so angry at my iphone"}
                 {:class 0 :text "this new movie is terrible"}
                 {:class 0 :text "disappointed that my maximum attention span is 10 seconds"}
                 {:class 0 :text "damn the weather sucks"}

                 {:class 1 :text "sitting in the park in the sun is awesome"}
                 {:class 1 :text "eating a burrito life is super good"}
                 {:class 1 :text "i love weather like this"}
                 {:class 1 :text "great new album from my favorite band"}])

(let [bags-of-words (map #(-> % :text (split #" ") set) facetweets)
      model         (train bags-of-words (map :class facetweets))]
  
  (map #(predict model (into #{} (split % #" ")))
       ["damn it all to hell!"
        "i love everyone"
        "my iphone is super awesome"
        "the weather is terrible this sucks"]))

;; => (0 1 1 0)
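
The example above splits on single spaces only, so "hell!" and "hell" count as different words and case is significant. A slightly more forgiving tokenizer, together with a picture of what "expanded into indicator variables" means, might look like the following sketch. Both `tokenize` and `set->indicators` are hypothetical helpers written for this illustration; they are not part of clj-liblinear.

```clojure
(require '[clojure.string :as str])

(defn tokenize
  "Lower-cases text and returns the set of maximal runs of
  letters, digits, and apostrophes it contains."
  [text]
  (set (re-seq #"[a-z0-9']+" (str/lower-case text))))

(defn set->indicators
  "Represents a set as a feature map in which every member has value 1.
  This mirrors the idea of indicator variables; it is not the library's code."
  [s]
  (zipmap s (repeat 1)))

(tokenize "Damn it ALL to hell!")
;; => a set containing "damn", "it", "all", "to", "hell"

(set->indicators #{"love" "weather"})
;; => a map with "love" and "weather" each mapped to 1
```

If the training sets are built with the same `tokenize` function, a call such as `(predict model (tokenize "I love everyone"))` would see the same vocabulary at training and prediction time.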

Thanks

Thanks to the National Taiwan University Machine Learning Group for LIBLINEAR, and to Benedikt Waldvogel for his Java port.

This project is sponsored by Keming Labs, a technical design studio specializing in data visualization.

License: Eclipse Public License 1.0


Languages

Language:Clojure 100.0%