shipengcheng1230 / ScalaML_2nd_Edition

Project, source code and data related to the 2nd edition of Scala for machine learning -2017

Scala for Machine Learning Version 0.99.2 September 7, 2017
Copyright Patrick Nicolas All rights reserved 2013-2017
=================================================================

Overview
Latest release
Documentation
Minimum requirements
Project
Installation
Build
Run examples
Persistent models and configurations
Appendix

Note: Versions 0.99.0 and 0.99.1 correspond to the original edition (Dec 2014) and the revision (Dec 2015) of the book, respectively. Their source code is available in the ScalaMl repository.

Overview

The source code provides software developers with a broad overview of the differences among machine learning algorithms. The reader is expected to have a good grasp of the Scala programming language along with some knowledge of basic statistics. Experience in data mining and machine learning is not a prerequisite.

Source code guidelines are defined in the companion document SourceCodeGuide.html

The examples are related to investment portfolio management and trading strategies. For readers interested in either the mathematics or the techniques implemented in this library, I strongly recommend the following readings:

  • "Machine Learning: A Probabilistic Perspective" K. Murphy - MIT Press - 2012
  • "The Elements of Statistical Learning" T. Hastie, R. Tibshirani, J. Friedman - Springer - 2001
  • "Pattern Recognition and Machine Learning" C. Bishop - Springer 2006
  • "Deep Learning" I. Goodfellow, Y. Bengio, A. Courville - MIT Press - 2017

The real-world examples, related to financial and market analysis, are used for the sole purpose of illustrating the machine learning techniques. They do not constitute a recommendation or endorsement of any specific investment management or trading techniques.
The Appendix contains an introduction to the basic concepts of investment and trading strategies as well as technical analysis of financial markets.

Latest release

Here is the list of changes introduced in version 0.99.2 and described in "Scala for Machine Learning - 2nd Edition"

New Chapters

  • Chapter 5 - Dimension Reduction
  • Chapter 8 - Monte Carlo Inference
  • Chapter 11 - Deep Learning
  • Chapter 14 - Multi-armed bandit
  • Chapter 17 - Apache Spark MLlib

Documentation

The best way to learn about any particular learning algorithm is to:
  • Read the appropriate chapter (e.g. Chapter 6: Naive Bayes Classifiers)
  • Review the source code guidelines used in the book, SourceCodeGuide.html
  • Look at the examples related to the chapter (e.g. org/scalaml/supervised/bayes)
  • Browse through the implementation code (e.g. org/scalaml/supervised/bayes)

Minimum Requirements

Hardware: 2 CPU cores with 4 GB RAM to build and run the examples on small datasets.
4 CPU cores and 8+ GB RAM for datasets of size 75,000 or larger and/or with 50 or more features.
Operating system: no specific requirement
Software: JDK 1.8.0_25 or later, Scala 2.11.2 or later (2.11.8 recommended), and SBT 0.13+ (see the Installation section for deployment).
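
The requirements above can be checked from inside the JVM before running the examples. The following is a minimal sketch, not part of the library: the object EnvCheck, the case class Env and the default thresholds (2 cores, 2048 MB of heap) are assumptions made for illustration. Note that maxMemory reports the heap available to the JVM, not the physical RAM of the machine.

```scala
import scala.util.Properties

// Hypothetical helper, not part of Scala for Machine Learning.
object EnvCheck {
  // Snapshot of the runtime characteristics relevant to the requirements.
  final case class Env(cores: Int, maxHeapMb: Long, javaVersion: String, scalaVersion: String)

  // Reads the values from the running JVM.
  def current: Env = Env(
    Runtime.getRuntime.availableProcessors,
    Runtime.getRuntime.maxMemory / (1024L * 1024L), // JVM heap ceiling, in MB
    Properties.javaVersion,
    Properties.versionNumberString
  )

  // The default thresholds are assumptions chosen for this sketch.
  def meetsMinimum(env: Env, minCores: Int = 2, minHeapMb: Long = 2048L): Boolean =
    env.cores >= minCores && env.maxHeapMb >= minHeapMb

  def main(args: Array[String]): Unit = {
    val env = current
    println(s"Cores: ${env.cores}, max heap: ${env.maxHeapMb} MB")
    println(s"Java: ${env.javaVersion}, Scala: ${env.scalaVersion}")
    println(s"Meets the assumed minimum: ${meetsMinimum(env)}")
  }
}
```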

Installation and Build

Installation


Eclipse & IntelliJ IDEA
The Scala for Machine Learning library is compatible with IntelliJ IDEA 2016 & 2017 and Eclipse Scala IDE 4.0.
Specify the links to the source in Project/properties/Java Build Path/Source. The two links should be project_name/src/main/scala and project_name/src/test/scala.
Add the jars required to build and execute the code within Eclipse in Project/properties/Java Build Path/Add External Jars, as declared in project_name/.classpath.
Update the JVM heap parameters in the eclipse.ini file to -Xms512m -Xmx8192m, or the maximum allowed on your specific machine.

Build

build.sbt

The Simple Build Tool (SBT) is used to build the library from the source code, using the build.sbt file in the root directory.
Executing the examples/tests in Scala for Machine Learning requires sufficient JVM heap memory (~2G):
in sbt/conf/sbtconfig.txt, set -Xmx to 8192m or higher and -XX:MaxPermSize to 512m or higher, e.g. -Xmx8192m -Xms1024m -XX:MaxPermSize=512m
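
As an alternative to editing the SBT launcher configuration, the heap size can also be set per project. The fragment below is a sketch in sbt 0.13 syntax and is an assumption, not the content of the project's actual build.sbt: it runs the tests in a forked JVM, which is required for javaOptions to take effect. The -XX:MaxPermSize flag is left out because JDK 8 no longer has a permanent generation and ignores it.

```scala
// build.sbt fragment (sbt 0.13 syntax). Illustrative only; not taken from the project.

// Run the tests in a separate JVM so that the options below are applied.
fork in Test := true

// Heap settings for the forked test JVM; adjust to what the machine allows.
javaOptions in Test ++= Seq("-Xms1024m", "-Xmx8192m")
```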

Build script for Scala for Machine Learning:
To build the Scala for Machine Learning library package:
$(ROOT)/sbt clean publish-local
To build the package including test and resource files:
$(ROOT)/sbt clean package
To run the examples:
$(ROOT)/sbt clean test
To generate the Scaladoc for the library:
$(ROOT)/sbt doc
To generate the Scaladoc for the examples:
$(ROOT)/sbt test:doc
To generate a report on compliance with the Scala style guide:
$(ROOT)/sbt scalastyle
To compile all examples:
$(ROOT)/sbt test:compile

Maven

A simple pom.xml is available to build the library and execute the test cases:
$(ROOT)/mvn compile to compile the library
$(ROOT)/mvn test to compile and run the examples

Run examples

Note: As the implementation evolves over time, a few test examples may differ from the original tests described in the book. The implementation of the algorithms is not expected to change.
Contrary to the first edition, the examples in the book are written as tests using ScalaTest:
$(ROOT)/sbt clean test
or $(ROOT)/mvn test
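
For readers unfamiliar with ScalaTest, an example written as a test could look like the sketch below. It assumes ScalaTest 2.2.x on the test classpath; the object Stats, its mean function and the class MeanTest are hypothetical names introduced for illustration and do not come from the library.

```scala
import org.scalatest.{FlatSpec, Matchers}

// Hypothetical helper, not part of Scala for Machine Learning.
object Stats {
  // Arithmetic mean of a non-empty series.
  def mean(xs: Seq[Double]): Double = {
    require(xs.nonEmpty, "Cannot compute the mean of an empty series")
    xs.sum / xs.size
  }
}

// A test class placed under src/test/scala is picked up by 'sbt test'.
final class MeanTest extends FlatSpec with Matchers {
  "mean" should "average a small series of prices" in {
    Stats.mean(Seq(10.0, 12.0, 14.0)) should be (12.0 +- 1e-9)
  }

  it should "reject an empty series" in {
    an [IllegalArgumentException] should be thrownBy Stats.mean(Seq.empty[Double])
  }
}
```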

Persistent models and configurations

The package object org.scalaml.core.Design provides the traits (or skeleton implementations) of the persistent model Design.Model and configuration Design.Config.
The persistence mechanism is implemented for a couple of supervised learning models, for illustration purposes only. The reader should be able to implement persistence of configurations and models for all relevant learning algorithms using the template operators << and >>.
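
To illustrate the idea of a pair of << and >> operators, here is a minimal, self-contained sketch. It is not the library's Design.Model or Design.Config: the trait PersistentModel, the class WeightsStore, the signatures of both operators and the comma-separated file format are all assumptions made for this example. Here >> writes a model to a file and << reads it back, with failures captured in a Try.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.util.Try

// Hypothetical stand-in for a persistent model; NOT the library's Design.Model.
trait PersistentModel[T] {
  def path: String
  protected def serialize(t: T): String
  protected def deserialize(s: String): T

  // '>>' saves the model to the file at 'path'.
  def >>(t: T): Try[Unit] = Try {
    Files.write(Paths.get(path), serialize(t).getBytes(StandardCharsets.UTF_8))
    ()
  }

  // '<<' loads the model back from the file at 'path'.
  def << : Try[T] = Try {
    deserialize(new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8))
  }
}

// Example: model weights stored as a single comma-separated line (assumed format).
final class WeightsStore(val path: String) extends PersistentModel[Vector[Double]] {
  protected def serialize(w: Vector[Double]): String = w.mkString(",")
  protected def deserialize(s: String): Vector[Double] =
    if (s.trim.isEmpty) Vector.empty[Double]
    else s.split(",").map(_.trim.toDouble).toVector
}

object PersistenceDemo {
  def main(args: Array[String]): Unit = {
    val file = java.io.File.createTempFile("weights", ".csv")
    val store = new WeightsStore(file.getAbsolutePath)
    store >> Vector(0.5, -1.25, 3.0)   // save the weights
    println(store.<<)                  // load them back, wrapped in a Try
  }
}
```

Returning a Try keeps I/O and parsing errors explicit for the caller instead of throwing from inside the operators.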

Appendix

The examples have been built and tested with the following libraries:
Java libraries
CRF-Trove_3.0.2.jar
LBFGS.jar
colt.jar
CRF-1.1.jar
commons-math3-3.6.jar
libsvm_sml-3.18.jar
jfreechart-1.0.17/lib/jcommon-1.0.21.jar
jfreechart-1.0.17/lib/jfreechart-1.0.17.jar
junit-4.11.jar
Scala 2.11.8 related libraries
com.typesafe/config/1.2.2/bundles/config.jar
akka-actor_2.11-2.3.8.jar
scalatest_2.2.6.jar
spark-assembly-2.1.0-hadoop2.7.0.jar

===========================================