Hydrospheredata / spark-ml-serving

Spark MLlib serving library

Spark-ml-serving

A contextless (SparkContext-free) implementation of Spark ML.

Proposal

Serving small ML pipelines does not require creating a SparkContext or using cluster-related features. In this project we provide our own implementations of Spark ML Transformers; some of them call context-independent Spark methods.

Structure

Instead of using DataFrames, we implemented a simple LocalData class to remove the dependency on SparkContext. All Transformers are rewritten to accept LocalData.
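To illustrate the idea, here is a minimal, self-contained Scala sketch of column-oriented local data and transformers that work on it without a SparkContext. All names in it (`MiniColumn`, `MiniData`, `MiniTransformer`, `MiniTokenizer`, `MiniTokenCount`, `MiniPipeline`) are hypothetical and are not the library's `LocalData` API; the sketch only assumes that data is held as named columns with one value per row, as in the `LocalDataColumn("text", Seq("Hello!"))` usage example.

```scala
// Hypothetical sketch: these types are NOT spark-ml-serving's LocalData API.
// A column is a name plus one value per row.
case class MiniColumn(name: String, data: Seq[Any])

// Column-oriented local data: plain collections, no SparkContext, no DataFrame.
case class MiniData(columns: List[MiniColumn]) {
  def column(name: String): Option[MiniColumn] = columns.find(_.name == name)

  // Replaces a column with the same name, or appends a new one.
  def withColumn(col: MiniColumn): MiniData =
    MiniData(columns.filterNot(_.name == col.name) :+ col)
}

// A transformer maps local data to local data.
trait MiniTransformer {
  def transform(data: MiniData): MiniData
}

// Tokenizer-like stage: lower-cases each value and splits it on whitespace.
class MiniTokenizer(inputCol: String, outputCol: String) extends MiniTransformer {
  def transform(data: MiniData): MiniData = data.column(inputCol) match {
    case Some(col) =>
      val tokens = col.data.map(v => v.toString.toLowerCase.split("\\s+").toSeq)
      data.withColumn(MiniColumn(outputCol, tokens))
    case None => data // input column missing: leave the data unchanged
  }
}

// Counting stage: number of tokens in each row.
class MiniTokenCount(inputCol: String, outputCol: String) extends MiniTransformer {
  def transform(data: MiniData): MiniData = data.column(inputCol) match {
    case Some(col) =>
      val counts = col.data.map { case s: Seq[_] => s.size; case _ => 0 }
      data.withColumn(MiniColumn(outputCol, counts))
    case None => data
  }
}

// A pipeline applies its stages in order, each one feeding the next.
class MiniPipeline(stages: Seq[MiniTransformer]) extends MiniTransformer {
  def transform(data: MiniData): MiniData =
    stages.foldLeft(data)((cur, stage) => stage.transform(cur))
}

val input = MiniData(List(MiniColumn("text", Seq("Hello world", "Hi"))))
val pipeline = new MiniPipeline(Seq(
  new MiniTokenizer("text", "words"),
  new MiniTokenCount("words", "count")
))
val output = pipeline.transform(input)
println(output.column("count").map(_.data)) // Some(List(2, 1))
```

The fold in `MiniPipeline` mirrors how Spark's own `PipelineModel.transform` applies its stages one after another; whether `LocalPipelineModel` in this library is structured exactly the same way is an assumption.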

How to use

  1. Add this project as a dependency:
scalaVersion := "2.11.8"
// The artifact name depends on the Spark version used for model training.
// Use only ONE of the following blocks.
// Spark 2.0.x
libraryDependencies ++= Seq(
  "io.hydrosphere" %% "spark-ml-serving-2_0" % "0.3.0",
  "org.apache.spark" %% "spark-mllib" % "2.0.2"
)
// Spark 2.1.x
libraryDependencies ++= Seq(
  "io.hydrosphere" %% "spark-ml-serving-2_1" % "0.3.0",
  "org.apache.spark" %% "spark-mllib" % "2.1.2"
)
// Spark 2.2.x
libraryDependencies ++= Seq(
  "io.hydrosphere" %% "spark-ml-serving-2_2" % "0.3.0",
  "org.apache.spark" %% "spark-mllib" % "2.2.0"
)
  2. Use it, for example:
import io.hydrosphere.spark_ml_serving._
import LocalPipelineModel._

// ....
val model = LocalPipelineModel.load("PATH_TO_MODEL") // Load the saved pipeline model
// A single column named "text" containing one row
val columns = List(LocalDataColumn("text", Seq("Hello!")))
val localData = LocalData(columns)
val result = model.transform(localData) // Transformed result
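The dependency blocks in step 1 differ only in the `<major>_<minor>` suffix of the artifact name. As a sketch, a small helper can derive that name from the Spark version used for training and reject versions with no listed artifact. `servingArtifact` and `supportedSuffixes` are hypothetical names, not part of the library, and the supported set is just the three versions listed in step 1.

```scala
// Hypothetical helper, not part of spark-ml-serving: maps the Spark version
// used for training to one of the serving artifact names listed in step 1.
val supportedSuffixes = Set("2_0", "2_1", "2_2")

def servingArtifact(sparkVersion: String): String = {
  // "2.1.2" -> Array("2", "1", "2") -> "2_1"
  val suffix = sparkVersion.split('.').take(2).mkString("_")
  require(supportedSuffixes.contains(suffix),
    s"No serving artifact listed for Spark $sparkVersion")
  s"spark-ml-serving-$suffix"
}

println(servingArtifact("2.1.2")) // spark-ml-serving-2_1

// Possible use in build.sbt (the def can live in the build file itself):
// val sparkVersion = "2.1.2"
// libraryDependencies ++= Seq(
//   "io.hydrosphere" %% servingArtifact(sparkVersion) % "0.3.0",
//   "org.apache.spark" %% "spark-mllib" % sparkVersion
// )
```

Keeping a single `sparkVersion` value means the serving artifact and `spark-mllib` cannot drift apart.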

More examples for different ML models can be found in the project's tests.


License: Apache License 2.0

