jzhao415 / SparkMLCustomLibrary

This is a demo library for Spark ML related projects.

SparkMLCustomLibrary

The purpose of this library is to demonstrate how to:

  1. retrieve a CSV file from S3,
  2. create metadata alongside the data file,
  3. convert the file into a DataFrame,
  4. show a visualization/chart in Zeppelin,
  5. combine the result with a Spark ML pipeline (TBD).
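
As an illustration of step 2, the following is a minimal sketch of what "metadata alongside the data file" could look like. It is not the library's actual code: `CsvMetadataSketch`, `ColumnMeta`, `FileMeta`, and `describe` are hypothetical names, the type inference is deliberately coarse, and the sketch uses only the Scala standard library (no Spark or S3 access) so it can be run anywhere.

```scala
// Hypothetical sketch: none of these names come from SparkMLCustomLibrary.
object CsvMetadataSketch {
  // One entry per CSV column: its name and a coarse inferred type.
  final case class ColumnMeta(name: String, dataType: String)

  // Metadata kept alongside the data file.
  final case class FileMeta(fileName: String, columns: Seq[ColumnMeta], rowCount: Int)

  // Classify a single cell as "int", "double", or "string".
  private def cellType(cell: String): String =
    if (cell.matches("-?\\d+")) "int"
    else if (cell.matches("-?\\d+\\.\\d+")) "double"
    else "string"

  // Widen two cell types to the narrowest type that fits both.
  private def widen(a: String, b: String): String =
    if (a == b) a
    else if (Set(a, b) == Set("int", "double")) "double"
    else "string"

  // Build metadata from raw CSV lines; the first line is assumed to be a header.
  // Naive split on ',' only: quoted fields containing commas are not handled.
  def describe(fileName: String, lines: Seq[String]): FileMeta = {
    require(lines.nonEmpty, "CSV must contain at least a header line")
    val header = lines.head.split(",", -1).map(_.trim).toSeq
    val rows = lines.tail.map(_.split(",", -1).map(_.trim).toSeq)
    val columns = header.zipWithIndex.map { case (name, i) =>
      // Collect the type of every cell in column i, then widen to one type.
      val types = rows.flatMap(_.lift(i)).map(cellType)
      ColumnMeta(name, types.reduceOption(widen(_, _)).getOrElse("string"))
    }
    FileMeta(fileName, columns, rows.size)
  }
}
```

Calling `CsvMetadataSketch.describe("table.csv", lines)` on the lines of a small CSV yields a `FileMeta` that can travel with the data. In a Spark setting the same information would more naturally come from the DataFrame's schema.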

Complete test cases are under /src/test

Build Instructions:

mvn clean install

Usage:

  1. quickly retrieve a CSV file by file name, using the default S3 bucket set in the code
val preparedData: DataFrame = PrepareDataFromS3().getFileAsDF("table.csv")
  2. retrieve a CSV file from a specific bucket, using the file name and bucket name
val preparedData: DataFrame = PrepareDataFromS3().setBucket("snowf0xrawdata").getFileAsDF("table.csv")
  3. retrieve a CSV file from S3 as a FilePackage and apply new metadata
val filePackage: FilePackage = PrepareDataFromS3().setBucket("snowf0xrawdata").getFileAsPackage("table.csv")
  4. show a chart in Zeppelin
val filePackage: FilePackage = PrepareDataFromS3().setBucket("snowf0xrawdata").getFileAsPackage("table.csv")
filePackage.showZeppelinChart()

[two images]

  5. in Databricks [image]
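
The chained calls in the usage list above follow a fluent (builder-style) pattern: a default bucket is used unless `setBucket` overrides it. The sketch below shows one way such an API could be structured. It is an assumption for illustration only, not the library's implementation: `S3SourceSketch`, `S3CsvSource`, `uriFor`, and the `"default-bucket"` name are hypothetical, and the real `PrepareDataFromS3` may work differently.

```scala
// Hypothetical sketch: a stand-in for the fluent style, not the real PrepareDataFromS3.
object S3SourceSketch {
  // The default bucket name here is a placeholder assumption.
  final case class S3CsvSource(bucket: String = "default-bucket") {
    // Return a copy pointing at another bucket, so calls can be chained.
    def setBucket(name: String): S3CsvSource = copy(bucket = name)

    // Build the s3a:// URI for a file in the current bucket.
    def uriFor(fileName: String): String = s"s3a://$bucket/$fileName"
  }
}
```

For example, `S3SourceSketch.S3CsvSource().setBucket("snowf0xrawdata").uriFor("table.csv")` produces the URI for `table.csv` in that bucket. Because `setBucket` returns a new immutable value rather than mutating state, a configured source can be shared safely between notebook cells. In a Spark session with the S3A connector and credentials configured, such a URI could then be passed to `spark.read.option("header", "true").csv(...)` to obtain a DataFrame.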

Languages

Language:Scala 100.0%