jravinder / DSXL_Workshop_Nov2018

DSX Local Workshop V1.2.1

In this workshop you will learn how to develop and deploy applications in DSX Local. The workshop has been divided into several stand-alone parts for those who are interested in a specific development tool or deployment task.

This lab is meant to be instructor-led. That is, the instructor will explain the objectives of the DSX capabilities covered in each lab, and demonstrate some of those capabilities at the beginning of each lab.

About this repository

This repository contains several lab subfolders. Some labs include notebooks and data, while others have additional instructions that are located in the Lab Instructions folder.

Prerequisites

  1. Knowledge of analytics. These labs do not teach the basics of analytics or how to implement analytics in R, Python, or SPSS. The purpose of this workshop is to provide hands-on experience with the analytics tools and deployment functions in DSX Local.
  2. To run this workshop you need an instance of DSX Local V1.2.1.
  3. The supported browsers are Chrome and Firefox.
  4. Download the DSX_Local_V121_Workshop.zip.

Setting up lab projects in DSX Local

  1. Rename the downloaded DSX_Local_V121_Workshop.zip file to give it a unique name. For example, add your initials. Note: Project names in a DSX Local cluster must be unique. When you create a project "from file", the project name is inherited from the file name.
  2. Log in to DSX Local.
  3. Select "New Project" and select "From File".
  4. Browse to the .zip file and click Create.
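
The renaming step above can also be scripted on your workstation. The following is a minimal sketch, assuming Python 3 and that the archive sits in the current directory; the helper name `unique_copy` and the suffix `JR` are illustrative and not part of the workshop materials.

```python
import shutil
from pathlib import Path


def unique_copy(zip_path, suffix):
    """Copy the archive to a sibling file whose name ends in `_<suffix>`."""
    src = Path(zip_path)
    # e.g. DSX_Local_V121_Workshop.zip -> DSX_Local_V121_Workshop_JR.zip
    dst = src.with_name("{}_{}{}".format(src.stem, suffix, src.suffix))
    shutil.copyfile(str(src), str(dst))
    return dst


if __name__ == "__main__":
    archive = Path("DSX_Local_V121_Workshop.zip")  # assumed download location
    if archive.exists():
        print(unique_copy(archive, "JR"))  # "JR" stands in for your initials
```

Copying rather than renaming keeps the original download intact in case the import needs to be repeated under a different name.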

Lab 1: Build, Save and Test SparkML Models (Jupyter/Python)

  1. Open the project you just created.
  2. In the Assets view, open the TelcoChurn_SparkML Jupyter notebook from the Notebooks section. This notebook was written for the Python 2.7 runtime. The runtime version is displayed in the top right corner of the notebook. You can also verify the runtime by running the first cell in the notebook.
  3. Follow the instructions in the notebook.
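
The contents of the notebook's first cell are not reproduced here. As a rough equivalent, a runtime check along the following lines could be run in any cell; the function name `runtime_label` is an assumption made for illustration.

```python
import sys


def runtime_label(version_info=sys.version_info):
    """Return a short label such as 'Python 2.7' for the given version tuple."""
    return "Python {}.{}".format(version_info[0], version_info[1])


print(runtime_label())

# The lab notebook targets Python 2.7, so flag any other runtime.
if tuple(sys.version_info[:2]) != (2, 7):
    print("Note: this notebook was written for the Python 2.7 runtime.")
```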

Lab 2: Create Batch Script and Test Batch Scoring (Python)

  1. You must have completed "Lab 1: Build, Save and Test SparkML Models" before working through this lab.
  2. Navigate to the Models section of the project and click into the saved Telco_Churn_ML_model.
  3. Click the Batch score tab.
  4. For Input data set, select new_customer_churn_data.csv.
  5. For Output data set, check "Local file" and specify new_customers_scores.csv.
  6. On the top right, click Advanced Settings.
  7. Scroll through the Advanced Settings to see the various options. Click Save to save the default settings.
  8. Click Generate Batch Script. (Note: the batch script can be edited, for example to perform pre- or post-processing tasks.)

  9. Click Run now. Refresh the browser to see the latest job status (scroll to the bottom of the page to see the job status).
  10. Verify that new_customers_scores.csv is in the data section of the project.
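
Beyond confirming that the file appears in the project, you may want a quick look at what the batch job wrote. The sketch below assumes the scores file is reachable from a notebook as `new_customers_scores.csv` in the working directory; the actual path inside a DSX Local project may differ, and the helper `summarize_scores` is hypothetical.

```python
import csv
import os


def summarize_scores(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with open(path, "r") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # first line holds the column names
        row_count = sum(1 for _ in reader)  # remaining lines are scored rows
    return header, row_count


if __name__ == "__main__":
    scores_path = "new_customers_scores.csv"  # assumed location; adjust as needed
    if os.path.exists(scores_path):
        columns, rows = summarize_scores(scores_path)
        print("columns: {}".format(columns))
        print("scored rows: {}".format(rows))
    else:
        print("Scores file not found at {}".format(scores_path))
```

The same kind of check could be added to the generated batch script as a post-processing step, since the script is editable.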

Lab 3: Create Model Evaluation Script and Test Evaluation (Python)

  1. You must have completed "Lab 1: Build, Save and Test SparkML Models" before working through this lab.

  2. Navigate to the Models section of the project and click into the saved Telco_Churn_ML_model.

  3. Click the Evaluate tab.

  4. For the script inputs, specify the values shown in the model_eval screenshot.

  5. Click Advanced Settings and change the name of the script. For example, you can name it ChurnModelEvalScript. Click Save.

  6. Click Generate evaluation Script.

  7. Click Run now. Refresh the browser to see the latest job status (scroll to the bottom of the page to see the job status).

  8. To see the results of the model evaluation, navigate to the Models section of the project and click into the model Telco_Churn_ML_model. Scroll down to the Evaluation results section.

Optional Exercise: Build and save a new version of the model

  1. Open the Jupyter notebook you worked with in Lab 1.
  2. In the notebook, go to "Step 6: Build the Spark pipeline and the Random Forest model".
  3. In the code cell, delete the first 4 input variables from the VectorAssembler(). This reduces the number of input columns used to build the new version of the model, and hence changes the accuracy of the model. Run the code cell.
  4. Run all the code cells from Step 6 through Step 9, which saves a new version of this model in the repository. Note the scoring endpoint: it ends in "2", indicating that the endpoint references version 2 of the model.
  5. Navigate to the Models section of the project and click into the saved Telco_Churn_ML_model. You will see the current accuracy of this version of the model, as well as the accuracy history.
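
The edit in step 3 amounts to shortening the list of input columns handed to the assembler. The sketch below illustrates that with plain Python; the column names are placeholders, not the actual columns used in the notebook, and the `outputCol` value in the closing comment is an assumption.

```python
def drop_leading_inputs(input_cols, n=4):
    """Return the input column list without its first `n` entries."""
    return list(input_cols)[n:]


# Placeholder names; the real notebook uses the Telco churn feature columns.
original_inputs = ["col_a", "col_b", "col_c", "col_d", "col_e", "col_f"]
reduced_inputs = drop_leading_inputs(original_inputs)
print(reduced_inputs)

# In the notebook, the shortened list would be passed to the assembler, e.g.
# VectorAssembler(inputCols=reduced_inputs, outputCol="features")
```

With fewer features the retrained model will generally score differently, which is what makes the accuracy history in step 5 show a change between versions.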
