djarpin / strata-sagemaker-spark

Used in the SageMaker Spark tutorial at the 2018 O'Reilly Strata NYC conference

Amazon SageMaker Spark Strata tutorial at the 2018 O'Reilly Strata NYC Conference

This repository contains supporting material for the Amazon SageMaker tutorial at the 2018 O'Reilly Strata NYC Conference.

Setup

  1. Log in to your AWS account
  2. Select EMR from services and create a new cluster:
    1. Go to advanced options
    2. Select Spark and Livy (only)
    3. Click Next through the remaining screens (feel free to set a custom cluster name, etc.)
  3. Select SageMaker from services and create a SageMaker notebook instance:
    1. Create a new IAM role with access to any S3 bucket
    2. Use the same VPC as your EMR cluster
    3. Take note of the notebook instance's security group
  4. Return to your EMR cluster
    1. Take note of the master node's private IP address
    2. Click on the security group for your master node
    3. Add an inbound rule for Custom TCP on port 8998 with the notebook security group as the source
  5. Select IAM from services
    1. Select Roles
    2. Select EMR_EC2_DefaultRole
    3. Attach the AmazonSageMakerFullAccess policy
  6. Open your SageMaker notebook instance, start a new terminal, and run the following, replacing <emr-master-private-ip> with the address noted in step 4:
    1. echo '{"kernel_python_credentials" : {"url": "http://<emr-master-private-ip>:8998/"}, "session_configs": {"executorMemory": "2g","executorCores": 2,"numExecutors":4}}' > ~/.sparkmagic/config.json
    2. curl <emr-master-private-ip>:8998/sessions
    3. git clone https://github.com/djarpin/strata-sagemaker-spark.git
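
The first two commands in step 6 can also be run from Python, which avoids quoting mistakes in the JSON. The following is a minimal sketch, not part of the original tutorial: the function names are hypothetical, and the address 10.0.0.12 in the usage note is a placeholder for your own master node private IP. It writes the same settings as the echo command and then queries the same Livy endpoint as the curl command.

```python
import json
from pathlib import Path
from urllib.request import urlopen

LIVY_PORT = 8998  # port opened in the EMR master security group in step 4


def build_sparkmagic_config(master_ip, executor_memory="2g",
                            executor_cores=2, num_executors=4):
    """Return the sparkmagic settings from step 6 as a dict."""
    return {
        "kernel_python_credentials": {
            "url": f"http://{master_ip}:{LIVY_PORT}/"
        },
        "session_configs": {
            "executorMemory": executor_memory,
            "executorCores": executor_cores,
            "numExecutors": num_executors,
        },
    }


def write_sparkmagic_config(master_ip, path=None):
    """Write the config to ~/.sparkmagic/config.json (or to `path`)."""
    path = Path(path) if path else Path.home() / ".sparkmagic" / "config.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(build_sparkmagic_config(master_ip), indent=2))
    return path


def list_livy_sessions(master_ip, timeout=5):
    """Same check as the curl command: GET /sessions on the Livy endpoint."""
    url = f"http://{master_ip}:{LIVY_PORT}/sessions"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

From a notebook terminal, calling write_sparkmagic_config("10.0.0.12") followed by list_livy_sessions("10.0.0.12") should return a JSON listing of sessions if the security group rule from step 4 is in place. A timeout here usually points to the inbound rule or the VPC selection rather than to sparkmagic.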

Contents

  • XGBoost with EMR - A light modification of the existing XGBoost example notebook so that it runs on an external EMR cluster.
  • BYO - A modification of the existing custom estimator example notebook that trains a convolutional neural network in a bring-your-own PyTorch container.

About

License: Apache License 2.0


Languages

  • Jupyter Notebook 87.7%
  • Python 10.1%
  • Dockerfile 2.2%