This repo supplements the YouTube video on EMR Serverless.
- Create EMR Notebook Role
  - Open IAM and create the IAM role for the EMR notebook using the EMR notebook role JSON
  - Attach the AmazonElasticMapReduceEditorsRole policy
  - Attach the AmazonS3FullAccess policy
- Create EMR Serverless Execution Role
  - Open IAM and create the IAM role for EMR Serverless execution using the EMR Serverless role
  - Attach the policy for permissions
- Create S3 Bucket
  - Open the S3 console
  - Create an S3 bucket to use for the demo
- Create Folders to Use in the S3 Bucket
  - Create a `scripts` folder
  - Create a `customers` folder (we upload a CSV to this folder)
  - Create a `query-results` folder
  - Upload files to the folders
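
For anyone who prefers scripting over clicking through the console, the setup steps above could be sketched with boto3 roughly as follows. This is not the method shown in the video. The role names, bucket name, and region are placeholders you pass in. The trust principals (`elasticmapreduce.amazonaws.com` for the notebook role, `emr-serverless.amazonaws.com` for the execution role) and the managed policy ARNs are assumptions about what the role JSON files in this repo contain, so check them against those files before relying on this.

```python
import json

# Folder names come from the steps above. S3 has no real folders, so a
# zero-byte object whose key ends in "/" acts as the folder placeholder.
FOLDERS = ["scripts", "customers", "query-results"]


def trust_policy(service: str) -> str:
    """Return a trust policy (JSON string) that lets `service` assume the role."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": service},
            "Action": "sts:AssumeRole",
        }],
    })


def folder_keys(folders=FOLDERS):
    """Map folder names to the placeholder object keys used in S3."""
    return [f"{name}/" for name in folders]


def create_setup(bucket: str, region: str, notebook_role: str, execution_role: str):
    """Create both roles, the bucket, and the folders (needs AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3

    iam = boto3.client("iam")

    # Notebook role: trust principal and policy ARNs are assumptions.
    iam.create_role(
        RoleName=notebook_role,
        AssumeRolePolicyDocument=trust_policy("elasticmapreduce.amazonaws.com"),
    )
    for arn in (
        "arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceEditorsRole",
        "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    ):
        iam.attach_role_policy(RoleName=notebook_role, PolicyArn=arn)

    # Execution role: the permissions policy is not spelled out in the steps,
    # so attaching it is left as a separate manual step.
    iam.create_role(
        RoleName=execution_role,
        AssumeRolePolicyDocument=trust_policy("emr-serverless.amazonaws.com"),
    )

    s3 = boto3.client("s3", region_name=region)
    if region == "us-east-1":
        # us-east-1 rejects an explicit LocationConstraint.
        s3.create_bucket(Bucket=bucket)
    else:
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    for key in folder_keys():
        s3.put_object(Bucket=bucket, Key=key)
```

Only `create_setup` talks to AWS; `trust_policy` and `folder_keys` are plain helpers, so they can be inspected or tested without credentials.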
Studio Setup
- Navigate to EMR home from the AWS Console and select EMR Studio from the left-hand side.
- Under `Networking and Security`, select your default VPC and 3 public subnets.
- Select the EMR Studio role `emr-notebook-role-tutorial` created during the Set Up Work stage.
- Select the S3 bucket created during the Set Up Work stage. (This will be your own bucket name.)
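
If you are unsure which VPC and subnets to pick in the networking step, the IDs can be looked up with boto3. The sketch below is a convenience, not part of the video. The function names are made up for illustration, and it treats a subnet as "public" when `MapPublicIpOnLaunch` is set, which is only a rough heuristic (the route table is what really decides that).

```python
def pick_public_subnets(subnets, count=3):
    """Return up to `count` subnet IDs that auto-assign public IPs.

    `subnets` is a list of dicts shaped like the entries returned by
    EC2 describe_subnets. MapPublicIpOnLaunch is an assumption used as a
    stand-in for "public subnet".
    """
    public = [s["SubnetId"] for s in subnets if s.get("MapPublicIpOnLaunch")]
    return sorted(public)[:count]


def default_vpc_subnets(region: str):
    """Return (vpc_id, subnet_ids) for the default VPC (needs AWS credentials)."""
    import boto3  # imported lazily so pick_public_subnets works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    vpcs = ec2.describe_vpcs(
        Filters=[{"Name": "is-default", "Values": ["true"]}]
    )["Vpcs"]
    if not vpcs:
        raise RuntimeError("no default VPC found in this region")
    vpc_id = vpcs[0]["VpcId"]

    subnets = ec2.describe_subnets(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    )["Subnets"]
    return vpc_id, pick_public_subnets(subnets)
```

Calling `default_vpc_subnets("eu-west-1")`, for example, gives you the values to select in the Studio form; the region here is just a placeholder.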
Spark App Setup
- Select `create application` from the top right.
- Enter a name for the application. Leave the type as `Spark` and click `create application`.
- Name the job and select the service role created in the setup steps.
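
The same Spark steps can be approximated outside the console with the boto3 `emr-serverless` client. This is a hedged sketch rather than what the video does: the application name, job name, role ARN, script key, and release label are all placeholders you supply, and it assumes the application keeps the default auto-start behaviour so that submitting a job starts it.

```python
def spark_job_driver(bucket: str, script_key: str) -> dict:
    """Build the jobDriver argument for an EMR Serverless Spark job.

    `script_key` is the object key of the PySpark script inside the bucket,
    for example a file uploaded to the scripts folder.
    """
    return {
        "sparkSubmit": {
            "entryPoint": f"s3://{bucket}/{script_key}",
        }
    }


def create_and_run_spark(app_name, job_name, role_arn, bucket, script_key,
                         release_label):
    """Create a Spark application and submit one job (needs AWS credentials)."""
    import boto3  # imported lazily so spark_job_driver works without boto3

    client = boto3.client("emr-serverless")

    # Mirrors "Leave the type as Spark" from the console steps.
    app = client.create_application(
        name=app_name,
        releaseLabel=release_label,
        type="SPARK",
    )
    app_id = app["applicationId"]

    # Mirrors "Name the job and select the service role".
    run = client.start_job_run(
        applicationId=app_id,
        executionRoleArn=role_arn,
        name=job_name,
        jobDriver=spark_job_driver(bucket, script_key),
    )
    return app_id, run["jobRunId"]
```

Keeping the `jobDriver` construction in its own function makes it easy to check the S3 path before anything is submitted.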
Hive App Setup
- Name the Hive job, select the Hive script (change the bucket name in the script), and select the service role.
- Submit the job and monitor it. The job status will go from pending -> running -> success.
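
Monitoring can also be done from a script instead of watching the console. The sketch below polls `get_job_run` on an `emr-serverless` client until the run reaches a terminal state. The function name, the poll interval, and the choice of which states count as terminal are assumptions made for illustration; the application ID and job run ID come from wherever you submitted the job.

```python
import time

# States after which the job run will not change again (an assumption about
# which states to stop on; intermediate ones like PENDING and RUNNING are
# polled through).
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELLED"}


def wait_for_job(client, application_id, job_run_id, poll_seconds=30.0):
    """Poll a job run until it finishes and return the distinct states seen.

    `client` is anything with a get_job_run(applicationId=..., jobRunId=...)
    method, such as boto3.client("emr-serverless").
    """
    seen = []
    while True:
        response = client.get_job_run(
            applicationId=application_id,
            jobRunId=job_run_id,
        )
        state = response["jobRun"]["state"]
        # Record each state once, in the order it first appears.
        if not seen or seen[-1] != state:
            seen.append(state)
        if state in TERMINAL_STATES:
            return seen
        time.sleep(poll_seconds)
```

With real credentials this would be called as `wait_for_job(boto3.client("emr-serverless"), app_id, run_id)`. A healthy run should end with `SUCCESS` as the last entry of the returned list; anything else means the job needs a look in the console.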
Johnny Chivers
Enjoy 🤘