aws / amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.

Home Page: https://sagemaker-examples.readthedocs.io


PySparkProcessor - Unable to locate credentials for boto3 call in AppMaster

illinineverdie opened this issue

I am curious whether there are issues calling the boto3 client from the AppMaster when the network is isolated, or whether my network config is off. I am running a PySpark script using the PySparkProcessor (injected into the interface). That script needs to pull objects from S3 while it runs in the AppMaster/Client, before working with the Spark session across the worker nodes. I have network isolation turned on and security groups set, and I have allowed traffic to S3.

When I turn network isolation on, I get the following:

```
raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
```

This traces back to this line of code in the PySpark script I inject:

```python
s3 = boto3.client('s3')
```

I am using the same role my SageMaker notebook is running in, which allows me to make these boto3 calls. I simply pass that role in:

```python
networkConfig = NetworkConfig(
    enable_network_isolation=True,
    security_group_ids=[sg_s3_access, sg_master, sg_slaves],
    subnets=[private_subnet_3],
)
role = sagemaker.get_execution_role()
```

```python
spark_processor = PySparkProcessor(
    base_job_name="some-job",
    role=role,
    instance_count=2,
    instance_type="ml.m5.4xlarge",
    max_runtime_in_seconds=2400,
    network_config=networkConfig,
    image_uri="............dkr.ecr.us-east-1.amazonaws.com/sagemaker-spark-processing:2.4-cpu-py37-v1.0",
)
```

Everything works when `enable_network_isolation=False` and I still pass in my network config. Is there a defect in calling boto3 from a PySpark script in the AppMaster when network isolation is turned on, or should I look at my network config again?

When network isolation mode is enabled (`enable_network_isolation=True`), the processing service blocks all network egress from the processing containers, including your Spark application. This is why boto3 cannot locate credentials: it has no network path to fetch them. It also means your Spark app cannot do direct I/O to S3 and can only do I/O to the job's local EBS volumes.
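So inside the script, instead of calling boto3, you would read from the local path where the data was staged (a minimal sketch, assuming the data was staged to the standard `/opt/ml/processing/input` mount via a `ProcessingInput` as described below; the file name is hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("some-job").getOrCreate()

# Instead of s3 = boto3.client('s3') and downloading objects,
# read the data that the processing service copied onto the
# local volume before the job started.
df = spark.read.csv("file:///opt/ml/processing/input/data.csv", header=True)
```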

For context, the network isolation feature is often used as a one-click safeguard against data exfiltration, but it is not required in order to restrict traffic to S3 within your VPC; the network configuration with your VPC subnets and security groups is sufficient for that.

If you do require network isolation, your input/output data will have to be staged on the job's local EBS volumes by specifying a ProcessingInput (https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ProcessingInput) and a ProcessingOutput (https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ProcessingOutput) in your job configuration, as in the sketch below.
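Something like this (a minimal sketch; the bucket names, prefixes, and script path are placeholders, not values from your setup). The processing service copies the S3 input to the containers before the job starts and uploads the output after it finishes, so the Spark script itself never talks to S3:

```python
from sagemaker.processing import ProcessingInput, ProcessingOutput

spark_processor.run(
    submit_app="./code/preprocess.py",  # hypothetical path to your PySpark script
    inputs=[
        ProcessingInput(
            source="s3://my-bucket/input-data/",     # hypothetical input prefix
            destination="/opt/ml/processing/input",  # local path visible to the script
        )
    ],
    outputs=[
        ProcessingOutput(
            source="/opt/ml/processing/output",       # script writes results here
            destination="s3://my-bucket/output-data/",  # hypothetical output prefix
        )
    ],
)
```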