In this project, we use Amazon SageMaker to build, train, and deploy a machine learning (ML) model with the XGBoost algorithm. Amazon SageMaker is a fully managed service that lets developers and data scientists build, train, and deploy ML models quickly.
The project covers how to:
- Create a SageMaker notebook instance
- Prepare the data
- Train the model to learn from the data
- Deploy the model
- Evaluate the model's performance
Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. Customers of all sizes and industries can store and protect any amount of data for virtually any use case, such as data lakes, cloud-native applications, and mobile apps. With cost-effective storage classes and easy-to-use management features, you can optimize costs, organize data, and configure fine-tuned access controls to meet specific business, organizational, and compliance requirements.
Amazon SageMaker Studio provides a single, web-based visual interface for all ML development steps, which AWS states can improve data science team productivity by up to 10x. SageMaker Studio gives you complete access, control, and visibility into each step required to build, train, and deploy models. You can quickly upload data, create new notebooks, train and tune models, move back and forth between steps to adjust experiments, compare results, and deploy models to production, all in one place. All ML development activities, including notebooks, experiment management, automatic model creation, debugging, and model and data drift detection, can be performed within SageMaker Studio.
- Log in to the AWS Management Console
- Search for Amazon SageMaker
- Go to Notebook instances and choose Create notebook instance; use any S3 bucket
- Once the status shows InService, open a Jupyter notebook
import sagemaker
import boto3  # for accessing the S3 bucket

bucketname = "<bucket name>"
my_region = boto3.session.Session().region_name  # region of the notebook instance
s3 = boto3.resource('s3')
try:
    if my_region == "us-east-1":
        # us-east-1 is the default region and takes no LocationConstraint
        s3.create_bucket(Bucket=bucketname)
    else:
        # every other region (for example ap-south-1) needs an explicit LocationConstraint
        s3.create_bucket(Bucket=bucketname,
                         CreateBucketConfiguration={'LocationConstraint': my_region})
    print("S3 bucket created successfully")
except Exception as e:
    print("S3 error: ", e)
prefix = "xgboost-as-a-built-in-algo"
output_path = 's3://{}/{}/output'.format(bucketname, prefix)
output_path2 = 's3://{}/{}/output'.format("testbucketforassignone", prefix)
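The output paths above are assembled by string formatting, and the same pattern recurs later for the training data. As a minimal sketch, a small helper can build these URIs in one place; the function name `s3_uri` and the bucket name `example-bucket` are hypothetical and not part of the original notebook.

```python
def s3_uri(bucket: str, *parts: str) -> str:
    """Join a bucket name and key segments into an s3:// URI."""
    # Drop empty segments and stray slashes so the URI never contains "//" in the key.
    segments = [p.strip("/") for p in parts if p.strip("/")]
    return "/".join([f"s3://{bucket}", *segments])


bucket = "example-bucket"  # hypothetical bucket name
prefix = "xgboost-as-a-built-in-algo"

output_path = s3_uri(bucket, prefix, "output")
train_path = s3_uri(bucket, prefix, "train")
print(output_path)
print(train_path)
```

S3 keys always use forward slashes, so joining with `"/"` keeps the result independent of the operating system the notebook runs on.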
Upload the dataset to the S3 bucket used by the notebook instance.
import pandas as pd
import urllib.request

try:
    # download the dataset and save it under a known local file name
    urllib.request.urlretrieve("<URL>", "<FILENAME>")
    print('Success: Data downloaded.')
except Exception as e:
    print('Data load error: ', e)
try:
    model_data = pd.read_csv('<FILENAME>', index_col=0)
    print('Success: Data loaded into dataframe.')
except Exception as e:
    print('Data load error: ', e)
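The upload step expects a local `train.csv`, but the split that produces it is not shown here. The following is a minimal sketch of one way to create it, assuming a 70/30 shuffle split and a dataframe with one-hot label columns `y_yes` and `y_no` (the names used later when predicting). The feature columns `age` and `duration`, the synthetic data, and the helper `to_xgboost_csv` are hypothetical stand-ins. SageMaker's built-in XGBoost reads CSV training data with the label in the first column and no header row, which is what the helper writes.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Synthetic stand-in for `model_data`: two features plus one-hot label columns.
rng = np.random.default_rng(0)
model_data = pd.DataFrame({
    "age": rng.integers(18, 90, size=100),
    "duration": rng.integers(0, 1000, size=100),
})
model_data["y_yes"] = rng.integers(0, 2, size=100)
model_data["y_no"] = 1 - model_data["y_yes"]

# Shuffle the rows, then keep 70% for training and 30% for testing (assumed ratio).
shuffled = model_data.sample(frac=1, random_state=1729)
split_at = int(0.7 * len(shuffled))
train_data = shuffled.iloc[:split_at]
test_data = shuffled.iloc[split_at:]


def to_xgboost_csv(df: pd.DataFrame, path: str) -> None:
    """Write the label first and the features after it, without header or index."""
    features = df.drop(["y_no", "y_yes"], axis=1)
    pd.concat([df["y_yes"], features], axis=1).to_csv(path, index=False, header=False)


out_dir = tempfile.mkdtemp()
train_path = os.path.join(out_dir, "train.csv")
to_xgboost_csv(train_data, train_path)
print(len(train_data), len(test_data))
```

In the notebook, the file would be written to the working directory as `train.csv` so that the upload call can find it.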
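import os

# upload the local training file to s3://<bucket>/<prefix>/train/train.csv
boto3.Session().resource('s3').Bucket("testbucketforassignone").Object(os.path.join(prefix, 'train/train.csv')).upload_file('train.csv')
# point the training job at the uploaded CSV data
s3_input_train = sagemaker.inputs.TrainingInput(s3_data='s3://{}/{}/train'.format("testbucketforassignone", prefix), content_type='csv')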
The test data can be stored the same way to create the s3_input_test input used during training.
Implement the model as shown in the notebook.
The following code looks up the XGBoost image URI for the current region and configures an estimator that trains in that XGBoost container.
from sagemaker import image_uris

container = image_uris.retrieve("xgboost", boto3.Session().region_name, "1.2-1")
# hyperparameters: a dict of XGBoost settings that must be defined before this call
estimator = sagemaker.estimator.Estimator(image_uri=container,
                                          hyperparameters=hyperparameters,
                                          role=sagemaker.get_execution_role(),
                                          instance_count=1,
                                          instance_type='ml.m5.2xlarge',
                                          volume_size=5,  # 5 GB
                                          output_path=output_path2,
                                          use_spot_instances=True,
                                          max_run=300,    # training time limit in seconds
                                          max_wait=600)   # total wait for the spot job in seconds, must be >= max_run
estimator.fit({'train': s3_input_train, 'validation': s3_input_test})
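The estimator above is given a `hyperparameters` dictionary that this walkthrough does not define. The sketch below shows what such a dictionary could look like for a binary classifier. The keys are standard XGBoost hyperparameter names, but every value is an illustrative assumption, not the setting behind the reported results. It also checks the spot-training limits, since `max_wait` must be at least `max_run`.

```python
# Illustrative values only; the settings actually used are not stated in the source.
hyperparameters = {
    "max_depth": "5",                 # maximum tree depth
    "eta": "0.2",                     # learning rate
    "gamma": "4",                     # minimum loss reduction needed to split
    "min_child_weight": "6",
    "subsample": "0.7",               # fraction of rows sampled per tree
    "objective": "binary:logistic",   # outputs a probability for the positive class
    "num_round": "50",                # number of boosting rounds (required)
}

# Spot-training limits taken from the estimator call above.
max_run, max_wait = 300, 600
if max_wait < max_run:
    raise ValueError("max_wait must be greater than or equal to max_run")

print(sorted(hyperparameters))
```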
# host the trained model on a real-time endpoint
xgb_predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
import numpy as np
from sagemaker.serializers import CSVSerializer

test_data_array = test_data.drop(['y_no', 'y_yes'], axis=1).values  # load the features into an array
xgb_predictor.serializer = CSVSerializer()  # send requests to the endpoint as CSV
predictions = xgb_predictor.predict(test_data_array).decode('utf-8')  # predict!
predictions_array = np.fromstring(predictions[1:], sep=',')  # and turn the prediction into an array
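The parsing step can be tried without a live endpoint. This is a minimal sketch, assuming the endpoint returns UTF-8 bytes holding probabilities separated by commas or newlines; the sample response, the helper name `parse_predictions`, and the 0.5 threshold are all assumptions made for illustration.

```python
import numpy as np


def parse_predictions(raw: bytes) -> np.ndarray:
    """Turn a delimited byte response into a float array of probabilities."""
    text = raw.decode("utf-8").strip()
    # Treat newlines like commas so both separators are handled.
    tokens = text.replace("\n", ",").split(",")
    return np.array([float(t) for t in tokens if t], dtype=float)


raw_response = b"0.12,0.87,0.45,0.51"  # made-up response for illustration
probabilities = parse_predictions(raw_response)

# Convert probabilities to class labels: 1 = purchase, 0 = no purchase.
labels = (probabilities >= 0.5).astype(int)
print(labels)
```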
Overall Classification Rate: 89.7%

Predicted      No Purchase    Purchase
Observed
No Purchase    91% (10785)    34% (151)
Purchase        9% (1124)     66% (297)
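The figures in the table can be reproduced from its counts. The sketch below rebuilds observed and predicted arrays that match the four cells, cross-tabulates them with pandas, and recomputes the overall classification rate. The probability values 0.1 and 0.9 are placeholders chosen only so that rounding yields the right class. In the notebook, `observed` would be the test labels and `predicted_prob` the array returned by the endpoint.

```python
import numpy as np
import pandas as pd

# Rebuild arrays whose counts match the table: 0 = no purchase, 1 = purchase.
observed = np.concatenate([np.zeros(10785 + 151), np.ones(1124 + 297)])
predicted_prob = np.concatenate([
    np.full(10785, 0.1),  # observed 0, predicted 0
    np.full(151, 0.9),    # observed 0, predicted 1
    np.full(1124, 0.1),   # observed 1, predicted 0
    np.full(297, 0.9),    # observed 1, predicted 1
])

# Round probabilities to class labels and cross-tabulate against the truth.
cm = pd.crosstab(index=observed, columns=np.round(predicted_prob),
                 rownames=["Observed"], colnames=["Predicted"])
tn, fp = cm.iloc[0, 0], cm.iloc[0, 1]
fn, tp = cm.iloc[1, 0], cm.iloc[1, 1]

rate = (tp + tn) / (tp + tn + fp + fn) * 100
print(f"Overall Classification Rate: {rate:.1f}%")  # Overall Classification Rate: 89.7%
```

The percentages in each column of the table are shares of that predicted class, for example 10785 / (10785 + 1124) is about 91%.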
When you are finished, delete the endpoint and empty the S3 bucket. This step is necessary to free the resources and stop further charges.
sagemaker.Session().delete_endpoint(xgb_predictor.endpoint_name)
bucket_to_delete = boto3.resource('s3').Bucket("<Bucket Name>")
bucket_to_delete.objects.all().delete()
MIT License
Model Deployment using AWS Sagemaker
If you have any feedback, please reach out at pradnyapatil671@gmail.com
I am an AI enthusiast and a data science and ML practitioner.