In this project, I tackle the StumbleUpon Kaggle problem. In `notebooks/01-stumbleupon-art-of-eda.ipynb`, which is very similar to this Kaggle notebook, I have done some cool EDA followed by model training and ensembling to achieve a test AUC of 88.44% on the Kaggle private leaderboard. Then, in `notebooks/02-DataPrep-and-training.ipynb`, I have used Amazon SageMaker to train and deploy a `bert-base-uncased` model on the StumbleUpon dataset from Kaggle. Link to the dataset: https://www.kaggle.com/competitions/stumbleupon/data. Then, in the notebook `03-Create-Lambda.ipynb`, I have shown how to create the AWS Lambda function required to host the SageMaker endpoint via API Gateway. PyTorch and Hugging Face are used for modeling.
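To make the Lambda piece more concrete, here is a minimal sketch of a handler that forwards a request from API Gateway to a SageMaker endpoint. It is not the code from the notebook: the endpoint name, the `ENDPOINT_NAME` environment variable, the `text` request field, and the `{"inputs": ...}` payload shape are all assumptions, and it presumes API Gateway is configured with Lambda proxy integration. The SageMaker runtime client is passed in so the logic can be exercised without AWS credentials.

```python
import json
import os

# Hypothetical endpoint name; use the name of the endpoint you deployed.
ENDPOINT_NAME = os.environ.get("ENDPOINT_NAME", "stumbleupon-bert-endpoint")


def build_handler(runtime_client, endpoint_name=ENDPOINT_NAME):
    """Return a Lambda-style handler bound to a SageMaker runtime client."""

    def handler(event, context):
        # With Lambda proxy integration, the request body arrives as a JSON string.
        body = json.loads(event.get("body") or "{}")
        # Assumed payload shape; adapt it to what your inference script expects.
        payload = json.dumps({"inputs": body.get("text", "")})

        response = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/json",
            Body=payload,
        )
        # The response body is a stream; read and decode it as JSON.
        result = json.loads(response["Body"].read().decode("utf-8"))

        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps(result),
        }

    return handler


def lambda_handler(event, context):
    """Entry point for AWS Lambda; boto3 is available in the Lambda runtime."""
    import boto3  # imported lazily so the module loads without boto3 installed

    runtime = boto3.client("sagemaker-runtime")
    return build_handler(runtime)(event, context)
```

Injecting the client keeps the handler easy to test locally with a stub before wiring it to a real endpoint.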
- Create a SageMaker notebook instance with the instance type `ml.t2.medium`.
- Once the notebook instance is `In Service`, clone this git repo in the Jupyter environment.
- Run the `notebooks/02-DataPrep-and-training.ipynb` notebook to train and deploy the model with Amazon SageMaker, followed by inference.
- Refer to the `src/train.py` script used for training the model.
- Run the notebook `notebooks/03-Create-Lambda` to create the AWS Lambda function required to host the SageMaker endpoint via API Gateway.
- Follow this detailed AWS tutorial to invoke the Lambda function via Amazon API Gateway.
- Download the `flask-api` folder to your local machine, replace the value of the variable `url = "<<Amazon API Gateway url link>>"` in `flask-api/app.py` with your Amazon API Gateway URL, and run `flask-api/app.py` to create a Flask API.
- (Optional) You can follow this Medium article to run your Flask API on an AWS EC2 instance.
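
Before wiring up the Flask app, it can help to check that the API Gateway URL responds. The following is a rough standard-library sketch of such a call, not the contents of `flask-api/app.py`: the `text` field in the request body and the helper names `build_request` and `predict` are assumptions made for illustration, and the response is assumed to be JSON.

```python
import json
import urllib.request


def build_request(gateway_url, text):
    """Build a JSON POST request for the API Gateway URL."""
    data = json.dumps({"text": text}).encode("utf-8")  # assumed request schema
    return urllib.request.Request(
        gateway_url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(gateway_url, text, opener=urllib.request.urlopen, timeout=30):
    """Send the text to the gateway and return the decoded JSON response."""
    request = build_request(gateway_url, text)
    with opener(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Replace the placeholder with your own Amazon API Gateway URL before running.
    url = "<<Amazon API Gateway url link>>"
    if url.startswith("https://"):
        print(predict(url, "Text of a web page to classify"))
    else:
        print("Set `url` to your API Gateway URL first.")
```

The `opener` parameter exists only so the call can be stubbed out in a test; in normal use the default `urllib.request.urlopen` is enough.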