sVujke / ml-case-study-klrn

ML Case Study - Predicting Default

Problem Description

The task is to predict the probability of default for the data points in the attached data/dataset.csv where the default label is missing. The solution should contain the predictions in a .csv file with two columns, uuid and pd (the probability that default==1). Once done, the model should be exposed through an API endpoint on a cloud provider of your choice.
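
The output format described above can be sketched as follows. This is a minimal illustration, not the model in this repository: it assumes the label column is named `default`, that a missing label appears as an empty field or `NA`, and that the file is comma-delimited. `score_row` is a hypothetical stand-in for a trained model.

```python
import csv
import io


def score_row(row):
    """Hypothetical placeholder for a trained model's probability estimate."""
    return 0.5


def write_predictions(dataset_file, output_file, score=score_row):
    """Write uuid,pd rows for every record whose `default` label is missing."""
    reader = csv.DictReader(dataset_file)
    writer = csv.writer(output_file, lineterminator="\n")
    writer.writerow(["uuid", "pd"])
    count = 0
    for row in reader:
        # Assumption: a missing label is an empty field or the string "NA".
        if (row.get("default") or "") in ("", "NA"):
            writer.writerow([row["uuid"], f"{score(row):.6f}"])
            count += 1
    return count
```

Passing open file handles for data/dataset.csv and the output file would produce the two-column file; swapping `score` for a real model's probability function is the only change needed.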

Model

A brief overview of the model and solution can be found in this doc. An overview of how the model was built and evaluated can be found in the notebooks directory.

Prerequisites

  • Docker
  • Docker Compose
  • Terraform
  • AWS CLI (used by the aws ecr commands in the deployment steps)

Deployment (AWS)

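  1. Create an ECR repository on AWS
aws ecr create-repository --repository-name klarna-solution
  2. Set an environment variable that points to the ECR registry (replace xyz with your AWS account ID)
export DOCKER_REGISTRY=xyz.dkr.ecr.eu-central-1.amazonaws.com
  3. Log in to ECR, then build and push the dockerized project
$(aws ecr get-login --no-include-email --region eu-central-1)

With AWS CLI v2, where get-login is no longer available, log in with this command instead:

aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin $DOCKER_REGISTRY

docker-compose build
docker-compose push
  4. Set up the AWS infrastructure with Terraform
terraform init
terraform apply
  5. Deploy the project on the infrastructure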

First, export the address of the EC2 machine that the deploy script will SSH into:

export EC2_MACHINE=xxx.yyy.zzz.qqq

Then run the deploy script:

bash deploy.sh
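
Because the deployment steps depend on several command-line tools and two environment variables, a small preflight check can catch a missing piece before anything is pushed. The sketch below is an optional helper and not part of the repository. The tool list is inferred from the commands above, and `missing_prerequisites` is a hypothetical name.

```python
import os
import shutil

# Tools and variables inferred from the deployment commands (an assumption).
REQUIRED_TOOLS = ("docker", "docker-compose", "terraform", "aws")
REQUIRED_VARS = ("DOCKER_REGISTRY", "EC2_MACHINE")


def missing_prerequisites(env=None, which=shutil.which):
    """Return (tools not found on PATH, variables that are unset or empty)."""
    env = os.environ if env is None else env
    missing_tools = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    missing_vars = [var for var in REQUIRED_VARS if not env.get(var)]
    return missing_tools, missing_vars
```

Calling `missing_prerequisites()` with no arguments inspects the current shell environment. Note that EC2_MACHINE is only needed for the final deploy step, so it may legitimately be unset during the earlier ones.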

Querying the endpoint

To query the endpoint with a randomly selected feature set, simply run:

python experiments/send_request_for_random_user.py
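
For readers who want to call the endpoint from their own code, the request can be sketched roughly as below. This is not the contents of send_request_for_random_user.py. The JSON payload shape, the POST method, and the JSON response format are all assumptions, and the function names are hypothetical.

```python
import json
import random
import urllib.request


def pick_random_features(rows, rng=random):
    """Pick one feature dictionary at random from a list of candidates."""
    return rng.choice(rows)


def build_request(url, features):
    """Build a JSON POST request carrying one feature set."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and decode the response body as JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical call would chain the three: pick a row, pass it to `build_request` along with the deployed endpoint's URL, then hand the result to `send`.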
