ML Case Study - Predicting Default

Problem Description

The task is to predict the probability of default for the data points in the attached data/dataset.csv whose default value is missing. The solution should be a .csv file with two columns, uuid and pd (the predicted probability that default == 1). Once done, the model should be exposed through an API endpoint on a cloud provider of your choice.
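The required output format can be sketched with Python's standard csv module. This is a minimal sketch rather than the author's pipeline. It assumes that the label column is called default, that an empty field or the literal NA marks a missing label, and that predict_pd is a hypothetical stand-in for the trained model's scoring function.

```python
import csv
import io
from typing import Callable, Dict, Iterable, TextIO

MISSING = ("", "NA")  # assumed encodings of a missing label


def write_predictions(
    rows: Iterable[Dict[str, str]],
    predict_pd: Callable[[Dict[str, str]], float],
    out: TextIO,
    target: str = "default",  # assumed name of the label column
) -> int:
    """Write a uuid,pd CSV for rows whose label is missing; return rows written."""
    writer = csv.writer(out)
    writer.writerow(["uuid", "pd"])
    written = 0
    for row in rows:
        if row.get(target, "") not in MISSING:
            continue  # labelled row: used for training, not scored here
        prob = predict_pd(row)
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"pd out of range for {row['uuid']}: {prob}")
        writer.writerow([row["uuid"], prob])
        written += 1
    return written


# Tiny in-memory example; the lambda is a hypothetical stand-in for the model.
sample = "uuid;default;age\na1;0;30\nb2;;45\nc3;NA;22\n"
reader = csv.DictReader(io.StringIO(sample), delimiter=";")
buf = io.StringIO()
n = write_predictions(reader, lambda row: 0.1, buf)
print(buf.getvalue())
```

For the real data, the same function would be called with csv.DictReader over data/dataset.csv (opened with newline="" and the file's actual delimiter) and an output file such as predictions.csv. The delimiter and the output file name are assumptions here.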

Model

A brief overview of the model and solution can be found in this doc. An overview of how the model was built and evaluated can be found in the notebooks directory.

Prerequisites

Docker
Docker Compose
Terraform
AWS CLI
Python (to run the query script)
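Before deploying, it can help to confirm that the tools are installed. The following is a small sketch using shutil.which, assuming the commands are invoked under these names. The AWS CLI (aws) is included because the ECR steps below call it.

```python
import shutil
from typing import Iterable, List

# Tools invoked by the deployment steps. "aws" is the AWS CLI used for ECR.
# Newer Docker releases ship Compose as "docker compose" instead of a separate
# "docker-compose" binary, so that entry may need adjusting.
REQUIRED_TOOLS = ("docker", "docker-compose", "terraform", "aws")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
print("Missing prerequisites:", ", ".join(missing) if missing else "none")
```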

Deployment (AWS)

Create an ECR repository on AWS

aws ecr create-repository --repository-name klarna-solution

Set an environment variable that points to the ECR registry

export DOCKER_REGISTRY=xyz.dkr.ecr.eu-central-1.amazonaws.com
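The xyz part of the registry host is the AWS account ID. ECR registries follow the pattern <account-id>.dkr.ecr.<region>.amazonaws.com (in standard, non-China regions), and images in them are addressed as <registry>/<repository>:<tag>. The helper below assembles both strings. The account ID 123456789012 is a placeholder, and the latest tag is an assumption. The tags actually pushed depend on the project's docker-compose.yml, which is not shown here.

```python
import re

# AWS account IDs are 12 digits.
_ACCOUNT_RE = re.compile(r"^\d{12}$")


def ecr_registry(account_id: str, region: str = "eu-central-1") -> str:
    """Build the ECR registry host for an account and a standard (non-China) region."""
    if not _ACCOUNT_RE.match(account_id):
        raise ValueError(f"expected a 12-digit AWS account ID, got {account_id!r}")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_image(registry: str, repository: str = "klarna-solution", tag: str = "latest") -> str:
    """Full image reference for a repository in that registry (tag is assumed)."""
    return f"{registry}/{repository}:{tag}"


registry = ecr_registry("123456789012")  # placeholder account ID
print(registry)
print(ecr_image(registry))
```

The account ID of the current credentials can be printed with aws sts get-caller-identity --query Account --output text.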

Push the dockerized project to ECR

aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin $DOCKER_REGISTRY

docker-compose build
docker-compose push

Set up the AWS infrastructure with Terraform

terraform init
terraform apply

Deploy the project on the infrastructure

First, export the address of the EC2 machine that the deploy script will connect to over SSH

export EC2_MACHINE=xxx.yyy.zzz.qqq
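If the Terraform configuration exposes the instance address as an output, the value for EC2_MACHINE can be read from terraform output -json instead of being copied by hand. The sketch below assumes a hypothetical output named public_ip. The JSON shape (each output name mapped to an object with a value field) matches what terraform output -json prints.

```python
import ipaddress
import json
import subprocess


def ec2_address_from_outputs(outputs_json: str, name: str = "public_ip") -> str:
    """Extract and validate an IPv4 address from `terraform output -json`.

    `name` is a hypothetical output name; use whichever output the project's
    Terraform configuration actually defines.
    """
    outputs = json.loads(outputs_json)
    if name not in outputs:
        raise KeyError(f"Terraform output {name!r} not found; available: {sorted(outputs)}")
    value = outputs[name]["value"]
    ipaddress.IPv4Address(value)  # raises ValueError if not a valid IPv4 address
    return value


def read_terraform_outputs(workdir: str = ".") -> str:
    """Run `terraform output -json` in `workdir` and return its stdout."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example with a documentation-range address instead of a live Terraform state.
sample = '{"public_ip": {"sensitive": false, "type": "string", "value": "203.0.113.10"}}'
print(ec2_address_from_outputs(sample))
```

With Terraform 0.14 or later, terraform output -raw public_ip prints a single string output directly, so export EC2_MACHINE=$(terraform output -raw public_ip) is a shell-only alternative under the same naming assumption.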

Then run the deploy script

bash deploy.sh

Querying the endpoint

To query the endpoint with a randomly selected feature set, run:

python experiments/send_request_for_random_user.py
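The contents of experiments/send_request_for_random_user.py are not shown here. The following therefore only illustrates the same idea with urllib from the standard library: pick a random row, drop the label, and POST it as JSON. The URL, port, /predict path, JSON payload shape, and a pd field in the response are all assumptions, not the project's actual API.

```python
import csv
import io
import json
import random
import urllib.request
from typing import Dict, Iterable, List


def random_feature_set(
    rows: List[Dict[str, str]], rng: random.Random, drop: Iterable[str] = ("default",)
) -> Dict[str, str]:
    """Pick one row at random and drop the label column(s)."""
    row = rng.choice(rows)
    dropped = set(drop)
    return {k: v for k, v in row.items() if k not in dropped}


def build_request(features: Dict[str, str], url: str) -> urllib.request.Request:
    """Build a JSON POST request carrying one feature set (payload shape is assumed)."""
    return urllib.request.Request(
        url,
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (e.g. {"pd": ...}, assumed)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


sample = "uuid;default;age\na1;0;30\nb2;;45\n"
rows = list(csv.DictReader(io.StringIO(sample), delimiter=";"))
features = random_feature_set(rows, random.Random(0))
req = build_request(features, "http://203.0.113.10:5000/predict")  # hypothetical URL
```

Calling send(req) would perform the request against a live endpoint. The example stops at building it so that it runs without network access.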
