zxkane / realtime-fraud-detection-with-gnn-on-dgl

An end-to-end solution for real-time fraud detection (leveraging the graph database Amazon Neptune) that uses Amazon SageMaker and Deep Graph Library (DGL) to construct a heterogeneous graph from tabular data and train a Graph Neural Network (GNN) model to detect fraudulent transactions in the IEEE-CIS dataset.

Real-time Fraud Detection with Graph Neural Network on DGL

This is an end-to-end solution for real-time fraud detection. It uses the graph database Amazon Neptune, Amazon SageMaker and Deep Graph Library (DGL) to construct a heterogeneous graph from tabular data and train a Graph Neural Network (GNN) model to detect fraudulent transactions in the IEEE-CIS Fraud Detection dataset.

Architecture of the solution

This solution consists of the following stacks:

  • Fraud Detection solution stack
  • nested model training and deployment stack
  • nested real-time fraud detection stack
  • nested transaction dashboard stack

Model training and deployment stack

The model training and deployment pipeline is orchestrated by AWS Step Functions. (Figure: model training)

Dashboard stack

This stack creates a React-based web portal that shows the recent fraudulent transactions detected by this solution. The web application is built on Amazon CloudFront, AWS Amplify, AWS AppSync, Amazon API Gateway, AWS Step Functions and Amazon DocumentDB. (Figure: business system)

How to train model and deploy inference endpoint

After deploying this solution, go to AWS Step Functions in the AWS console, then start the state machine whose name starts with ModelTrainingPipeline.

You can pass the input below to override the default model training parameters:

{
  "trainingJob": {
    "hyperparameters": {
      "n-hidden": "64",
      "n-epochs": "1",
      "lr": "1e-3"
    },
    "instanceType": "ml.c5.9xlarge"
  }
}
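
The same input can also be submitted programmatically instead of through the console. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the helper names (`build_training_input`, `find_pipeline_arn`, `start_training`) are hypothetical and not part of this repository.

```python
import json


def build_training_input(n_hidden=64, n_epochs=1, lr="1e-3",
                         instance_type="ml.c5.9xlarge"):
    """Return the override document in the shape shown above.

    Hyperparameter values are sent as strings, matching the README example.
    """
    return {
        "trainingJob": {
            "hyperparameters": {
                "n-hidden": str(n_hidden),
                "n-epochs": str(n_epochs),
                "lr": str(lr),
            },
            "instanceType": instance_type,
        }
    }


def find_pipeline_arn(state_machines, prefix="ModelTrainingPipeline"):
    """Pick the first state machine whose name starts with `prefix`.

    `state_machines` is a list of dicts with `name` and `stateMachineArn`
    keys, the shape returned by the Step Functions ListStateMachines API.
    """
    for machine in state_machines:
        if machine["name"].startswith(prefix):
            return machine["stateMachineArn"]
    return None


def start_training(overrides):
    """Start the pipeline with the given overrides (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    sfn = boto3.client("stepfunctions")
    machines = []
    for page in sfn.get_paginator("list_state_machines").paginate():
        machines.extend(page["stateMachines"])
    arn = find_pipeline_arn(machines)
    if arn is None:
        raise RuntimeError("no state machine starting with ModelTrainingPipeline")
    return sfn.start_execution(stateMachineArn=arn, input=json.dumps(overrides))
```

For example, `start_training(build_training_input(n_epochs=5))` would start the pipeline with five epochs and the other values left at the ones shown above.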

How to deploy the solution

Regions

The solution uses the graph database Amazon Neptune for real-time fraud detection and Amazon DocumentDB for the dashboard. Because of the regional availability of those services, the solution can be deployed to the following regions:

  • US East (N. Virginia): us-east-1
  • US East (Ohio): us-east-2
  • US West (Oregon): us-west-2
  • Canada (Central): ca-central-1
  • South America (São Paulo): sa-east-1
  • Europe (Ireland): eu-west-1
  • Europe (London): eu-west-2
  • Europe (Paris): eu-west-3
  • Europe (Frankfurt): eu-central-1
  • Asia Pacific (Tokyo): ap-northeast-1
  • Asia Pacific (Seoul): ap-northeast-2
  • Asia Pacific (Singapore): ap-southeast-1
  • Asia Pacific (Sydney): ap-southeast-2
  • Asia Pacific (Mumbai): ap-south-1
  • China (Ningxia): cn-northwest-1
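
A deployment script can check the target region against this list before running anything. The sketch below is only an illustration: the set is copied from the list above and may lag behind actual service availability, and `check_region` is a hypothetical helper name.

```python
# Region codes copied from the list above; not queried from AWS.
SUPPORTED_REGIONS = frozenset({
    "us-east-1", "us-east-2", "us-west-2", "ca-central-1", "sa-east-1",
    "eu-west-1", "eu-west-2", "eu-west-3", "eu-central-1",
    "ap-northeast-1", "ap-northeast-2", "ap-southeast-1", "ap-southeast-2",
    "ap-south-1", "cn-northwest-1",
})


def check_region(region):
    """Return the AWS partition for a supported region, else raise ValueError."""
    if region not in SUPPORTED_REGIONS:
        raise ValueError(f"{region} is not in the supported region list")
    # China regions live in the separate aws-cn partition.
    return "aws-cn" if region.startswith("cn-") else "aws"
```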

Prerequisites

  • An AWS account
  • Configure the credentials of the AWS CLI
  • Install a Node.js LTS version, such as 12.x
  • Install Docker Engine
  • Install the dependencies of the solution by running yarn install && npx projen
  • Initialize the CDK toolkit stack in your AWS environment (only needed the first time you deploy via AWS CDK) by running yarn cdk-init
  • [Optional] A public hosted zone in Amazon Route 53
  • Authenticate with the ECR repository below in your AWS partition
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com

Run the command below instead if you are deploying to China regions:

aws ecr get-login-password --region cn-northwest-1 | docker login --username AWS --password-stdin 727897471807.dkr.ecr.cn-northwest-1.amazonaws.com.cn

Deploy it in a new VPC

The deployment creates a new VPC spanning at least two AZs, along with NAT gateways. The solution is then deployed into the newly created VPC.

yarn deploy

Deploy it into existing VPC

If you want to deploy the solution to the default VPC, use the command below.

yarn deploy-to-default-vpc

Or deploy into an existing VPC by specifying the VPC ID:

npx cdk deploy -c vpcId=<your vpc id>

NOTE: make sure your existing VPC has both public subnets and private subnets with a NAT gateway.
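
One way to check this requirement before deploying is to classify each subnet by the default route of its route table. The sketch below is an assumption-laden illustration, not part of this repository: it expects `route_tables` in the shape of the `RouteTables` list returned by the EC2 DescribeRouteTables API, and the function names and the labels `public`, `private-nat` and `isolated` are hypothetical.

```python
def _default_route_kind(route_table):
    """Classify a route table by where its 0.0.0.0/0 route points."""
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("NatGatewayId"):
            return "private-nat"
        if (route.get("GatewayId") or "").startswith("igw-"):
            return "public"
    return "isolated"  # no default route via an internet or NAT gateway


def classify_subnets(subnet_ids, route_tables):
    """Map each subnet ID to 'public', 'private-nat' or 'isolated'."""
    main_table = None
    explicit = {}
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("Main"):
                main_table = table
            if "SubnetId" in assoc:
                explicit[assoc["SubnetId"]] = table
    result = {}
    for subnet_id in subnet_ids:
        # Subnets without an explicit association use the VPC's main table.
        table = explicit.get(subnet_id, main_table)
        result[subnet_id] = _default_route_kind(table) if table else "isolated"
    return result


def vpc_is_suitable(subnet_ids, route_tables):
    """True if the VPC has at least one public and one NAT-backed private subnet."""
    kinds = set(classify_subnets(subnet_ids, route_tables).values())
    return {"public", "private-nat"} <= kinds
```

The inputs could come from boto3's `describe_subnets` and `describe_route_tables` calls, both filtered by `vpc-id`. The sketch only inspects IPv4 default routes and treats NAT instances as isolated.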

Deploy it with custom Neptune instance class and replica count

By default, the solution deploys a Neptune cluster with instance class db.r5.8xlarge and 1 read replica. You can override the instance class and replica count as follows:

npx cdk deploy --parameters NeptuneInstaneType=db.r5.12xlarge -c NeptuneReplicaCount=2 

Deploy it with custom domain of dashboard

If you want to use a custom domain to access the dashboard of the solution, use the options below when deploying. NOTE: you must already have created a public hosted zone in Route 53; see Prerequisites for details.

npx cdk deploy -c EnableDashboardCustomDomain=true --parameters DashboardDomain=<the custom domain> --parameters Route53HostedZoneId=<hosted zone id of your domain>

Deploy it to China regions

Add the additional context parameter below:

npx cdk deploy -c TargetPartition=aws-cn

NOTE: deploying to a China region also requires the domain parameters below, because the CloudFront distribution must be accessed via a custom domain.

--parameters DashboardDomain=<the custom domain> --parameters Route53HostedZoneId=<hosted zone id of your domain>
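
Because the China deployment combines a context value with two parameters, it can help to assemble the full command in one place. The following is a small sketch that only builds the argument list from the `-c` and `--parameters` flags shown in this document; `cdk_deploy_args` and `china_deploy_args` are hypothetical helper names, and the domain and hosted zone ID in the usage note are placeholders.

```python
def cdk_deploy_args(context=None, parameters=None):
    """Build an `npx cdk deploy` argument list.

    `context` entries become `-c key=value`; `parameters` entries become
    `--parameters Key=Value`, in insertion order.
    """
    args = ["npx", "cdk", "deploy"]
    for key, value in (context or {}).items():
        args += ["-c", f"{key}={value}"]
    for key, value in (parameters or {}).items():
        args += ["--parameters", f"{key}={value}"]
    return args


def china_deploy_args(domain, hosted_zone_id):
    """Combine the China partition context with the required domain parameters."""
    return cdk_deploy_args(
        context={"TargetPartition": "aws-cn"},
        parameters={
            "DashboardDomain": domain,
            "Route53HostedZoneId": hosted_zone_id,
        },
    )
```

The result can be passed to `subprocess.run(args, check=True)` from the project root, for example with `china_deploy_args("fraud.example.cn", "Z0000000EXAMPLE")`.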

How to test

yarn test

FAQ

TBA

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.
