Real-time Fraud Detection with Graph Neural Network on DGL
This is an end-to-end solution for real-time fraud detection. It uses the graph database Amazon Neptune, Amazon SageMaker, and Deep Graph Library (DGL) to construct a heterogeneous graph from tabular data and train a Graph Neural Network (GNN) model that detects fraudulent transactions in the IEEE-CIS Fraud Detection dataset.
Architecture of the solution
This solution consists of the following stacks:
- Fraud Detection solution stack
  - nested model training and deployment stack
  - nested real-time fraud detection stack
  - nested transaction dashboard stack
Model training and deployment stack
The model training and deployment pipeline is orchestrated by AWS Step Functions, as shown in the graph below.
Dashboard stack
It creates a React-based web portal that displays the recent fraudulent transactions detected by the solution. The web application is built with Amazon CloudFront, AWS Amplify, AWS AppSync, Amazon API Gateway, AWS Step Functions, and Amazon DocumentDB.
How to train the model and deploy the inference endpoint
After deploying the solution, open AWS Step Functions in the AWS console, then start the state machine whose name starts with ModelTrainingPipeline.
You can pass the following input to override the default model training parameters:
{
"trainingJob": {
"hyperparameters": {
"n-hidden": "64",
"n-epochs": "1",
"lr":"1e-3"
},
"instanceType": "ml.c5.9xlarge"
}
}
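If you generate this input from a script rather than typing it by hand, the TypeScript sketch below builds the same JSON shape. The helper `buildTrainingInput` and the `TrainingOverrides` field names are hypothetical; the JSON keys and the fact that hyperparameter values are strings come from the example above. It assumes that keys you omit keep their default values, which the README implies but does not spell out.

```typescript
// Hypothetical override shape; field names mirror the JSON keys shown above.
interface TrainingOverrides {
  nHidden?: number;
  nEpochs?: number;
  lr?: string; // kept as a string, e.g. "1e-3", exactly as it appears in the input JSON
  instanceType?: string;
}

interface TrainingJobInput {
  hyperparameters?: Record<string, string>;
  instanceType?: string;
}

// Builds the execution input for the ModelTrainingPipeline state machine.
// Only values you set are emitted (assumption: omitted keys keep their defaults).
function buildTrainingInput(o: TrainingOverrides): string {
  const hyperparameters: Record<string, string> = {};
  // Hyperparameter values are strings in the example input, so numbers are stringified.
  if (o.nHidden !== undefined) hyperparameters["n-hidden"] = String(o.nHidden);
  if (o.nEpochs !== undefined) hyperparameters["n-epochs"] = String(o.nEpochs);
  if (o.lr !== undefined) hyperparameters["lr"] = o.lr;

  const trainingJob: TrainingJobInput = {};
  if (Object.keys(hyperparameters).length > 0) trainingJob.hyperparameters = hyperparameters;
  if (o.instanceType !== undefined) trainingJob.instanceType = o.instanceType;
  return JSON.stringify({ trainingJob });
}
```

The resulting string can be pasted as the execution input in the console, or passed to `aws stepfunctions start-execution --state-machine-arn <arn> --input '<json>'`.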
How to deploy the solution
Regions
The solution uses the graph database Amazon Neptune for real-time fraud detection and Amazon DocumentDB for the dashboard. Because of the regional availability of these services, the solution can be deployed to the following regions:
- US East (N. Virginia): us-east-1
- US East (Ohio): us-east-2
- US West (Oregon): us-west-2
- Canada (Central): ca-central-1
- South America (São Paulo): sa-east-1
- Europe (Ireland): eu-west-1
- Europe (London): eu-west-2
- Europe (Paris): eu-west-3
- Europe (Frankfurt): eu-central-1
- Asia Pacific (Tokyo): ap-northeast-1
- Asia Pacific (Seoul): ap-northeast-2
- Asia Pacific (Singapore): ap-southeast-1
- Asia Pacific (Sydney): ap-southeast-2
- Asia Pacific (Mumbai): ap-south-1
- China (Ningxia): cn-northwest-1
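A deployment script could check the target region against this list before running CDK, and also decide whether the China-specific options later in this README apply. The TypeScript sketch below does that; `SUPPORTED_REGIONS` is copied from the list above, while `partitionFor` is a hypothetical helper. The only China region listed is cn-northwest-1, so any other `cn-` region is rejected.

```typescript
// Supported regions, copied from the list above.
const SUPPORTED_REGIONS: readonly string[] = [
  "us-east-1", "us-east-2", "us-west-2", "ca-central-1", "sa-east-1",
  "eu-west-1", "eu-west-2", "eu-west-3", "eu-central-1",
  "ap-northeast-1", "ap-northeast-2", "ap-southeast-1", "ap-southeast-2",
  "ap-south-1", "cn-northwest-1",
];

type Partition = "aws" | "aws-cn";

// Hypothetical helper: returns the partition to deploy into, or throws if the
// region is not in the supported list.
function partitionFor(region: string): Partition {
  if (SUPPORTED_REGIONS.indexOf(region) === -1) {
    throw new Error(`Region ${region} is not in the supported list`);
  }
  // China regions belong to the aws-cn partition (TargetPartition=aws-cn).
  return region.indexOf("cn-") === 0 ? "aws-cn" : "aws";
}
```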
Prerequisites
- An AWS account
- Configure credentials for the AWS CLI
- Install the Node.js LTS version, such as 12.x
- Install Docker Engine
- Install the solution's dependencies by running
yarn install && npx projen
- Initialize the CDK toolkit stack in your AWS environment (only needed the first time you deploy with AWS CDK) by running
yarn cdk-init
- [Optional] A public hosted zone in Amazon Route 53 (required for a custom dashboard domain and for China regions)
- Authenticate Docker with the following Amazon ECR registry in your AWS partition:
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com
If you are deploying to China regions, run the following command instead:
aws ecr get-login-password --region cn-northwest-1 | docker login --username AWS --password-stdin 727897471807.dkr.ecr.cn-northwest-1.amazonaws.com.cn
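The login target depends on the AWS partition, not on the region you deploy to: the global partition uses the us-east-1 registry and China regions use the cn-northwest-1 one. The TypeScript sketch below encodes that rule with two hypothetical helpers, `ecrLoginTarget` and `ecrLoginCommand`; the account IDs and hostnames are copied from the commands above.

```typescript
type Partition = "aws" | "aws-cn";

interface EcrLoginTarget {
  region: string;
  registry: string;
}

// Hypothetical helper: the registry to authenticate against for each partition.
function ecrLoginTarget(partition: Partition): EcrLoginTarget {
  return partition === "aws-cn"
    ? { region: "cn-northwest-1", registry: "727897471807.dkr.ecr.cn-northwest-1.amazonaws.com.cn" }
    : { region: "us-east-1", registry: "763104351884.dkr.ecr.us-east-1.amazonaws.com" };
}

// Rebuilds the shell command shown above for the given partition.
function ecrLoginCommand(partition: Partition): string {
  const t = ecrLoginTarget(partition);
  return `aws ecr get-login-password --region ${t.region} | docker login --username AWS --password-stdin ${t.registry}`;
}
```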
Deploy it in a new VPC
The deployment creates a new VPC spanning at least two Availability Zones, with NAT gateways, and then deploys the solution into the newly created VPC.
yarn deploy
Deploy it into an existing VPC
To deploy the solution into the default VPC, use the following command:
yarn deploy-to-default-vpc
Or deploy into an existing VPC by specifying its VPC ID:
npx cdk deploy -c vpcId=<your vpc id>
NOTE: make sure your existing VPC has both public subnets and private subnets with NAT gateways.
Deploy it with a custom Neptune instance class and replica count
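The three VPC options map to three different commands. The TypeScript sketch below picks the command from a hypothetical `VpcChoice` value; the commands themselves are copied from above, and the `deployCommand` helper is illustrative only. It cannot check the subnet and NAT gateway requirement from the note above, so that remains your responsibility.

```typescript
// Hypothetical description of where to deploy, one variant per option above.
type VpcChoice =
  | { kind: "new" }
  | { kind: "default" }
  | { kind: "existing"; vpcId: string };

// Returns the deploy command for the chosen VPC option.
function deployCommand(choice: VpcChoice): string {
  switch (choice.kind) {
    case "new":
      return "yarn deploy"; // creates a new VPC with NAT gateways
    case "default":
      return "yarn deploy-to-default-vpc";
    case "existing":
      // The existing VPC needs public and private subnets with NAT gateways.
      return `npx cdk deploy -c vpcId=${choice.vpcId}`;
  }
}
```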
By default, the solution deploys a Neptune cluster with instance class db.r5.8xlarge and 1 read replica. You can override the instance class and replica count as follows:
npx cdk deploy --parameters NeptuneInstaneType=db.r5.12xlarge -c NeptuneReplicaCount=2
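Note that the command above passes the two overrides differently: the instance class is a CloudFormation parameter (`--parameters`) while the replica count is a CDK context value (`-c`). The TypeScript sketch below builds those arguments with a hypothetical helper, `neptuneArgs`; the parameter and context names are copied verbatim from the command, and the non-negative integer check on the replica count is an illustrative sanity check, not a documented limit.

```typescript
// Hypothetical helper: builds the `cdk deploy` arguments for Neptune overrides.
function neptuneArgs(instanceType?: string, replicaCount?: number): string[] {
  const args: string[] = [];
  if (instanceType !== undefined) {
    // CloudFormation parameter; name copied verbatim from the command above.
    args.push("--parameters", `NeptuneInstaneType=${instanceType}`);
  }
  if (replicaCount !== undefined) {
    // Illustrative sanity check: a replica count should be a whole, non-negative number.
    if (Math.floor(replicaCount) !== replicaCount || replicaCount < 0) {
      throw new Error(`Invalid replica count: ${replicaCount}`);
    }
    // CDK context value, not a CloudFormation parameter.
    args.push("-c", `NeptuneReplicaCount=${replicaCount}`);
  }
  return args;
}
```

For example, `["npx", "cdk", "deploy", ...neptuneArgs("db.r5.12xlarge", 2)].join(" ")` reproduces the command above.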
Deploy it with a custom domain for the dashboard
To access the solution's dashboard through a custom domain, use the following options when deploying the solution. NOTE: you must already have created a public hosted zone in Route 53; see Prerequisites for details.
npx cdk deploy -c EnableDashboardCustomDomain=true --parameters DashboardDomain=<the custom domain> --parameters Route53HostedZoneId=<hosted zone id of your domain>
Deploy it to China regions
Add the following context parameter:
npx cdk deploy -c TargetPartition=aws-cn
NOTE: deploying to China regions also requires the following domain parameters, because the CloudFront distribution must be accessed through a custom domain.
--parameters DashboardDomain=<the custom domain> --parameters Route53HostedZoneId=<hosted zone id of your domain>
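The custom-domain and China-region options interact: the domain is opt-in in the global partition but mandatory in aws-cn. The TypeScript sketch below builds the extra `cdk deploy` arguments and enforces that rule; `dashboardDeployArgs` and `DashboardDomainOptions` are hypothetical names, while the context and parameter names are copied from the commands above. The README does not say whether `EnableDashboardCustomDomain=true` is also needed in aws-cn, so this sketch follows the China command literally and omits it there.

```typescript
type Partition = "aws" | "aws-cn";

// Hypothetical options; both values must refer to your Route 53 public hosted zone.
interface DashboardDomainOptions {
  domain?: string;
  hostedZoneId?: string;
}

// Builds the extra `cdk deploy` arguments for the dashboard domain and partition.
function dashboardDeployArgs(partition: Partition, opts: DashboardDomainOptions): string[] {
  const hasDomain = opts.domain !== undefined;
  if (hasDomain !== (opts.hostedZoneId !== undefined)) {
    throw new Error("DashboardDomain and Route53HostedZoneId must be given together");
  }
  if (partition === "aws-cn" && !hasDomain) {
    // In China regions the CloudFront distribution must be accessed via a custom domain.
    throw new Error("aws-cn deployments require DashboardDomain and Route53HostedZoneId");
  }

  const args: string[] = [];
  if (partition === "aws-cn") {
    args.push("-c", "TargetPartition=aws-cn");
  } else if (hasDomain) {
    // In the global partition the custom domain is enabled by this context flag.
    args.push("-c", "EnableDashboardCustomDomain=true");
  }
  if (hasDomain) {
    args.push("--parameters", `DashboardDomain=${opts.domain}`);
    args.push("--parameters", `Route53HostedZoneId=${opts.hostedZoneId}`);
  }
  return args;
}
```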
How to test
yarn test
FAQ
TBA
Security
See CONTRIBUTING for more information.
License
This project is licensed under the Apache-2.0 License.