This project demonstrates an end-to-end pipeline for training a binary anti-money-laundering (AML) classifier based on Generative Adversarial Networks (GANs) and graph embeddings. The solution consists of the following parts:
- Data ingestion - We use a sample of transaction data generated by AMLSim.
- Feature store - We use the Hopsworks Feature Store to compute features, organize them into feature groups, and store them for downstream use, such as creating training datasets for model training and retrieving the stored features.
- Graph embeddings - We use the StellarGraph library to compute graph embeddings.
- Anomaly detection model - We use a Keras implementation of adversarial anomaly detection that was adapted to tabular data.
- Hyperparameter tuning - We use Maggy to run hyperparameter tuning experiments.
- Model serving - We use the Hopsworks model server to predict anomalous transactions.
A sample of transaction data is provided in the folder ./demodata, which contains alert_transactions.csv, party.csv and transactions.csv.
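Before running the notebooks, it can help to confirm that the three demo files are present and to see which columns they contain. The following is a minimal sketch using only the Python standard library. It assumes each CSV starts with a header row, and summarize_demo_data is a hypothetical helper name, not part of this repository.

```python
import csv
from pathlib import Path

# File names are taken from the text above; the folder is relative to the
# repository root.
DEMO_FILES = ("alert_transactions.csv", "party.csv", "transactions.csv")


def summarize_demo_data(folder="./demodata"):
    """Return {file name: (column names, number of data rows)} for each CSV found."""
    summary = {}
    for name in DEMO_FILES:
        path = Path(folder) / name
        if not path.is_file():
            # Skip missing files instead of failing, so the caller can
            # compare the result against DEMO_FILES.
            continue
        with path.open(newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])  # assumption: first row is a header
            row_count = sum(1 for _ in reader)
        summary[name] = (header, row_count)
    return summary


if __name__ == "__main__":
    for name, (header, rows) in summarize_demo_data().items():
        print(f"{name}: {rows} rows, columns = {header}")
```

Any file listed in DEMO_FILES that does not appear in the result is missing from the folder.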
A Keras implementation of adversarial anomaly detection is provided in the folder ./adversarialaml. To use it, install it as a Python library from https://github.com/logicalclocks/AMLend2end.git.
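One possible way to install the library from inside a notebook or script is to call pip with its git+URL syntax. This is a sketch under the assumption that the repository is installable with pip from its root. The helper names pip_install_command and install are hypothetical, and your Hopsworks environment may offer its own way to install Python libraries.

```python
import subprocess
import sys

# URL taken from the text above.
REPO_URL = "https://github.com/logicalclocks/AMLend2end.git"


def pip_install_command(repo_url=REPO_URL):
    """Build the pip command that installs a package straight from a Git URL."""
    # "git+<url>" is pip's syntax for installing from a Git repository.
    # Using sys.executable targets the interpreter that runs this code.
    return [sys.executable, "-m", "pip", "install", f"git+{repo_url}"]


def install(repo_url=REPO_URL):
    """Run the install in the current environment; raises CalledProcessError on failure."""
    subprocess.check_call(pip_install_command(repo_url))


if __name__ == "__main__":
    # Print the command for review; call install() to actually run it.
    print(" ".join(pip_install_command()))
```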
To complete this tutorial, clone this repository into your Hopsworks project.
Then run the Jupyter notebooks in the following order:
- 1_transaction_feature_engineering_ingestion.ipynb
- 2_prep_training_dataset_for_embeddings.ipynb
- 3_maggy_node_embeddings.ipynb
- 4_compute_node_embeddings.ipynb
- 5_predict_and_create_node_embeddings_fg.ipynb
- 6_create_anomaly_detection_td.ipynb
- 7_maggy_adversarial_aml.ipynb
- 8_train_adversarial_aml.ipynb
- 9_aml_model_server.ipynb
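The run order follows the numeric prefix of each notebook name. As a small sketch, the snippet below checks a cloned folder against the list above and returns the notebooks sorted by that prefix. The helper names ordered_notebooks and missing_notebooks are hypothetical, and the snippet only lists files; it does not execute them.

```python
import re
from pathlib import Path

# Notebook names copied from the list above, in run order.
EXPECTED_NOTEBOOKS = (
    "1_transaction_feature_engineering_ingestion.ipynb",
    "2_prep_training_dataset_for_embeddings.ipynb",
    "3_maggy_node_embeddings.ipynb",
    "4_compute_node_embeddings.ipynb",
    "5_predict_and_create_node_embeddings_fg.ipynb",
    "6_create_anomaly_detection_td.ipynb",
    "7_maggy_adversarial_aml.ipynb",
    "8_train_adversarial_aml.ipynb",
    "9_aml_model_server.ipynb",
)


def ordered_notebooks(folder="."):
    """Return notebook file names in the folder, sorted by leading integer prefix."""
    numbered = []
    for path in Path(folder).glob("*.ipynb"):
        match = re.match(r"(\d+)_", path.name)
        if match:
            # Compare prefixes as integers so that 10 sorts after 9.
            numbered.append((int(match.group(1)), path.name))
    return [name for _, name in sorted(numbered)]


def missing_notebooks(folder="."):
    """Return the expected notebooks that are not present in the folder."""
    present = set(ordered_notebooks(folder))
    return [name for name in EXPECTED_NOTEBOOKS if name not in present]


if __name__ == "__main__":
    for name in ordered_notebooks():
        print(name)
    print("missing:", missing_notebooks())
```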