time-is-ltd / pseudonymization-service

Anonymization service for Google Workspace and Microsoft 365 APIs

Pseudonymization Service for Google Workspace and Microsoft Graph APIs

Created and Open-sourced by Time is Ltd.

A backend service that anonymizes Google Workspace and Microsoft Graph API response objects by removing all sensitive and private textual and personal information from the objects returned by the APIs.

Quick start

Installation

Run locally

Prerequisites

  1. Clone repository
$ git clone https://github.com/time-is-ltd/pseudonymization-service.git
$ cd pseudonymization-service
  2. Install npm packages
$ npm i
  3. Create and edit .env file
$ cp .env.example .env
$ vi .env
  4. Optional: enable SSL (see the SSL section)

  5. Optional: Run tests

$ npm run test
  6. Run service
$ npm start

Run using docker

Prerequisites

Use the latest docker image from the GCP docker repository

  1. Pull docker image (substitute <version> for one of the available versions)

$ docker pull eu.gcr.io/proxy-272310/proxy:<version>
  2. Create and edit file with environment variables

$ cp .env.example .env
$ vi .env
  3. Optional: enable SSL (see the SSL section)

  4. Run docker image (substitute <version> for your version)

$ docker run --env-file .env eu.gcr.io/proxy-272310/proxy:<version>

Run using docker-compose

  1. Clone repository
$ git clone https://github.com/time-is-ltd/pseudonymization-service.git
$ cd pseudonymization-service
  2. Create and edit file with environment variables
$ cp .env.example .env
$ vi .env
  3. Optional: enable SSL (see the SSL section)

  4. Build image

$ docker-compose build
  5. Run image
$ docker-compose up

SSL

  1. Get an SSL certificate from a certification authority or create a self-signed certificate
$ openssl req -nodes -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 3650

The OpenSSL command generates a key.pem file containing the private key and a cert.pem file containing the certificate.

  2. Convert private key file (key.pem) to one-line PEM format
$ awk 'NF {sub(/\r/, ""); printf "%s\\n",$0;}' key.pem
  3. Use printed value as SSL_KEY environment variable or SSL-KEY vault secret

  4. Convert certificate file (cert.pem) to one-line PEM format

$ awk 'NF {sub(/\r/, ""); printf "%s\\n",$0;}' cert.pem
  5. Use printed value as SSL_CERT environment variable or SSL-CERT vault secret
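
To illustrate what "one-line PEM format" means, the TypeScript sketch below performs roughly the same conversion as the awk commands, plus the reverse. The function names toOneLinePem and fromOneLinePem are hypothetical and not part of the service. How the service itself parses SSL_KEY and SSL_CERT is not described here, so treat the reverse step as an assumption.

```typescript
// Hypothetical helpers for illustration only; not part of the service.

// Roughly mirrors the awk command: strip a carriage return from each line,
// drop blank lines, and append a literal backslash + "n" to every line.
function toOneLinePem(pem: string): string {
  return pem
    .split("\n")
    .map((line) => line.replace(/\r/, ""))
    .filter((line) => line.trim().length > 0)
    .map((line) => line + "\\n")
    .join("");
}

// Assumed reverse step: turn the literal "\n" markers back into real newlines.
function fromOneLinePem(oneLine: string): string {
  return (
    oneLine
      .split("\\n")
      .filter((line) => line.length > 0)
      .join("\n") + "\n"
  );
}

// Placeholder content, not a real certificate.
const samplePem =
  "-----BEGIN CERTIFICATE-----\r\nMIIB\r\n\r\n-----END CERTIFICATE-----\r\n";
console.log(toOneLinePem(samplePem));
```

The one-line value contains no real line breaks, which makes it safe to paste into a .env file or a vault secret.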

Configuration

There are 3 ways to provide config values

  • using environment variables
  • using Azure Key Vault (you have to provide the AZURE_KEY_VAULT_NAME environment variable to enable Azure Key Vault)
  • using Google Secret Manager (you have to provide the GCP_SECRET_MANAGER_PROJECT_ID environment variable to enable Google Secret Manager)
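
Secret names in Key Vault and Secret Manager appear to be the environment variable names with underscores replaced by hyphens (API_TOKEN becomes API-TOKEN), and GCP_SECRET_MANAGER_PREFIX adds an upper-cased prefix (a prefix of test gives TEST-API-TOKEN). The sketch below encodes that rule as inferred from the configuration examples. The helper names secretName and isValidApiToken are hypothetical and not taken from the service's code.

```typescript
// Hypothetical helpers; the naming rule is inferred from this README's examples.

// Maps an environment variable name to the matching secret name.
// The optional prefix corresponds to GCP_SECRET_MANAGER_PREFIX, which the
// README documents for Google Secret Manager only.
function secretName(envName: string, prefix?: string): string {
  const base = envName.replace(/_/g, "-");
  return prefix ? `${prefix.toUpperCase()}-${base}` : base;
}

// Checks the documented minimum length for API_TOKEN (at least 32 characters).
function isValidApiToken(token: string): boolean {
  return token.length >= 32;
}

console.log(secretName("API_TOKEN"));
console.log(secretName("API_TOKEN", "test"));
console.log(isValidApiToken("76xmfSGx26wmj4ty8UuGGDMhrPkwNkjk"));
```

A check like isValidApiToken can be useful in a deployment script to catch a token that is too short before the service starts.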
| Environment variable name | Key Vault/Secret Manager secret name | Value | Description | Example |
|---|---|---|---|---|
| API_TOKEN | API-TOKEN | string | Authorization API token (must be at least 32 characters long) | 76xmfSGx26wmj4ty8UuGGDMhrPkwNkjk |
| ANONYMIZATION_SALT | ANONYMIZATION-SALT | string | Salt used in data anonymization. Should be at least 32 characters long | yvUCixgSV6EMcE2FpZispWkju8N3LrWp |
| BASE_URL | N/A | string | Pseudonymization service base URL | http://localhost |
| HTTP_PORT | N/A | number | Optional. HTTP listening port. Set at least one of HTTP_PORT or HTTPS_PORT, otherwise the service will not listen on any port | 80 |
| HTTPS_PORT | N/A | number | Optional. HTTPS listening port. You have to provide the SSL_KEY and SSL_CERT secrets | 443 |
| HTTP_PROXY | N/A | string | HTTP proxy to be used by this app | http://yourproxy.com:8080 |
| HTTPS_PROXY | N/A | string | HTTPS proxy to be used by this app | https://yourproxy.com:8080 |
| VERBOSITY | N/A | number (default 0) | Optional. Set verbosity level for stdout output (0, 1, 2) | 0 |
| INTERNAL_DOMAIN_LIST | N/A | comma separated list | Optional. List of internal domains | yourdomain.com,yourdomain.eu |
| ANONYMIZE_EXTERNAL_EMAIL_DOMAIN | N/A | boolean (default true) | Optional. Anonymize external domain in emails | true |
| ANONYMIZE_EXTERNAL_EMAIL_USERNAME | N/A | boolean (default true) | Optional. Anonymize external username in emails | true |
| ANONYMIZE_INTERNAL_EMAIL_DOMAIN | N/A | boolean (default false) | Optional. Anonymize internal domain in emails | false |
| ANONYMIZE_INTERNAL_EMAIL_USERNAME | N/A | boolean (default true) | Optional. Anonymize internal username in emails | true |
| EXTRACT_DOMAINS | N/A | boolean (default false) | Optional. Allows extraction of domains from calendar events | false |
| EXTRACT_DOMAINS_WHITELIST | N/A | comma separated list | Optional. Whitelist to allow only specific domains to be extracted. If missing or empty, all domains are allowed | zoom.us,meet.google.com |
| AZURE_KEY_VAULT_NAME | N/A | string | Optional. Set only if you want to use Azure Key Vault. Your Azure Key Vault name | test-kv |
| GCP_SECRET_MANAGER_PROJECT_ID | N/A | string | Optional. Set only if you want to use Google Secret Manager. GCP project ID for which to manage secrets | test-project |
| GCP_SECRET_MANAGER_PREFIX | N/A | string | Optional. Set only if you want to use Google Secret Manager. Prefixes secret names with a string value (e.g. if you set the prefix to test, API-TOKEN becomes TEST-API-TOKEN) | test |
| SSL_KEY | SSL-KEY | string | Optional. Private key file (key.pem) converted to one-line PEM format. Follow the SSL guide to get SSL PEM files | |
| SSL_CERT | SSL-CERT | string | Optional. Certificate file (cert.pem) converted to one-line PEM format. Follow the SSL guide to get SSL PEM files | |
| RSA_PRIVATE_KEY | RSA-PRIVATE-KEY | string | Optional. RSA private key file converted to one-line PEM format. Use src/helpers/genKey.js to generate it. Necessary for the full pseudonymization case only | |
| RSA_PUBLIC_KEY | RSA-PUBLIC-KEY | string | Optional. Necessary for the full pseudonymization case only | |
| GSUITE_CLIENT_EMAIL | GSUITE-CLIENT-EMAIL | string | Optional. Value of the client_email property in the Google service account credentials.json file. You can get Google service account credentials via the How to get Google api credentials guide | |
| GSUITE_PRIVATE_KEY | GSUITE-PRIVATE-KEY | string | Optional. Value of the private_key property in the Google service account credentials.json file. You can get Google service account credentials via the How to get Google api credentials guide | |
| GSUITE_SCOPES | GSUITE-SCOPES | string | | https://www.googleapis.com/auth/gmail.readonly, https://www.googleapis.com/auth/calendar.readonly |
| GSUITE_TEST_USER | N/A | string | Optional. A GSuite account used to check upon proxy start that GSuite is configured correctly | someuser@yourdomain.com |
| O365_TENANT_ID | O365-TENANT-ID | string | Optional. Office 365 tenant ID. You can get the tenant ID via the How to get Office 365 credentials guide | 00000000-0000-0000-0000-000000000000 |
| O365_CLIENT_ID | O365-CLIENT-ID | string | Optional. Office 365 client ID. You can get the client ID via the How to get Office 365 credentials guide | 00000000-0000-0000-0000-000000000000 |
| O365_CLIENT_SECRET | O365-CLIENT-SECRET | string | Optional. Office 365 client secret. You can get the client secret via the How to get Office 365 credentials guide | |
| O365_TEST_USER | N/A | string | Optional. An O365 account used to check upon proxy start that O365 is configured correctly | someuser@yourdomain.com |
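
The four ANONYMIZE_* options combine with INTERNAL_DOMAIN_LIST to decide which parts of an email address are anonymized. The sketch below shows one way to read those options, assuming that an address counts as internal when its domain is listed in INTERNAL_DOMAIN_LIST. The EmailFlags type, the anonymizeEmail function and the mask stand-in are all hypothetical. The service's real pseudonymization (which uses ANONYMIZATION_SALT) and its output format are not shown here.

```typescript
// Hypothetical sketch of the flag semantics; not the service's implementation.
interface EmailFlags {
  internalDomains: string[]; // INTERNAL_DOMAIN_LIST
  anonymizeExternalDomain: boolean; // ANONYMIZE_EXTERNAL_EMAIL_DOMAIN
  anonymizeExternalUsername: boolean; // ANONYMIZE_EXTERNAL_EMAIL_USERNAME
  anonymizeInternalDomain: boolean; // ANONYMIZE_INTERNAL_EMAIL_DOMAIN
  anonymizeInternalUsername: boolean; // ANONYMIZE_INTERNAL_EMAIL_USERNAME
}

// Defaults as documented in the configuration table.
function defaultFlags(internalDomains: string[]): EmailFlags {
  return {
    internalDomains,
    anonymizeExternalDomain: true,
    anonymizeExternalUsername: true,
    anonymizeInternalDomain: false,
    anonymizeInternalUsername: true,
  };
}

// Applies the flags to one address. `pseudonymize` is a caller-supplied
// stand-in for whatever transformation the service really applies.
function anonymizeEmail(
  email: string,
  flags: EmailFlags,
  pseudonymize: (part: string) => string
): string {
  const at = email.lastIndexOf("@");
  if (at < 0) return pseudonymize(email); // not an address: mask it whole
  const username = email.slice(0, at);
  const domain = email.slice(at + 1);
  const internal =
    flags.internalDomains
      .map((d) => d.toLowerCase())
      .indexOf(domain.toLowerCase()) >= 0;
  const hideUser = internal
    ? flags.anonymizeInternalUsername
    : flags.anonymizeExternalUsername;
  const hideDomain = internal
    ? flags.anonymizeInternalDomain
    : flags.anonymizeExternalDomain;
  const userPart = hideUser ? pseudonymize(username) : username;
  const domainPart = hideDomain ? pseudonymize(domain) : domain;
  return `${userPart}@${domainPart}`;
}

// Trivial stand-in that only records the length of the masked part.
const mask = (part: string): string => "anon" + part.length;
console.log(anonymizeEmail("alice@yourdomain.com", defaultFlags(["yourdomain.com"]), mask));
```

With the default values, an internal address keeps its domain while its username is masked, and an external address has both parts masked.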

Verbosity levels

  • 0 (default): prints registered routes
  • 1: prints request headers in Apache Common Log format + list of loaded configuration keys
  • 2: prints whole request and response including bodies

All log output goes to stdout.

Test deployment with cURL

Use cURL to fetch a pseudonymized email messages response from the pseudonymization service

  • your_IP is the IP of the instance running the pseudonymization service
  • your_email@your_company.com is your Google Workspace or Microsoft 365 email address
  • your_api_key is the API token you generated (the API_TOKEN value, a string of at least 32 characters)

Health check

curl -X GET \
  https://your_IP/healthcheck \
  -H 'Cache-Control: no-cache' --insecure

Google Gmail API

curl -X GET \
  https://your_IP/www.googleapis.com/gmail/v1/users/your_email@your_company.com/messages \
  -H 'Authorization: Bearer your_api_key' \
  -H 'Cache-Control: no-cache' --insecure

Microsoft Graph API

curl -X GET \
  https://your_IP/graph.microsoft.com/v1.0/users/your_email@your_company.com/messages \
  -H 'Authorization: Bearer your_api_key' \
  -H 'Cache-Control: no-cache' --insecure
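
The cURL examples above share one pattern: the upstream API host and path are appended to the service's address, and the API token is sent as a Bearer token. The TypeScript sketch below builds the same URL and headers for use with any HTTP client. The helper names proxiedUrl and authHeaders are hypothetical and not part of the service.

```typescript
// Hypothetical helpers that reproduce the request shape of the cURL examples.

// Prefixes an upstream API URL with the service address, dropping the
// upstream scheme so the host becomes the first path segment.
function proxiedUrl(serviceBase: string, upstreamUrl: string): string {
  const withoutScheme = upstreamUrl.replace(/^https?:\/\//, "");
  const base = serviceBase.replace(/\/+$/, ""); // avoid a double slash
  return `${base}/${withoutScheme}`;
}

// Same headers as the cURL examples: Bearer API token and no caching.
function authHeaders(apiToken: string): Record<string, string> {
  return {
    Authorization: `Bearer ${apiToken}`,
    "Cache-Control": "no-cache",
  };
}

console.log(
  proxiedUrl(
    "https://your_IP",
    "https://graph.microsoft.com/v1.0/users/your_email@your_company.com/messages"
  )
);
console.log(authHeaders("your_api_key"));
```

Note that the cURL examples pass --insecure, which skips certificate verification and is only appropriate for a self-signed test certificate. An HTTP client used with these helpers would need its own equivalent setting, or a certificate it trusts.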

Future improvements

  1. Implement the OAuth 2.0 Client Credentials grant type to obtain a Bearer JWT authorization token and use it instead of API_TOKEN

MIT License

Copyright (c) 2020 Time is Ltd.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
