Simple Encryption uses the AES algorithm to encrypt and decrypt data. It combines a secret key and an IV with proper padding; see the implementation for details.
- Generate secret keys, specifying the key folder and the number of keys needed
java -jar simple_encryption.jar AESUtil dev 31
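The generation step could look roughly like the sketch below; `KeyGen`, the `key_NNN` file names, and the `startVersion` parameter are assumptions for illustration, producing files in the version/key/iv layout described later in this document.

```scala
import java.nio.file.{Files, Paths}
import java.security.SecureRandom

// Hypothetical sketch of what the key-generation command might produce:
// one file per key version, each holding version (3 bytes), key (16 bytes)
// and IV (16 bytes). File names and `startVersion` are assumptions, not
// the jar's documented behaviour.
object KeyGen {
  def generate(folder: String, count: Int, startVersion: Int = 1): Unit = {
    val rnd = new SecureRandom()
    Files.createDirectories(Paths.get(folder))
    (startVersion until startVersion + count).foreach { v =>
      val key = new Array[Byte](16) // 128-bit AES key
      val iv  = new Array[Byte](16) // 128-bit IV
      rnd.nextBytes(key)
      rnd.nextBytes(iv)
      val version = f"$v%03d".getBytes("UTF-8") // 3-byte zero-padded version
      Files.write(Paths.get(folder, f"key_$v%03d"), version ++ key ++ iv)
    }
  }
}
```

A `startVersion` parameter would also cover the later case of appending new keys without reusing old versions.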
- Distribute the key files to a location the application can access, such as HDFS, or GitHub with Spring Cloud Config.
- Read all key files into keyCache in encoded form
val keyCache = cacheKeyFromFolders("key")
- Encrypt the data (a string) with the rule-based key version
val cipherText = AesUtil.encryptWithVer(input, keyCache)
- Decrypt the data using the key version embedded in the cipher text
val plainText = AesUtil.decryptWithVer(cipherText, keyCache)
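A minimal sketch of how versioned encryption can work, assuming AES/CBC with PKCS5 padding and a Base64-encoded payload; `KeyEntry`, the `Map`-based cache, and these signatures are illustrative stand-ins for the library's actual `AesUtil` and `keyCache`.

```scala
import java.util.Base64
import javax.crypto.Cipher
import javax.crypto.spec.{IvParameterSpec, SecretKeySpec}

// Illustrative stand-in for a cached key entry; not the library's type.
case class KeyEntry(key: Array[Byte], iv: Array[Byte])

object AesSketch {
  private def cipher(mode: Int, k: KeyEntry): Cipher = {
    val c = Cipher.getInstance("AES/CBC/PKCS5Padding") // assumed mode/padding
    c.init(mode, new SecretKeySpec(k.key, "AES"), new IvParameterSpec(k.iv))
    c
  }

  // Prepend the 3-character key version so decryption can find the key.
  def encryptWithVer(plain: String, ver: String, keys: Map[String, KeyEntry]): String = {
    val bytes = cipher(Cipher.ENCRYPT_MODE, keys(ver)).doFinal(plain.getBytes("UTF-8"))
    ver + Base64.getEncoder.encodeToString(bytes)
  }

  def decryptWithVer(cipherText: String, keys: Map[String, KeyEntry]): String = {
    val ver = cipherText.take(3) // recover the key version from the prefix
    val bytes = Base64.getDecoder.decode(cipherText.drop(3))
    new String(cipher(Cipher.DECRYPT_MODE, keys(ver)).doFinal(bytes), "UTF-8")
  }
}
```

The version prefix is what lets decryption work without knowing which rotation rule was used at encryption time.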
- Cache the key files and let the rules choose which key version to use
val keyCache = cacheKeyFromFolders("key")
- Encrypt the string columns of the dataframe (df) with the rule-based key version
val encryptDf = dsEncrypt(df, "email,address", keyCache)
- Decrypt the dataframe (encryptDf) using the key version embedded in each cipher text
val decryptDf = dsDecrypt(encryptDf, "email,address", keyCache)
- You can also use chained calls in Spark
val decryptDf = df
.transform(dsEncrypt("sin", keyCache))
.transform(dsDecrypt("sin", keyCache))
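The chained style works when `dsEncrypt`/`dsDecrypt` are curried: supplying the columns and a key cache yields a dataframe-to-dataframe function, which `Dataset.transform` accepts. A toy stand-in to show the shape, using `Seq[Map[...]]` in place of a DataFrame and no real encryption; all names here are illustrative only.

```scala
object ChainSketch {
  type Row = Map[String, String]

  // Curried: configuration first, data last, so the partially applied
  // function has the shape Data => Data and can be chained, analogous to
  // Dataset.transform in Spark.
  def mapCols(cols: String, f: String => String)(rows: Seq[Row]): Seq[Row] = {
    val targets = cols.split(",").map(_.trim).toSet
    rows.map(_.map { case (c, v) => c -> (if (targets(c)) f(v) else v) })
  }
}
```

Partially applying `mapCols("sin", f)` gives a `Seq[Row] => Seq[Row]`, mirroring how `dsEncrypt("sin", keyCache)` can be passed to `.transform`.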
- The key can rotate on the encryption side according to the following rules

| rotate rule | comments |
| --- | --- |
| always | rotate keys for every run |
| day | rotate keys on every day |
| month | rotate keys on every month, default |
| year | rotate keys on every year |
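One plausible way such rules could map a run date to a key version; the document does not show the actual selection logic, so this is purely illustrative.

```scala
import java.time.LocalDate

// Hypothetical sketch of rule-based key-version selection; the library's
// real logic may differ.
object Rotation {
  def versionFor(rule: String, date: LocalDate, totalKeys: Int): Int = rule match {
    case "always" => scala.util.Random.nextInt(totalKeys) + 1 // a key per run
    case "day"    => (date.getDayOfYear - 1)  % totalKeys + 1 // changes daily
    case "month"  => (date.getMonthValue - 1) % totalKeys + 1 // changes monthly
    case "year"   => date.getYear             % totalKeys + 1 // changes yearly
    case other    => throw new IllegalArgumentException(s"unknown rule: $other")
  }
}
```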
Once all keys are cached, decryption always works, because each cipher text records the key version it was encrypted with.
- If keys are destroyed, remove the cached keys carefully (make sure they are no longer needed for historical data). This usually applies when your data has a retention period.
- If new keys are added, do not reuse the old versions. The following creates 5 additional keys starting from version 31
java -jar simple_encryption.jar AESUtil dev 5 31
- The default key file format is version (3 bytes), key (16 bytes/128 bits), and iv (16 bytes/128 bits).
- The cipher text format is key_version (3 bytes) followed by the cipher text.
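A small sketch of reading these two layouts; the `KeyFile` case class and the function names are hypothetical, not the library's API.

```scala
// Illustrative types for the formats above.
case class KeyFile(version: String, key: Array[Byte], iv: Array[Byte])

object Formats {
  // key file: version (3 bytes) ++ key (16 bytes) ++ iv (16 bytes) = 35 bytes
  def parseKeyFile(bytes: Array[Byte]): KeyFile = {
    require(bytes.length == 35, s"expected 35 bytes, got ${bytes.length}")
    KeyFile(new String(bytes.take(3), "UTF-8"), bytes.slice(3, 19), bytes.slice(19, 35))
  }

  // cipher text: key_version (3 bytes) followed by the cipher text
  def splitCipher(cipherText: String): (String, String) =
    (cipherText.take(3), cipherText.drop(3))
}
```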
- Add support for generating keys dynamically from DES/KMS
- Add support for encryption with hashing so that the data can also be used in join conditions
- Add a Spark decryption function
- Add performance test cases