Efficiently decrypting vectors of GPS coordinates
koenniem opened this issue
I am working with a fairly large dataset containing GPS coordinates encrypted with sodium in another programme. Now I need to decrypt them for analysis, but I am not sure how to do so efficiently. Please see the example below to see how I am currently decrypting data.
library(sodium)
# Create some fake GPS coordinates
data <- replicate(
  n = 400000,
  expr = paste0(
    sample(0:50, size = 1), ".",
    paste0(sample(0:9, size = 14, replace = TRUE), collapse = "")
  )
)
# Generate keypair
key <- keygen()
pub <- pubkey(key)
# Encrypt message with pubkey
# Efficiency doesn't matter here
# For some reason, serialize doesn't work for my data
msg <- lapply(data, charToRaw)
ciphertext <- lapply(msg, function(x) simple_encrypt(x, pub))
ciphertext <- lapply(ciphertext, bin2hex)
# Now for decrypting
# How to do it faster?
out <- lapply(ciphertext, hex2bin)
out <- lapply(out, simple_decrypt, key = key)
out <- lapply(out, rawToChar)
out <- unlist(out)
identical(out, data)
#> [1] TRUE
Created on 2022-12-02 with reprex v2.0.2
Two steps slow the process down:
sodium::hex2bin() only accepts one value.
sodium::simple_decrypt() only accepts one value.
Running hex2bin() on the encrypted data takes about 7 seconds on my machine, and decrypting takes 35 seconds. Please keep in mind that this is just an example; on real data, I would have to repeat this process many times. Normally, I would not know whether this is fast or slow (because I do not know much about encryption), but collapsing the ciphertext to a single string (using paste()) and then running hex2bin() provides a significant speed boost.
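To make the collapsing trick concrete, here is a minimal sketch of what I mean. The helper name hex2bin_many() is my own and not part of sodium; it assumes every element is a plain hex string with no separators, and it uses the string lengths to cut the decoded bytes back into one raw vector per element.

```r
library(sodium)

# Hypothetical helper (not part of sodium): decode many hex strings with a
# single hex2bin() call, then split the result back into one raw vector each.
# Assumes non-empty, separator-free hex strings.
hex2bin_many <- function(hex) {
  hex <- unlist(hex)
  n_bytes <- nchar(hex) / 2                      # two hex characters per byte
  raw_all <- hex2bin(paste(hex, collapse = ""))  # one call for everything
  ends <- cumsum(n_bytes)
  starts <- ends - n_bytes + 1
  Map(function(s, e) raw_all[s:e], starts, ends)
}
```

With the example above, out <- hex2bin_many(ciphertext) would stand in for lapply(ciphertext, hex2bin). The decryption step still has to be applied element by element.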
In an ideal world, you'd run vectorized functions:
ciphertext |>
  hex2bin() |>
  simple_decrypt(key = key) |>
  rawToChar()
However, this is not possible with sodium. Is there something wrong with my approach, or is this how one works with large vectors of data?
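For reference, the closest I can get to that interface is a thin wrapper around the scalar functions. This is only a sketch: decrypt_hex() is a name I made up, it is not part of sodium, and it still calls hex2bin() and simple_decrypt() once per element, so it tidies the call site without addressing the speed problem.

```r
library(sodium)

# Hypothetical wrapper (not part of sodium): applies the scalar functions
# element-wise so the call site reads like a vectorised pipeline.
decrypt_hex <- function(hex, key) {
  vapply(
    hex,
    function(h) rawToChar(simple_decrypt(hex2bin(h), key)),
    character(1),
    USE.NAMES = FALSE
  )
}
```

Usage would then be out <- ciphertext |> decrypt_hex(key = key), which returns a character vector directly.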