base64 serde module
dhduvall opened this issue
Motivation
I have a JSON document with a normal string representation of a UUID in it (five pieces separated by dashes). I use `serde` to deserialize it into a struct whose field type is `Uuid`. I hand this off to the MongoDB client crate and dump it into the database (which is actually Azure CosmosDB). I then use `mongosh` to pull it out and write it to a file:
fs.writeFileSync('out.json', JSON.stringify(db.alerts.find({"_id": "<id string>"}, {}).toArray()[0]))
The resulting file has the UUID fields encoded as base64, rather than the normal UUID representation.
It would be nice to be able to feed that document back into my code and have it recognize those fields as valid UUIDs without having to go and manually change them.
Solution
I'm not sure, but I think this could look like adding another `serde` module alongside the existing `compact` module, or adding another format type.
Alternatives
I'll probably end up with a custom `deserialize_with` that first tries the standard `uuid` deserialization and, if that fails, tries base64. It shouldn't be a whole lot of code, but it might be useful to someone else in the future to have this baked in.
Is it blocking?
No.
Anything else?
Nothing.
Hmm, that's interesting. Does the MongoDB client serialize to a non-human-readable format, which translates `serde`'s `bytes` type into a base64-encoded string? In human-readable formats we should serialize to a hyphenated string.
If you've already got data in your system in this format then coming up with a base64 decoder in your code might be the best way to go. It might be a bit niche to include in this library, but this issue will probably help anybody else who runs into the same problem.
I'm not sure where the UUID is being converted into base64; it may even be happening more than once (possibly on its way into the database, possibly on its way into the file). I suspect that it's happening, like you say, because some part of that pipeline has the UUID in its binary form but without enough type information to recognize that it has a special serialization, so when it needs to be stored in a JSON doc, it ends up with generic base64 serialization. I don't know enough about the way MongoDB or its client work to understand this in more than a hand-wavy way.
Whatever the mechanism, it ends up as base64 in the file, and if I want to read the file back in again, I have to do something special. For anyone who needs it, here's what I ended up with:
/// Try to deserialize a UUID first, and if that fails, try base64-decoding and then converting to
/// a UUID. This assumes that the input is deserializable as a String. This is useful when
/// pulling documents out of MongoDB, which will have serialized the UUID byte blob as base64.
fn deserialize_uuid_base64<'de, D>(deserializer: D) -> Result<Uuid, D::Error>
where
    D: Deserializer<'de>,
{
    use serde::de::Error;
    use std::str::FromStr;
    // If we can't decode from a string, then something's wrong.
    let s = String::deserialize(deserializer)?;
    // If Uuid can make sense of that, great.
    if let Ok(uu) = Uuid::from_str(s.as_ref()) {
        return Ok(uu);
    }
    // Otherwise, try to base64-decode.
    let uuid_bytes = base64::decode(s).map_err(Error::custom)?;
    // Convert the resulting Vec<u8> to &[u8] and see if that's a UUID.
    Uuid::from_slice(uuid_bytes.as_slice()).map_err(Error::custom)
}
Thanks for the note @dhduvall 👍 Hopefully that will help anybody else who finds themselves in this same pickle. I think this is a bit out of scope for `uuid` to handle itself, so I'll go ahead and close this one, but even as a closed issue this will be a useful resource for anybody searching for how to deal with `Uuid`s getting serialized as base64 strings.