wyfo / apischema

JSON (de)serialization, GraphQL and JSON schema generation using Python typing.

Home Page: https://wyfo.github.io/apischema/

Recommended Approach to Handle Serializing / Deserializing Bytes with Different Encodings?

whatamithinking opened this issue · comments

OpenAPI / JSON Schema define several content encodings (base64, base32, base32hex, etc.).
apischema does not seem to support these out of the box and instead base64-encodes all bytes.
What is the recommended approach to supporting them?
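
For context, a minimal sketch of that default behavior (assuming apischema >= 0.17, where serialize takes the type as its first argument; the class is made up):

from dataclasses import dataclass
import apischema

@dataclass
class Blob:
    data: bytes

apischema.serialize(Blob, Blob(b"hello"))  # {'data': 'aGVsbG8='}, i.e. base64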

I have a workaround (see below), but it might help others if other encodings were built in, ideally in a way that avoids creating custom types like the ones I define below.

import base64
import apischema


class Base32HexBytes(bytes): ...


class Base64Bytes(bytes): ...


class Base32Bytes(bytes): ...


class Base16Bytes(bytes): ...


class Base32HexStr(str): ...


class Base64Str(str): ...


class Base32Str(str): ...


class Base16Str(str): ...


def _serialize_base_32_hex_bytes(data: Base32HexBytes) -> str:
    return base64.b32hexencode(data).decode()


def _serialize_base_64_bytes(data: Base64Bytes) -> str:
    return base64.b64encode(data).decode()


def _serialize_base_32_bytes(data: Base32Bytes) -> str:
    return base64.b32encode(data).decode()


def _serialize_base_16_bytes(data: Base16Bytes) -> str:
    return base64.b16encode(data).decode()


def _deserialize_base_32_hex_str(data: str) -> Base32HexStr:
    return Base32HexStr(base64.b32hexdecode(data).decode())


def _deserialize_base_64_str(data: str) -> Base64Str:
    return Base64Str(base64.b64decode(data).decode())


def _deserialize_base_32_str(data: str) -> Base32Str:
    return Base32Str(base64.b32decode(data).decode())


def _deserialize_base_16_str(data: str) -> Base16Str:
    return Base16Str(base64.b16decode(data).decode())


apischema.serializer(_serialize_base_32_hex_bytes, source=Base32HexBytes)
apischema.serializer(_serialize_base_64_bytes, source=Base64Bytes)
apischema.serializer(_serialize_base_32_bytes, source=Base32Bytes)
apischema.serializer(_serialize_base_16_bytes, source=Base16Bytes)

apischema.deserializer(
    apischema.conversions.Conversion(base64.b32hexdecode, str, Base32HexBytes)
)
apischema.deserializer(
    apischema.conversions.Conversion(_deserialize_base_32_hex_str, str, Base32HexStr)
)
apischema.deserializer(
    apischema.conversions.Conversion(base64.b64decode, str, Base64Bytes)
)
apischema.deserializer(
    apischema.conversions.Conversion(_deserialize_base_64_str, str, Base64Str)
)
apischema.deserializer(
    apischema.conversions.Conversion(base64.b32decode, str, Base32Bytes)
)
apischema.deserializer(
    apischema.conversions.Conversion(_deserialize_base_32_str, str, Base32Str)
)
apischema.deserializer(
    apischema.conversions.Conversion(base64.b16decode, str, Base16Bytes)
)
apischema.deserializer(
    apischema.conversions.Conversion(_deserialize_base_16_str, str, Base16Str)
)
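
With these registrations, a field annotated with one of the marker types round-trips through its encoding; a hypothetical usage sketch (class and values made up, assuming a recent apischema where deserialize/serialize take the type first):

from dataclasses import dataclass

@dataclass
class Checksums:
    raw: Base32HexBytes
    blob: Base64Bytes

obj = apischema.deserialize(Checksums, {"raw": "D1IMOR3F", "blob": "aGVsbG8="})
# obj.raw == b"hello" and obj.blob == b"hello"
apischema.serialize(Checksums, obj)  # {'raw': 'D1IMOR3F', 'blob': 'aGVsbG8='}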

bytes having a default (de)serializer registered is on purpose, to make it convenient for users, as base64 is AFAIK the "standard" way of doing it.

If your API uses another format, you can register a different (de)serializer and override the default one. That's the "recommended approach": leveraging apischema's adaptability.
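
For example, a minimal sketch of that approach, switching every bytes field to base32 (the helper names are just for illustration):

import base64
import apischema
from apischema.conversions import Conversion


def bytes_to_base32(data: bytes) -> str:
    return base64.b32encode(data).decode()


def base32_to_bytes(data: str) -> bytes:
    return base64.b32decode(data)


# Registering (de)serializers for bytes itself replaces the built-in base64 handling.
apischema.serializer(Conversion(bytes_to_base32, source=bytes, target=str))
apischema.deserializer(Conversion(base32_to_bytes, source=str, target=bytes))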

Or you can also use different types, as you're doing in your code, if you're mixing encodings. But the purpose of apischema is not to embed these custom types in its own code; I don't want a pydantic-like mess with dozens of predefined conversions when three lines of code are enough, thanks to the conversion feature. apischema should only support standard types, with an appropriate default (de)serializer.
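
And for mixing encodings field by field without custom types, field-level conversion metadata can be used instead; a sketch, assuming apischema.metadata.conversion (names and fields are made up):

import base64
from dataclasses import dataclass, field
from apischema.conversions import Conversion
from apischema.metadata import conversion


def bytes_to_base32(data: bytes) -> str:
    return base64.b32encode(data).decode()


base32 = conversion(
    deserialization=Conversion(base64.b32decode, source=str, target=bytes),
    serialization=Conversion(bytes_to_base32, source=bytes, target=str),
)


@dataclass
class Payload:
    checksum: bytes = field(metadata=base32)  # base32 on the wire
    blob: bytes = b""  # falls back to the default base64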

The fact that you opened this issue may highlight the lack of a dedicated example in the documentation, or maybe a FAQ. I will keep this issue open as a reminder.

I think I understand where you are coming from, but it seems like a library with so much focus on OpenAPI should support the different encodings that the standard calls out, which are finite and already supported by the base64 module in the Python standard library. Adding support for these encodings internally, one way or another, would let things "just work" from the perspective of newcomers and could make apischema more of a go-to for anyone solely focused on working with OpenAPI, as I am.