pydantic / pydantic-extra-types

Extra Pydantic types.

Feature Request: MacAddress validation to support packed input types

mcdiarmid opened this issue

Support Casting/Conversion of More Input Types

One of the awesome things about pydantic is its ability to convert input values whose type differs from the field type, by casting or converting them. For example, if my field is of type `datetime` and I pass in a float or int, it is treated as a Unix epoch timestamp and converted to a `datetime` accordingly. Similarly, if a timestamp string such as `2023-09-22T12:30:01Z` is passed to this field, it is also converted to a `datetime`.

Currently, the `MacAddress._validate` class method only supports inputs of type `str` with length 14:

```python
def _validate(cls, __input_value: str, _: Any) -> str:
    return cls.validate_mac_address(__input_value.encode())
```

In packet headers, MAC addresses are typically represented as a sequence of 6 bytes. In my case, I've written some code to unpack the header of a Layer 2 Ethernet frame (https://en.wikipedia.org/wiki/Ethernet_frame#Structure). However, I must first convert `mac_destination` and `mac_source` to strings before constructing my model:

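```python
import struct
from enum import IntEnum
from typing import ClassVar

from pydantic import BaseModel
from pydantic_extra_types.mac_address import MacAddress


class EtherType(IntEnum):
    IPV4 = 0x0800
    IPV6 = 0x86DD


class Layer2EthernetHeader(BaseModel):
    mac_dst: MacAddress
    mac_src: MacAddress
    ethertype: EtherType
    # Header length in bytes (6 + 6 + 2); a class constant, not a model field.
    size_t: ClassVar[int] = 14


def mac_str(mac_bytes: bytes) -> str:
    # Render 6 raw bytes as colon-separated hex, e.g. "00:1a:2b:3c:4d:5e".
    return ":".join(f"{b:02x}" for b in mac_bytes)


def decode_layer2_ethernet_header(data: bytes, index: int = 0) -> Layer2EthernetHeader:
    # Destination MAC (6 bytes), source MAC (6 bytes), EtherType (big-endian u16),
    # read starting at byte offset `index`.
    mac_destination, mac_source, ethertype = struct.unpack_from(">6s6sH", data, index)
    # The MACs must be turned into strings by hand before validation.
    return Layer2EthernetHeader(
        mac_dst=mac_str(mac_destination),
        mac_src=mac_str(mac_source),
        ethertype=ethertype,
    )
```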
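To make the byte layout concrete: a 14-byte Ethernet II header is two 6-byte MAC fields followed by a big-endian 16-bit EtherType. The stdlib-only sketch below round-trips a made-up sample header (the addresses are arbitrary, chosen only for illustration) and shows the bytes-to-string step that `mac_str` performs, which is the step this request would move into `MacAddress`:

```python
import struct

HEADER_FORMAT = ">6s6sH"  # dst MAC, src MAC, EtherType (network byte order)
assert struct.calcsize(HEADER_FORMAT) == 14


def mac_str(mac_bytes: bytes) -> str:
    # Same helper as in the issue: 6 raw bytes -> "xx:xx:xx:xx:xx:xx"
    return ":".join(f"{b:02x}" for b in mac_bytes)


# Arbitrary example header: broadcast destination, made-up source, IPv4 EtherType.
header = struct.pack(
    HEADER_FORMAT,
    bytes.fromhex("ffffffffffff"),
    bytes.fromhex("001a2b3c4d5e"),
    0x0800,
)

dst, src, ethertype = struct.unpack_from(HEADER_FORMAT, header, 0)
print(mac_str(dst), mac_str(src), hex(ethertype))
# ff:ff:ff:ff:ff:ff 00:1a:2b:3c:4d:5e 0x800
```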

I propose that, during validation, `MacAddress` checks for non-`str` input types and handles them accordingly, specifically iterable types of length 6 (`bytes`, `bytearray`, `List[int]`, `NDArray[int]`, ...). Below is some code that could accomplish this (I have also added a conversion from an `int`, but this might not be an appropriate representation of a MAC address):

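```python
    # Assumes module-level imports: operator, typing.Any/Sequence/Union,
    # and PydanticCustomError from pydantic_core.
    @classmethod
    def _validate(cls, __input_value: Union[str, int, Sequence[int]], _: Any) -> str:
        # Optional: treat an int as a 48-bit address, most significant byte first.
        if isinstance(__input_value, int) and 0 <= __input_value < 1 << 48:
            __input_value = __input_value.to_bytes(6, 'big')

        if not isinstance(__input_value, str):
            try:
                # operator.index accepts int-like items (e.g. NumPy integers), rejects floats
                octets = [operator.index(b) for b in __input_value]
            except TypeError:  # not iterable, or items are not integers
                octets = []
            if len(octets) != 6 or not all(0 <= b <= 255 for b in octets):
                # pydantic v2 does not convert TypeError into a ValidationError,
                # so raise PydanticCustomError instead.
                raise PydanticCustomError(
                    'mac_address_type',
                    'Input must be a str of length 14, or a sequence of 6 integers in 0..255',
                )
            __input_value = ':'.join(f'{b:02x}' for b in octets)
        return cls.validate_mac_address(__input_value.encode())
```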
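The conversion rules in the proposal can be tried out without pydantic. The sketch below uses a hypothetical helper, `normalize_mac` (not part of pydantic-extra-types), that accepts a `str`, an `int`, or any sequence of six integers and returns the colon-separated form. Raising `ValueError` is an assumption here, standing in for whatever error type the library would actually use:

```python
import operator
from typing import Sequence, Union


def normalize_mac(value: Union[str, int, Sequence[int]]) -> str:
    """Hypothetical helper: coerce packed MAC input to 'xx:xx:xx:xx:xx:xx'."""
    if isinstance(value, str):
        return value  # string formats are left to the existing validator
    if isinstance(value, int):
        if not 0 <= value < 1 << 48:
            raise ValueError("integer MAC address must fit in 48 bits")
        value = value.to_bytes(6, "big")  # most significant byte first
    try:
        # operator.index accepts int-like items (including NumPy integers) and rejects floats
        octets = [operator.index(b) for b in value]
    except TypeError:
        raise ValueError(f"unsupported MAC address input: {value!r}") from None
    if len(octets) != 6 or not all(0 <= b <= 255 for b in octets):
        raise ValueError("expected exactly 6 integers in the range 0..255")
    return ":".join(f"{b:02x}" for b in octets)


print(normalize_mac(b"\x00\x1a\x2b\x3c\x4d\x5e"))  # 00:1a:2b:3c:4d:5e
print(normalize_mac(bytearray([255] * 6)))         # ff:ff:ff:ff:ff:ff
print(normalize_mac([0, 26, 43, 60, 77, 94]))      # 00:1a:2b:3c:4d:5e
print(normalize_mac(0x001A2B3C4D5E))               # 00:1a:2b:3c:4d:5e
```

Encoding an `int` big-endian keeps the most significant byte first, matching the usual written order of a MAC address.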

Furthermore, for IP addresses pydantic uses the standard library's `ipaddress` implementation, which stores the address as an `int` internally but presents the human-readable form through `__str__`. Would it make sense to store MAC addresses as a `Sequence[int]` behind the scenes and implement the human-readable, colon-separated form as `__str__`?
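A minimal sketch of that idea, using a hypothetical standalone class `MacAddr` (not the pydantic-extra-types API). It stores a 48-bit `int`, as `ipaddress.IPv4Address` does, rather than a `Sequence[int]`; `bytes(...)` recovers the packed sequence on demand, and `__str__` renders the colon-separated form. Pydantic integration (e.g. `__get_pydantic_core_schema__`) is deliberately left out:

```python
from typing import Union


class MacAddr:
    """Hypothetical MAC address type storing a 48-bit int, like ipaddress does for IPs."""

    __slots__ = ("_mac",)

    def __init__(self, value: Union[str, bytes, bytearray, int]) -> None:
        if isinstance(value, str):
            # Accept "00:1a:2b:3c:4d:5e" or "00-1a-2b-3c-4d-5e" (hex digits, any case)
            value = bytes.fromhex(value.replace(":", "").replace("-", ""))
        if isinstance(value, (bytes, bytearray)):
            if len(value) != 6:
                raise ValueError("packed MAC address must be exactly 6 bytes")
            value = int.from_bytes(value, "big")
        if not isinstance(value, int) or not 0 <= value < 1 << 48:
            raise ValueError("MAC address must be a 48-bit unsigned integer")
        self._mac = value

    def __int__(self) -> int:
        return self._mac

    def __bytes__(self) -> bytes:
        return self._mac.to_bytes(6, "big")

    def __str__(self) -> str:
        return ":".join(f"{b:02x}" for b in bytes(self))

    def __repr__(self) -> str:
        return f"MacAddr({str(self)!r})"

    def __eq__(self, other: object) -> bool:
        return isinstance(other, MacAddr) and self._mac == other._mac

    def __hash__(self) -> int:
        return hash(self._mac)


mac = MacAddr(b"\x00\x1a\x2b\x3c\x4d\x5e")
print(mac)                                  # 00:1a:2b:3c:4d:5e
print(int(mac) == 0x001A2B3C4D5E)           # True
print(MacAddr("00-1A-2B-3C-4D-5E") == mac)  # True
```

Storing a single `int` makes equality, hashing, and ordering cheap, while the packed and printable forms are derived views, which is the same trade-off the `ipaddress` types make.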
