jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML

Home Page: https://jcristharif.com/msgspec/

json schema generation - differences between pydantic and msgspec

yqiang opened this issue

Question

I ran into an issue when taking the JSON Schema generated from a msgspec.Struct and passing it to OpenAI's function_call API, which takes a JSON Schema as input to define the shape of the output. The API rejects the schema msgspec produces because there is no type field at the root level.

Below are two JSON schemas generated from the same model (i.e., the same fields). The first is from msgspec; the second is from pydantic v2 and works fine with the OpenAI API. I'm not sure which is more correct, but I wanted to raise the issue in case it is something the author can or wants to address.

Thanks again for such a great library!

msgspec version 0.18.6, pydantic version 2.6.0.

msgspec-generated JSON schema

{
    '$ref': '#/$defs/MenuItemResponse2',
    '$defs': {
        'MenuItemResponse2': {
            'title': 'MenuItemResponse2',
            'type': 'object',
            'properties': {'menu_items': {'type': 'array', 'items': {'$ref': '#/$defs/MenuItem2'}}},
            'required': ['menu_items']
        },
        'MenuItem2': {
            'title': 'MenuItem2',
            'type': 'object',
            'properties': {
                'name': {'type': 'string'},
                'calories': {'type': 'number'},
                'protein': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'fat': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'carbohydrates': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'dietary_fiber': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'saturated_fat': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'trans_fat': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'cholesterol': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'sodium': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'serving_size': {'anyOf': [{'type': 'string'}, {'type': 'null'}], 'default': None}
            },
            'required': ['name', 'calories']
        }
    }
}

pydantic-generated JSON schema

{
    '$defs': {
        'MenuItem': {
            'properties': {
                'name': {'title': 'Name', 'type': 'string'},
                'calories': {'title': 'Calories', 'type': 'number'},
                'protein': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None, 'title': 'Protein'},
                'fat': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None, 'title': 'Fat'},
                'carbohydrates': {
                    'anyOf': [{'type': 'number'}, {'type': 'null'}],
                    'default': None,
                    'title': 'Carbohydrates'
                },
                'dietary_fiber': {
                    'anyOf': [{'type': 'number'}, {'type': 'null'}],
                    'default': None,
                    'title': 'Dietary Fiber'
                },
                'saturated_fat': {
                    'anyOf': [{'type': 'number'}, {'type': 'null'}],
                    'default': None,
                    'title': 'Saturated Fat'
                },
                'trans_fat': {
                    'anyOf': [{'type': 'number'}, {'type': 'null'}],
                    'default': None,
                    'title': 'Trans Fat'
                },
                'cholesterol': {
                    'anyOf': [{'type': 'number'}, {'type': 'null'}],
                    'default': None,
                    'title': 'Cholesterol'
                },
                'sodium': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None, 'title': 'Sodium'},
                'serving_size': {
                    'anyOf': [{'type': 'string'}, {'type': 'null'}],
                    'default': None,
                    'title': 'Serving Size'
                }
            },
            'required': ['name', 'calories'],
            'title': 'MenuItem',
            'type': 'object'
        }
    },
    'properties': {'menu_items': {'items': {'$ref': '#/$defs/MenuItem'}, 'title': 'Menu Items', 'type': 'array'}},
    'required': ['menu_items'],
    'title': 'MenuItemResponse',
    'type': 'object'
}

Relevant exception from the openai python SDK:

BadRequestError: Error code: 400 - {'error': {'message': 'Invalid schema for function \'menu_items\': schema must be a JSON Schema of \'type: "object"\', got \'type: "None"\'.', 'type': 'invalid_request_error', 'param': None, 'code': None}}

I don't think type is required in the root schema; it's definitely not required anywhere else.

Sure, using $ref for the root element seems like an unnecessary indirection, but as long as it makes the code simpler (and Jim happier), I think it's fine.

I'd suggest reporting the problem to your service provider. You can also update the schema in your own code; after all, it's represented as a dictionary.
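Since the schema is a plain dict, one way to update it is to inline the root $ref so that type, properties and required sit at the top level. This is only a minimal sketch: `inline_root_ref` is a hypothetical helper, not part of msgspec, and it assumes the root reference has the form `#/$defs/<Name>` as in the msgspec output above. The example schema is a trimmed-down copy of the one in this issue.

```python
import copy


def inline_root_ref(schema: dict) -> dict:
    """Return a copy of `schema` with a root-level "$ref" replaced by its target.

    Hypothetical helper (not part of msgspec). Assumes the reference has the
    form "#/$defs/<Name>", as in the msgspec output shown in this issue.
    """
    ref = schema.get("$ref")
    prefix = "#/$defs/"
    if ref is None or not ref.startswith(prefix):
        return copy.deepcopy(schema)  # nothing to inline

    name = ref[len(prefix):]
    out = copy.deepcopy(schema)
    del out["$ref"]
    # Copy the target's keywords (type, properties, required, ...) to the root.
    # "$defs" is kept so that nested references still resolve.
    out.update(copy.deepcopy(out["$defs"][name]))
    return out


# Trimmed-down version of the msgspec schema from this issue.
schema = {
    "$ref": "#/$defs/MenuItemResponse2",
    "$defs": {
        "MenuItemResponse2": {
            "title": "MenuItemResponse2",
            "type": "object",
            "properties": {
                "menu_items": {"type": "array", "items": {"$ref": "#/$defs/MenuItem2"}}
            },
            "required": ["menu_items"],
        },
        "MenuItem2": {
            "title": "MenuItem2",
            "type": "object",
            "properties": {"name": {"type": "string"}, "calories": {"type": "number"}},
            "required": ["name", "calories"],
        },
    },
}

patched = inline_root_ref(schema)
```

The inlined definition is deliberately left in `$defs`: if the root type were self-referential, nested references to it would still need a target.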

Thanks for opening this @yqiang. Since object types are potentially cyclic, we always use "$ref" to refer to them by reference. We could make a change to avoid doing this for acyclic object types (the common case), but since the JSON schema spec allows it I'd rather not.
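To illustrate why cyclic types need $ref, here is a rough sketch of how one might tell cyclic definitions from acyclic ones. It is not msgspec's implementation; `iter_refs`, `is_cyclic` and the `Node`/`Forest` definitions are made up for this example, and only references of the form `#/$defs/<Name>` are followed.

```python
def iter_refs(node):
    """Yield every "$ref" string found anywhere inside a schema fragment."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def is_cyclic(defs: dict, name: str) -> bool:
    """Return True if definition `name` can reach itself through "$ref" links."""
    prefix = "#/$defs/"
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        for ref in iter_refs(defs[current]):
            if not ref.startswith(prefix):
                continue  # only local "#/$defs/..." references are followed
            target = ref[len(prefix):]
            if target == name:
                return True
            if target not in seen and target in defs:
                seen.add(target)
                stack.append(target)
    return False


defs = {
    # A tree node whose children are more tree nodes: cannot be fully inlined.
    "Node": {
        "type": "object",
        "properties": {
            "value": {"type": "integer"},
            "children": {"type": "array", "items": {"$ref": "#/$defs/Node"}},
        },
    },
    # Refers to Node, but nothing refers back to it: could be inlined at the root.
    "Forest": {
        "type": "object",
        "properties": {"roots": {"type": "array", "items": {"$ref": "#/$defs/Node"}}},
    },
}

node_is_cyclic = is_cyclic(defs, "Node")
forest_is_cyclic = is_cyclic(defs, "Forest")
```

Here `Node` refers to itself, so expanding it inline would never terminate, while `Forest` is acyclic and is the kind of root type that could in principle be emitted without the indirection.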

I suspect that OpenAI's code here has a naive check for an object type. What happens if you add a type key at the top level and do nothing else?

# existing msgspec json schema
schema["type"] = "object"
...

Does it properly handle the $ref field?

Yup, the OpenAI API is happy if I manually add the type key with object as the value. Thanks for the response, I'll close this for now.