json schema generation - differences between pydantic and msgspec
yqiang opened this issue
Question
I ran into an issue when taking the JSON Schema generated from a msgspec.Struct and passing it to OpenAI's function_call APIs, which take a JSON schema as input to define how to produce the output. The API rejects the schema that msgspec produces because there is no type field at the root level.
Below are two versions of JSON schemas generated from the same model (i.e., same fields). The first one is from msgspec, while the second one is from pydantic v2, which works fine with the openai API. I'm not sure which is more correct, but wanted to raise the issue in case it is something that the author can/wants to address.
Thanks again for such a great library!
msgspec version 0.18.6, pydantic version 2.6.0.
msgspec generated json schema
{
    '$ref': '#/$defs/MenuItemResponse2',
    '$defs': {
        'MenuItemResponse2': {
            'title': 'MenuItemResponse2',
            'type': 'object',
            'properties': {'menu_items': {'type': 'array', 'items': {'$ref': '#/$defs/MenuItem2'}}},
            'required': ['menu_items']
        },
        'MenuItem2': {
            'title': 'MenuItem2',
            'type': 'object',
            'properties': {
                'name': {'type': 'string'},
                'calories': {'type': 'number'},
                'protein': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'fat': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'carbohydrates': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'dietary_fiber': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'saturated_fat': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'trans_fat': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'cholesterol': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'sodium': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None},
                'serving_size': {'anyOf': [{'type': 'string'}, {'type': 'null'}], 'default': None}
            },
            'required': ['name', 'calories']
        }
    }
}
pydantic generated schema
{
    '$defs': {
        'MenuItem': {
            'properties': {
                'name': {'title': 'Name', 'type': 'string'},
                'calories': {'title': 'Calories', 'type': 'number'},
                'protein': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None, 'title': 'Protein'},
                'fat': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None, 'title': 'Fat'},
                'carbohydrates': {
                    'anyOf': [{'type': 'number'}, {'type': 'null'}],
                    'default': None,
                    'title': 'Carbohydrates'
                },
                'dietary_fiber': {
                    'anyOf': [{'type': 'number'}, {'type': 'null'}],
                    'default': None,
                    'title': 'Dietary Fiber'
                },
                'saturated_fat': {
                    'anyOf': [{'type': 'number'}, {'type': 'null'}],
                    'default': None,
                    'title': 'Saturated Fat'
                },
                'trans_fat': {
                    'anyOf': [{'type': 'number'}, {'type': 'null'}],
                    'default': None,
                    'title': 'Trans Fat'
                },
                'cholesterol': {
                    'anyOf': [{'type': 'number'}, {'type': 'null'}],
                    'default': None,
                    'title': 'Cholesterol'
                },
                'sodium': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None, 'title': 'Sodium'},
                'serving_size': {
                    'anyOf': [{'type': 'string'}, {'type': 'null'}],
                    'default': None,
                    'title': 'Serving Size'
                }
            },
            'required': ['name', 'calories'],
            'title': 'MenuItem',
            'type': 'object'
        }
    },
    'properties': {'menu_items': {'items': {'$ref': '#/$defs/MenuItem'}, 'title': 'Menu Items', 'type': 'array'}},
    'required': ['menu_items'],
    'title': 'MenuItemResponse',
    'type': 'object'
}
Relevant exception from the openai python SDK:
BadRequestError: Error code: 400 - {'error': {'message': 'Invalid schema for function \'menu_items\': schema must be a JSON Schema of \'type: "object"\', got \'type: "None"\'.', 'type': 'invalid_request_error', 'param': None, 'code': None}}
I don't think type is required in the root schema; it's definitely not required anywhere else.
Sure, using $ref for the root element seems like an unnecessary indirection, but as long as it makes the code simpler (and Jim happier), I think it's fine.
I'd suggest reporting the problem to your service provider. You can also update the schema in your own code; after all, it's represented as a dictionary.
Thanks for opening this @yqiang. Since object types are potentially cyclic, we always use "$ref" to refer to them by reference. We could change this to avoid the indirection for acyclic object types (the common case), but since the JSON Schema spec allows it I'd rather not.
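To illustrate the point about cyclic types, here is a minimal sketch using hand-written schema dictionaries (they are not msgspec output, and the Node type and the helper names refs_in and is_cyclic are hypothetical). A self-referential definition cannot be inlined at the root without recursing forever, whereas the menu schema from this issue has no cycle.

```python
PREFIX = "#/$defs/"

# Hypothetical self-referential type: a Node holds a list of child Nodes.
node_schema = {
    "$ref": "#/$defs/Node",
    "$defs": {
        "Node": {
            "title": "Node",
            "type": "object",
            "properties": {
                "value": {"type": "number"},
                "children": {"type": "array", "items": {"$ref": "#/$defs/Node"}},
            },
            "required": ["value"],
        }
    },
}

# Abbreviated version of the msgspec schema quoted in this issue.
menu_schema = {
    "$ref": "#/$defs/MenuItemResponse2",
    "$defs": {
        "MenuItemResponse2": {
            "title": "MenuItemResponse2",
            "type": "object",
            "properties": {
                "menu_items": {"type": "array", "items": {"$ref": "#/$defs/MenuItem2"}}
            },
            "required": ["menu_items"],
        },
        "MenuItem2": {
            "title": "MenuItem2",
            "type": "object",
            "properties": {"name": {"type": "string"}, "calories": {"type": "number"}},
            "required": ["name", "calories"],
        },
    },
}


def refs_in(node):
    """Yield every definition name referenced via '#/$defs/<name>' inside node."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str) and value.startswith(PREFIX):
                yield value[len(PREFIX):]
            else:
                yield from refs_in(value)
    elif isinstance(node, list):
        for item in node:
            yield from refs_in(item)


def is_cyclic(schema, name):
    """Return True if definition `name` can reach itself by following $ref links."""
    defs = schema.get("$defs", {})
    stack = list(refs_in(defs[name]))
    seen = set()
    while stack:
        current = stack.pop()
        if current == name:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(refs_in(defs[current]))
    return False
```

Under these assumptions, is_cyclic(node_schema, "Node") is true and is_cyclic(menu_schema, "MenuItemResponse2") is false, which matches the "acyclic is the common case" remark above.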
I suspect that OpenAI's code here has a naive check for an object type. What happens if you add a type key at the top level and do nothing else?
# existing msgspec json schema
schema["type"] = "object"
...
Does it properly handle the $ref field?
Yup, the OpenAI API is happy if I manually add the type key with object as the value. Thanks for the response, I'll close this for now.
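For anyone landing here later, the workaround can be sketched as a small helper. This is only an illustration under stated assumptions: with_root_type is a hypothetical name, the schema below is an abbreviated hand-copied version of the msgspec output above, and copying the type from the referenced definition is a slight generalization of the confirmed fix (setting "type" to "object" by hand). In real code the dictionary would presumably come from msgspec.json.schema(...).

```python
import copy

PREFIX = "#/$defs/"


def with_root_type(schema):
    """Return a copy of `schema` whose root has the 'type' of the definition
    that the root '$ref' points to. The input dictionary is left untouched."""
    patched = copy.deepcopy(schema)
    ref = patched.get("$ref", "")
    if "type" not in patched and ref.startswith(PREFIX):
        target = patched.get("$defs", {}).get(ref[len(PREFIX):], {})
        if "type" in target:
            patched["type"] = target["type"]
    return patched


# Abbreviated version of the msgspec schema from this issue.
schema = {
    "$ref": "#/$defs/MenuItemResponse2",
    "$defs": {
        "MenuItemResponse2": {
            "title": "MenuItemResponse2",
            "type": "object",
            "properties": {
                "menu_items": {"type": "array", "items": {"$ref": "#/$defs/MenuItem2"}}
            },
            "required": ["menu_items"],
        },
        "MenuItem2": {
            "title": "MenuItem2",
            "type": "object",
            "properties": {"name": {"type": "string"}, "calories": {"type": "number"}},
            "required": ["name", "calories"],
        },
    },
}

patched = with_root_type(schema)
```

The patched dictionary keeps the root $ref and $defs and only gains a top-level type key, which is the change reported to satisfy the API in this thread.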