microsoft / lsprotocol

Code generator and generated types for Language Server Protocol.

Use sentinel values for omittable fields

kaifronsdal opened this issue · comments

Some optional parameters are included in the JSON output when an attrs object is unstructured.

For example, when we run the following code

from lsprotocol import converters, types

obj = types.InitializeParams(
    capabilities=types.ClientCapabilities(),
    process_id=1234,
    root_path='/file/path',
    workspace_folders=[
        types.WorkspaceFolder(
            name='name',
            uri='file:///file/path/name',
        )
    ],
)

converter = converters.get_converter()

# Unstructure the attrs object into plain dicts and lists
print(converter.unstructure(obj))

it outputs

{
    'capabilities': {},
    'processId': 1234,
    'rootPath': '/file/path',
    'rootUri': None,
    'workspaceFolders': [{'uri': 'file:///file/path/name', 'name': 'name'}]
}

Notice how rootUri has become None even though we didn't include it. I would expect rootUri to be left out of the JSON representation, since it is an optional (and in fact deprecated) parameter.

It's possible this is expected behavior. If so, is there some way to work around it? The LSP server I am working with (coqlsp) throws errors when some of the optional parameters are included as null, and it would be great not to have to manually filter out every case where this happens.
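For reference, the manual filtering mentioned above could look roughly like the sketch below. It is only a stopgap applied to the already-unstructured dict; the helper name strip_null_keys is made up for illustration and is not part of lsprotocol. It removes only the keys you name, since other null values may be required by the spec.

```python
from typing import Any, Collection


def strip_null_keys(value: Any, keys: Collection[str]) -> Any:
    """Return a copy of `value` without the named keys whose value is None.

    Hypothetical helper, not an lsprotocol API. Only keys listed in `keys`
    are dropped, so nulls that the spec requires elsewhere are left alone.
    """
    if isinstance(value, dict):
        return {
            k: strip_null_keys(v, keys)
            for k, v in value.items()
            if not (k in keys and v is None)
        }
    if isinstance(value, list):
        return [strip_null_keys(item, keys) for item in value]
    return value


# The payload from the example above, as produced by converter.unstructure(obj)
payload = {
    "capabilities": {},
    "processId": 1234,
    "rootPath": "/file/path",
    "rootUri": None,
    "workspaceFolders": [{"uri": "file:///file/path/name", "name": "name"}],
}

cleaned = strip_null_keys(payload, keys={"rootUri"})
print(cleaned)
```

The original payload is not modified; a new structure is built on each call.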

This is a spec issue:

                {
                    "name": "rootUri",
                    "type": {
                        "kind": "or",
                        "items": [
                            {
                                "kind": "base",
                                "name": "DocumentUri"
                            },
                            {
                                "kind": "base",
                                "name": "null"
                            }
                        ]
                    },
                    "documentation": "The rootUri of the workspace. Is null if no\nfolder is open. If both `rootPath` and `rootUri` are set\n`rootUri` wins.\n\n@deprecated in favour of workspaceFolders.",
                    "deprecated": "in favour of workspaceFolders."
                },

The following part is why it shows up in the serialized form. It tells us to preserve null in the serialized form.

                            {
                                "kind": "base",
                                "name": "null"
                            }

@dbaeumer For deprecated fields should we remove the preservation in serialized form?

Would it be possible to create a sentinel value that represents a null value as distinct from Python's None? In LSP there is a strong distinction between an omitted attribute and a null one, but currently there is no easy way to make that distinction in Python. One possible solution would be to include something like the following in types.py

class NullType:
    pass

# Create an instance of the sentinel value
null = NullType()

and then for attributes replace Union[..., None] with Union[..., NullType] and Optional[Union[..., None]] with Optional[Union[..., NullType]]. For example, in InitializeParams, change

root_path: Optional[Union[str, None]] = attrs.field(default=None)

to

root_path: Optional[Union[str, NullType]] = attrs.field(default=None)

There would have to be some minor tweaks to the converter to handle the new null value.
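A minimal sketch of how this could behave end to end, using stdlib dataclasses instead of attrs and cattrs so that it runs on its own. The Params class and the unstructure function are hypothetical stand-ins, not lsprotocol code. Note that the annotation has to name the class NullType, because an instance such as null cannot appear inside Union.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional, Union


class NullType:
    """Sentinel type standing for an explicit JSON null."""

    def __repr__(self) -> str:
        return "null"


# Single shared instance of the sentinel
null = NullType()


@dataclass
class Params:  # hypothetical stand-in for a generated LSP type
    process_id: int
    # None means "omitted"; `null` means "send an explicit JSON null"
    root_uri: Optional[Union[str, NullType]] = None


def unstructure(obj: Any) -> Dict[str, Any]:
    """Convert a dataclass to a dict, omitting None and mapping `null` to None."""
    result: Dict[str, Any] = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        if value is None:
            continue  # field was never set, so leave it out entirely
        result[f.name] = None if value is null else value
    return result


print(unstructure(Params(process_id=1234)))
print(unstructure(Params(process_id=1234, root_uri=null)))
```

In this sketch the first call leaves root_uri out of the dict, while the second keeps the key with a None value, which a JSON encoder would then write as null.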

I think you have to preserve it since there could be servers still relying on it.

@kaifronsdal we already handle omitted values versus default None in this implementation. It is done in the structuring and unstructuring handlers. All fields that have such a special requirement are explicitly listed, and we use unstructuring hooks to check and preserve fields marked as such.

In this case we are preserving it because the spec marks it as needing to be preserved.
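To illustrate the general idea of an explicit list of fields whose null must be preserved, here is a rough sketch. It is not the actual lsprotocol hook code: the PRESERVE_NULL registry, the InitParams class, and the function name are assumptions made up for this example, and stdlib dataclasses are used in place of attrs and cattrs.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional

# Hypothetical registry of (class name, field name) pairs whose None
# must be kept as an explicit null in the serialized form.
PRESERVE_NULL = {("InitParams", "root_uri")}


@dataclass
class InitParams:  # hypothetical stand-in for a generated LSP type
    process_id: Optional[int] = None
    root_path: Optional[str] = None
    root_uri: Optional[str] = None


def unstructure_keeping_listed_nulls(obj: Any) -> Dict[str, Any]:
    """Drop None-valued fields unless the field is listed in PRESERVE_NULL."""
    cls_name = type(obj).__name__
    result: Dict[str, Any] = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        if value is None and (cls_name, f.name) not in PRESERVE_NULL:
            continue  # ordinary optional field: omit it
        result[f.name] = value
    return result


print(unstructure_keeping_listed_nulls(InitParams(process_id=1234)))
```

Here root_path is dropped because it is unset and not listed, while root_uri stays in the output as None because it is listed.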

We had initially looked into sentinel values. One downside is that it leaks into server implementations. For example, people using pygls to wrap their tools would now have to know how the serializer handles this. For LSP extension implementations, all that is needed is a cattrs hook that handles that specific extension.

We are definitely open to suggestions, and if there is enough value in going with the sentinel approach we can look into using it.

As a side note, in C# this distinction is handled using the NullValueHandling.Ignore strategy while deserializing, whereas in Rust it is handled using an explicit enum type that distinguishes null from omitted. We have used the approaches we believe match the patterns used in each language.

You might have already discussed this, but one suggestion I have is to use a sentinel value for omitted fields rather than for null fields. Each optional field can then default to omitted, and None is reserved for null. If a server omits a field, nothing changes for it: cattrs would simply omit that field from the JSON representation as well. But if the server sets the field to None, cattrs would emit a null value.

The only potential downside I see is that a server receiving an omitted value might, depending on how it is implemented, have to handle this case explicitly where it did not before. For instance, before a server might do something like

if field is None:

whereas now they would have to do

if field is None or field == omitted:

The positive is that now the server would be able to handle the cases where a client omitted a field and a client sent null without referencing the original json (as the distinction between omitted and null is removed once cattrs handles it). But as you said, some servers might need to make tweaks to their code base to support the change.
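A small sketch of the omitted-sentinel idea, again with stdlib dataclasses rather than attrs and cattrs so it is self-contained. OmittedType, omitted, FolderParams, and the two helper functions are all hypothetical names for illustration, not existing lsprotocol APIs.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Type, TypeVar, Union

T = TypeVar("T")


class OmittedType:
    """Sentinel type for a field that is absent from the JSON object."""

    def __repr__(self) -> str:
        return "omitted"


# Single shared instance of the sentinel
omitted = OmittedType()


@dataclass
class FolderParams:  # hypothetical stand-in for a generated LSP type
    process_id: int
    # Defaults to `omitted`; None is reserved for an explicit JSON null
    root_uri: Union[str, None, OmittedType] = omitted


def unstructure(obj: Any) -> Dict[str, Any]:
    """Convert to a dict, leaving out omitted fields and keeping None as null."""
    return {
        f.name: getattr(obj, f.name)
        for f in fields(obj)
        if getattr(obj, f.name) is not omitted
    }


def structure(data: Dict[str, Any], cls: Type[T]) -> T:
    """Build an instance; keys missing from `data` fall back to `omitted`."""
    return cls(**data)


sent_null = structure({"process_id": 1, "root_uri": None}, FolderParams)
sent_nothing = structure({"process_id": 1}, FolderParams)
print(sent_null.root_uri, sent_nothing.root_uri)
```

With this shape, the receiving side can tell the two cases apart with identity checks such as `field is None` and `field is omitted`, without going back to the original JSON.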

An alternative would be to simply set a field to None when structuring from JSON if it is omitted or if its value is null. Servers would then not have to make the above changes, but could still choose to omit fields from the JSON representation when unstructuring if they desire.