facebookincubator / cinder

Cinder is Meta's internal performance-oriented production version of CPython.

Home Page:https://trycinder.com

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

`Union[str, int]` is treated as `dynamic`

LuKuangChen opened this issue · comments

commented

What version of Static Python are you using?

6862bbd
2021-10-03

What program did you run?

from typing import Union

x: Union[int, str] = 1
reveal_type(x)

What happened?

The type of x is dynamic

compiler.static.errors.TypedSyntaxError:
reveal_type(x): 'dynamic',
'x' has declared type 'dynamic' and
local type 'Literal[1]'

What should have happened?

We expected the type of x to be Union[int, str] because union types are supported:

x: Any = 1
if True:
    x = "hello"
else:
    x = 42
reveal_type(x)

# compiler.static.errors.TypedSyntaxError:
# reveal_type(x): 'dynamic',
# 'x' has declared type 'dynamic' and
# local type 'Union[str, int]'
commented

We also found that Static Python understands Union[str, None] and Union[None, int], but not Union[int, str].

This bug suggests that some tests pass for a wrong reason. For example, the following test from the static python test suit is asserting that a union is a subtype of a broader union. But what really happened is that "the broader union" is read as dynamic. Other target unions work too (Union[str, B]) even though they shouldn't!

    def test_union_can_assign_to_broader_union(self):
        self.assertReturns(
            """
            from typing import Union
            class B:
                pass
            def f(x: int, y: str) -> Union[int, str, B]:
                return x or y
            """,
            "Union[int, str]",
        )

Hi! So the widening of Union annotations to dynamic is necessary for soundness at the moment because of the limitations of our runtime type checks. We special-case support for runtime checking optional types as a degenerate case of Union, but due to implementation complexity and concerns about pathological performance with large Unions, we don't support runtime checks of arbitrary Unions. This means we can't insert a cast to an arbitrary Union type, which means we can't support it as an annotated argument or variable type, even though the compiler's type system can represent arbitrary unions.

It turns out there is still value in the compiler support, because if we infer a union from e.g. x = y or z (where y: str and z: int), this is a type that we can safely trust; it doesn't require any runtime casts to enforce. So we can later narrow based on e.g. if isinstance(z, str): ... else: ... in both the if and else branches, and this is sound.

Thanks for pointing out the bad tests! Originally we supported union annotations until we realized the soundness hole, so these tests predate the widening of union annotations to dynamic.

The path forward here is of course to add runtime support for casts to union, though we remain concerned about the performance with large union types. We might need to keep some arbitrary limit on union size, and above that limit widen the union to dynamic. Not sure what the best UX is for this; we could I suppose make it a type error to specify such a large union as an annotation? This might be better than silently widening to dynamic? Again this is a case where our desire for Pyre/mypy compatibility would push us away from the explicit approach and towards the silent widening.

The bad tests are addressed in 439b330

Unions should have static checks, right?

@vivaan2006 and I just ran a small program. We expected a typechecker error, but no:

def foo(i: float) -> None:
    return i

# python -m compiler --static main.py
# OK!

No error for int|str either.

@bennn This is explained in my comment above from Oct 4, 2021. It's not strictly a "static check" vs "runtime check" distinction. It's not enough to just say that we won't use union type information at runtime and when we emit a runtime check we will consider a union to be dynamic, because this can lead us to statically narrow a (not usable at runtime) union type to a simpler type that we do use at runtime, and then wrongly trust it. Consider:

def f(x: int|str) -> str:
    # we cannot verify the type of x is actually int|str at runtime when called from untyped code
    if not isinstance(x, int):
        # now our compiler believes x is type `str` and will trust this return to be safe
        return x
    return "foo"

Therefore we have to consider x to simply be dynamic in order to avoid unsoundness.

We could do better here if we had the ability in our compiler to track two parallel sets of types, "trusted" and "untrusted", and we used "untrusted" types for the purposes of emitting compile-time errors but never for runtime use. In that case we could record an "untrusted" type of int|str for x above, then narrow it to an "untrusted" type of str, then still emit a cast to str before the return. But we don't have that parallel-tracking facility in our compiler, and it's not a priority to add it, since effectively the "untrusted" types would just duplicate the static checks we already get from Pyre.

So the only time our compiler support for tracking unions actually comes into play is when a union type arises "organically" in a way where we can trust it, e.g. in a case like this:

def f(x: int, y: str) -> str:
    z = x or y  # z is trustable int|str
    if not isinstance(z, int):
        # now it is actually sound to consider z to be str
        return z
    return "foo"

So you'll see that you do get a static error from e.g. this variant of your example:

def f(x: int, y: str) -> None:
    z = x or y
    return z

Let me know if that doesn't make sense.

I won't claim this behavior is ideal when SP is viewed in isolation, but it makes a bit more sense when SP is considered in tandem with use of e.g. mypy or Pyre. In an ideal future system, I would want the parallel tracking of trusted and untrusted types in a single SP compiler (essentially rolling the functionality of pyre/mypy into the static compiler also.)

Also, more near term, we would ideally like to add support for arbitrary unions. The main question mark is the cost of verifying them at runtime, since that cost will be O(size of union) where our current casts are all O(1) cost (or technically, O(size of union * depth of inheritance tree) where today they are O(depth of inheritance tree)). But this may be acceptable, "avoid large unions in hot functions" would just become part of tuning guidance.

Ok, I was confused about the extent of compiler support for unions.

The examples help: there's support for those organic unions but everything else is dynamic.

The reveal_type(x) = dynamic from #42 is helpful too, in retrospect.