aardappel / lobster

The Lobster Programming Language

Home page: http://strlen.com/lobster

What ideally should the semantics of `is` be?

aardappel opened this issue

In the context of #97, I was thinking about what the semantics of is should ideally be.

It serves two purposes: 1) dynamic checks (as a replacement for dynamic dispatch) and 2) compile-time conditionals.

The typical use case for 1) is supervar is subtype, where supervar is a variable of a supertype that typically holds one of its subtypes.

The typical use case for 2) is subvar is subtype, where subvar is specialized to a specific type.

For A is B, we currently get:

  1. Compile-time true if type(A) == B. This includes the case where A holds a subtype of B at runtime, even though the runtime check would have failed!
  2. Compile-time false if B cannot convert to type(A). This happens if the types are unrelated or B is a supertype of A. So, unlike rule 1, this does not treat A as a B when A is statically a subtype of B (!)
  3. Otherwise, a runtime equality check. This happens only for reference types, since scalar types are always covered by the compile-time checks above.

This is clearly inconsistent with regard to subtyping. So the question is: should this include subtyping, yes or no?

Yes would be useful for compile time checks.

No would be better for runtime checks. We don't want is to have the high cost of having to check all supertypes as well.

I am tempted to say No, since predictable runtime performance is important. Rule 2) is already compliant with this. Rule 1) would have to change to only be compile-time true if type(A) == B and additionally type(A) does not have any subclasses.
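To make the proposed rules concrete, here is a C++ analogy, offered as a sketch rather than Lobster's implementation. Assumptions: C++'s `final` stands in for "type has no subclasses", `typeid` equality stands in for the runtime exact-type check, and `is_exact`, `Shape` and `Circle` are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>
#include <typeinfo>

// Hypothetical helper modelling "a is B" with exact-type semantics.
// Static is the compile-time type of a; B is the type being tested.
template <typename B, typename Static>
bool is_exact(const Static& a) {
    // Stand-in for "B has no subclasses": a scalar, or a class marked final.
    constexpr bool b_is_leaf = !std::is_class_v<B> || std::is_final_v<B>;
    if constexpr (std::is_same_v<Static, B> && b_is_leaf) {
        return true;   // rule 1 (revised): decided at compile time
    } else if constexpr (!std::is_same_v<Static, B> &&
                         !std::is_base_of_v<Static, B>) {
        return false;  // rule 2: unrelated types, or B is a supertype of Static
    } else {
        return typeid(a) == typeid(B);  // rule 3: a single runtime comparison
    }
}

struct Shape { virtual ~Shape() = default; };  // has a subclass below
struct Circle final : Shape {};                // leaf type
```

With these rules, `is_exact<Circle>(c)` folds to true at compile time because `Circle` is final, while `is_exact<Shape>(ref)` stays a runtime check and returns false when `ref` refers to a `Circle`, which is the case the current rule 1 resolves as compile-time true.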

So let's assume that A is B must only be true if, at runtime, type(A) == B, and that it may only be decided at compile time if we can prove the outcome at compile time.

But now comes the interesting part that #97 is about: what if B is not a single exact type, but a set of types? Essentially, that would be a shortcut for A is B1 or A is B2. The PR introduces B<?> as the syntax for one such set of types.

Following the above, that would mean that A is B<?> is true only if the runtime type of A is one of the specializations of B (specifically excluding any super/subtypes of A and B).

We have said above that we want is to always be a single equality check at runtime, which means in this case that A is B<?> should always have an answer at compile time, to be consistent with the is semantics. That means though that this cannot work when there's subtyping involved:

class A:
    ..
class B<T> : A
    ..
let a:A = B<int> { .. }
if a is B<?>:

This if condition could well be true at runtime, but it is not guaranteed to be true or false at compile time. To test it at runtime, a would have to be compared against every instantiated B type, which would be very undesirable!
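A C++ sketch of what a runtime a is B<?> would boil down to under exact-type semantics (an analogy, not Lobster's implementation; `is_any_of` is a hypothetical helper): the list of instantiations has to be spelled out, and the check costs one comparison per entry.

```cpp
#include <cassert>
#include <typeinfo>

struct A { virtual ~A() = default; };
template <typename T> struct B : A { T value{}; };

// Hypothetical "a is B<?>": with exact-type semantics, the only option is
// one typeid comparison per instantiation of B that the program uses.
template <typename... Instantiations>
bool is_any_of(const A& a) {
    return ((typeid(a) == typeid(Instantiations)) || ...);
}
```

Every new B<T> used anywhere in the program would lengthen this chain, which is why the cost is hard to predict.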

How to solve this? We could simply error on any is that would need to check against multiple runtime types, since using is with a supertype on the left-hand side is likely uncommon. But that does feel hacky.

For context, @arvyy wants to be able to write code like this, which I agree should be possible:

def foo(x):
    // This if should be compile-time:
    if x is A<?>: ..
    elif x is string: ..

foo(A<int>{})
foo("hi")

There's no type you can give x to make this work, since foo<T>(x:A<T>) doesn't work with string.

An alternative might be to allow this:

def foo<T>(x:A<T>): ..
def foo(x:string): ..

That certainly doesn't work currently. It may be quite tricky to allow overloading combined with templates like this. C++ allows it, but that is definitely a more complex language :)

Related: C++ has dynamic_cast, which does take subtyping into account, but nobody likes to use it because it is too slow, and you typically don't want the subtyping anyway, so many code bases provide their own alternative that compares typeid directly instead.
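A small C++ illustration of the difference (hypothetical class names): `dynamic_cast` accepts any subtype and, in general, may need to search the class hierarchy at runtime, while comparing `typeid` values is a single exact-type test.

```cpp
#include <cassert>
#include <typeinfo>

struct Animal { virtual ~Animal() = default; };
struct Dog : Animal {};
struct Puppy : Dog {};

// Subtype test: true for Dog and anything derived from it.
bool is_dog_or_subtype(const Animal& a) {
    return dynamic_cast<const Dog*>(&a) != nullptr;
}

// Exact test: true only if the dynamic type is exactly Dog.
bool is_exactly_dog(const Animal& a) {
    return typeid(a) == typeid(Dog);
}
```

A `Puppy` passes the first test but not the second, mirroring the "No subtyping" semantics proposed for is.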

By contrast, is in C# does take subtyping into account, so it can be expensive at runtime: https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/operators/type-testing-and-cast

Regarding @arvyy's example above, specifically the comment:

// This if should be compile-time:

If the user wants some construct to be evaluated at compile time, then they should specify that on the construct itself, in this case on the if. The compiler will never be able to guess accurately, in every case, whether a user wants something done at compile time or at runtime.

That being said, if you do want to special-case code like this, you'd essentially be emulating union types, where x: A<?> | string. In this union case, though, the is check is entirely dynamic, since the compiler can't know which variant the parameter holds until runtime. If a union type isn't wanted, then you'd run into issues where the user tries to do something with x before checking its type, e.g.:

def foo(x):
    some_other_method(x); // <-- what semantics do we expect here?
    println(x + 1); // <-- or here?
    if x is A<?>: ..
    elif x is string: ..

foo(A<int>{})
foo("hi")
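For comparison, a C++ sketch of the union-type reading using `std::variant` (an analogy; `describe` and `Arg` are hypothetical names). Here the alternative held by `x` is only known at runtime, so `std::holds_alternative` is a runtime tag check, and the alternatives must be listed up front, unlike the open-ended `A<?>`.

```cpp
#include <cassert>
#include <string>
#include <variant>

template <typename T> struct A { T value{}; };

// A real union type: which alternative x holds is only known at runtime.
using Arg = std::variant<A<int>, std::string>;

std::string describe(const Arg& x) {
    if (std::holds_alternative<A<int>>(x)) {  // runtime tag check
        return "A<int>";
    } else {
        return "string: " + std::get<std::string>(x);
    }
}
```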

You mentioned the alternative of allowing something along the lines of:

def foo<T>(x:A<T>): ..
def foo(x:string): ..

This would work and may be somewhat easier to reason about. It could also be done with traits/interfaces/typeclasses, for example (in Rust, sorry, I don't know Lobster well enough):

struct A<T>(T);
trait Foo {
    fn foo(self);
}
impl<T> Foo for A<T> {
    fn foo(self) { /* ... */ }
}
impl Foo for String {
    fn foo(self) { /* ... */ }
}

So, depending on how liberal you are about where interfaces can be implemented, this may already be possible.

Personally I'm not sure: how would a is B<?> even work at runtime with respect to typing? Let's say B had a field value:T. Then what would be the compile-time type of a.value after such a check? I.e., what if I try calling foo(a.value), where foo is a generic function? This works in e.g. Java, where there is a root class in the type hierarchy and primitives get boxed, but from what I understand that's not the case in Lobster, and it'd take a lot of work to refactor.

Maybe one point to think about is that currently "if foo is bar" serves both metaprogramming and runtime reflection. Maybe the solution would be more separation at the syntax level? Say, "if foo is bar" is always a runtime check unless bar is a primitive, and "if compiling foo is bar" is always a compile-time check.

For A is B, we currently get:

  1. Compile time true if type(A) == B. This includes the case where A is a subtype of B at runtime, even though that check would have failed at runtime!

I think it should also be compile-time true if A is already known at compile time to be a subtype of B.

@jfecher I agree it is nice to have an explicit way to force compile time code, such that it can be an error if it fails. Maybe something to add to Lobster at some point.

Lobster has no union types like this, because it solves unions of types by compile-time specialization, so in foo, x is only ever A<int> or string, never possibly both. There are two copies of foo. So some_other_method etc will also be called (and possibly specialized) with those specific types.
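Lobster's per-type specialization resembles C++ template instantiation. As a rough analogy (not how Lobster implements it; `is_A` and `foo` are hypothetical), @arvyy's example can be written with `if constexpr`, where each instantiated copy of `foo` sees exactly one type for `x` and the branch is resolved by the compiler:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

template <typename T> struct A { T value{}; };

// Compile-time test for "any instantiation of A", i.e. A<?>.
template <typename T> struct is_A : std::false_type {};
template <typename T> struct is_A<A<T>> : std::true_type {};

// Each distinct argument type gets its own copy of foo, and inside each
// copy x has exactly one type, so the branch is chosen by the compiler.
template <typename X>
std::string foo(const X& x) {
    if constexpr (is_A<X>::value) {
        return "A, value " + std::to_string(x.value);
    } else {
        static_assert(std::is_same_v<X, std::string>,
                      "foo expects an A<T> or a std::string");
        return "string " + x;
    }
}
```

Passing `std::string("hi")` rather than `"hi"` matters in C++, since a string literal would deduce `X` as a `char` array and trip the `static_assert`.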

Yes, Lobster doesn't have the equivalent of traits or type classes. Those features provide "nominal" type compatibility guarantees; Lobster achieves the same with "compile-time duck typing", i.e. as long as there is a foo for a type, it will work. It generally already does, except that it currently discriminates on the type of the first argument (since, much like traits, the feature can be used both statically and with dynamic dispatch), and this currently doesn't work with a templated type. Chances are it can be made to work, though.

@arvyy yes, that is a good point. Already in your PR, you had to disable the behavior where, after a is B, a gets "upgraded" to type B, for the case where B is generic. In the runtime case the same would have to happen, making my example of if a is B<?>: even more useless: a would stay of type A, and would thus be limited to dynamic dispatch, in which case you have no use for this check.

Yes, we could make is into two separate features, though I am not sure that is great either. The runtime is would always match exact types only, while the compile-time one could include subtyping, and would error out on my a is B<?> example since that can't be decided conclusively at compile time. And we'd still have to explain why you can write B<?> in one and not the other... a bit messy.

I think it should also be compile-time true if A is already known at compile time to be a subtype of B.

Why? See my explanation that the semantics have to be consistent across runtime and compile time, and that subtyping would make the runtime check slow. I think we don't want that.

I can look into whether the "overloading on templated functions" solution can work. I think that would be cleanest.

Just to clarify my last point, I'm talking about

class A:
    ..
class B : A
    ..
let b:B = B { .. }
if b is A: //should be true

Are we talking about the same thing? If the is check here were false, that would break the LSP (Liskov substitution principle).

@arvyy Yes, above I am proposing to make is only about exact type equality, so b is A would be compile-time false, since there is no way b can at runtime end up having the exact type A.

The is operator already has (and always has had) those semantics at runtime. If you turn off the optimizer, b is A will be false. For that not to be the case, I'd have to check every superclass of b at runtime, something I'd prefer not to do, especially since none of the uses of is so far need it.

Also, we already have a way to do such checks that works with subtyping, using dynamic dispatch:

def foo(a:A): ..
foo(B { .. })

This will always manage to call foo, regardless of how deep the inheritance hierarchy is, or how many versions of foo there are. Yet this is cheap, since the vtable for B will simply contain the A version if B doesn't have its own.
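In C++ terms (an analogy; the class names are hypothetical), this is ordinary virtual dispatch. In typical implementations each class's vtable simply reuses the inherited entry when it has no override, so the call costs one indirect call regardless of depth:

```cpp
#include <cassert>
#include <string>

struct A {
    virtual ~A() = default;
    virtual std::string foo() const { return "A::foo"; }
};
struct B : A {};  // no override: B's vtable entry for foo reuses A::foo
struct C : B {};  // deeper in the hierarchy, still a single lookup
struct D : B {
    std::string foo() const override { return "D::foo"; }
};

// One indirect call through the vtable, however deep the hierarchy is.
std::string call(const A& a) { return a.foo(); }
```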

OK, I have what appears to be a working implementation of the "overloading with generics" option I mentioned, which turned out to be a lot easier to add than I imagined: 96ce9f1

From the commit:

        // Static dispatch on generic types!
        class D<T>:
            x:T
        def foo<T>(d:D<T>): return d.x
        def foo(i:int): return i
        assert foo(D<int> { 1 }) + foo(2) == 3

Can anyone see problems with this approach?
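For comparison, the commit's test case maps directly onto C++ overloading combined with templates, which C++ already allows (a sketch, not Lobster's implementation):

```cpp
#include <cassert>

template <typename T> struct D { T x; };

// The template overload matches any D<T>; it is picked at compile time.
template <typename T> T foo(const D<T>& d) { return d.x; }
// A plain overload for int; foo(2) cannot deduce T from an int, so it lands here.
int foo(int i) { return i; }
```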

If we're sticking with a strict type equality check, the fix for

Compile time true if type(A) == B. This includes the case where A is a subtype of B at runtime, even though that check would have failed at runtime!

seems obvious: check whether B is a final type, i.e. either a primitive or a class/struct without declared subtypes, and only then resolve it as compile-time true; otherwise, delegate to a runtime check.

OK, it needed one small fix to make generic static dispatch work properly: 2a6c1ab

@arvyy yes, that's exactly what I say in my first post: "Rule 1) would have to change to only be compile-time true if type(A) == B and additionally type(A) does not have any subclasses."

Did you try whether the static generic dispatch I just implemented works for your use case?

Does this work also for multiple arguments (something like "static multimethods")?

@dumblob no, the static dispatch currently decides which method to use based solely on the first argument, to have predictable equivalence to the dynamic case. In theory it would not be hard to expand this to the other arguments, the question is whether it is desirable.
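For contrast with first-argument-only dispatch, C++ overload resolution considers every argument, which is roughly what "static multimethods" would mean (a sketch; `combine` is a hypothetical name):

```cpp
#include <cassert>
#include <string>

template <typename T> struct D { T x; };

// Hypothetical overloads that differ only in their second parameter:
// C++ overload resolution considers every argument when choosing.
template <typename T>
std::string combine(const D<T>&, int) { return "D, int"; }
template <typename T>
std::string combine(const D<T>&, const std::string&) { return "D, string"; }
```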

In theory it would not be hard to expand this to the other arguments, the question is whether it is desirable.

Well, I myself think it's not (I recently saw a survey in which most "multimethod" use cases were pretty low-level, saving "boilerplate" that isn't needed in higher-level languages like Lobster). On the other hand, I haven't yet seen any evidence that multimethods would cause harm. So if it's not much work, it might still make sense to have them (considering Lobster actually wants to offer performance, in which case low-level stuff matters).

I asked because that was the last thing I could imagine missing from the current implementation. Otherwise, this issue could be closed IMHO 😉.

Lobster used to have multi-methods (dynamically dispatched): 9eaa74b

They were removed exactly because of what you say: I never found any use for them, and I wanted to move the language in a "predictable performance" direction.

The issue shouldn't be closed yet, since is should still be modified to make A is B compile-time true only if B has no subtypes, which differs from the current implementation. If B has subtypes, it needs to be a runtime check.

Decided to finally fix this ^ :)