rust-lang / rfcs

RFCs for changes to Rust

Home Page:https://rust-lang.github.io/rfcs/

Auto-generated sum types

Ekleog opened this issue

First, the idea was laid out by @glaebhoerl here.

The idea is basically to have a way to tell the compiler “please auto-generate a sum type for my -> impl Trait function”, a type that would automatically implement Trait by delegating to the implementations of its individual members.

There would, I think, be a need for a syntax specially dedicated to this, so that people are not auto-generating these without being aware of it. Currently, the two syntaxes I can think of are either |value| (without a following {}), running on the idea of “this is auto-generated, just like closures” (I don't like it much, but it could make sense), or a keyword. I'd have gone with auto, but it appears not to be reserved, so become or enum would likely be the best choices.

(Edit: @Pauan pointed out that the |…| syntax would be ambiguous in certain cases, so please disregard it)

So the syntax would, I think, look like this:

fn foo(x: bool) -> impl Iterator<Item = u8> {
    if x { return become b"foo".iter().cloned() }
    become b"hello".iter().map(|c| c + 1)
}
// Or:
fn foo(x: bool) -> impl Iterator<Item = u8> {
    if x { return enum b"foo".iter().cloned() }
    enum b"hello".iter().map(|c| c + 1)
}
// Or:
fn foo(x: bool) -> impl Iterator<Item = u8> {
    if x { return |b"foo".iter().cloned()| }
    |b"hello".iter().map(|c| c + 1)|
}

The major advantage of the || syntax is that it doesn't raise the question of parenthesizing, as the bars already act as a pair of delimiters.

What do you think about this idea, especially now that impl Trait is landing and the need is getting stronger?
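For comparison, here is roughly what one has to write by hand today to get the same effect for the foo example above. This is only a sketch of a possible expansion, not a specification of what the compiler would generate; the name FooIter and its variant names are made up for illustration.

```rust
// Hypothetical hand-written stand-in for the auto-generated sum type.
// `FooIter`, `First` and `Second` are illustrative names only.
enum FooIter<A, B> {
    First(A),
    Second(B),
}

// The sum type implements Iterator by delegating to whichever variant is live.
impl<A, B> Iterator for FooIter<A, B>
where
    A: Iterator<Item = u8>,
    B: Iterator<Item = u8>,
{
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        match self {
            FooIter::First(a) => a.next(),
            FooIter::Second(b) => b.next(),
        }
    }
}

// Each return site wraps its value in a different variant, so both
// branches end up with the same concrete type.
fn foo(x: bool) -> impl Iterator<Item = u8> {
    if x {
        return FooIter::First(b"foo".iter().cloned());
    }
    FooIter::Second(b"hello".iter().map(|c| c + 1))
}
```

The wrapping at each return site is exactly the part a become/enum marker would take over.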

Thanks for creating a separate issue for this.

To be honest, I'm not sure we need explicit syntax for this. It's more of an (important) implementation detail than a user-facing feature, no? But if one does want to make the syntax explicit, then I suggest putting something like impl enum Trait in the function signature.

To be honest, I'm not sure we need explicit syntax for this. It's more of an (important) implementation detail than a user-facing feature, no? But if one does want to make the syntax explicit, then I suggest putting something like impl enum Trait in the function signature.

@alexreg one reason to make it explicit is it does have performance implications, each time you call a method there will have to be a branch to call into the current type (hmm, unless these were implemented as some form of stack-box with a vtable instead, either way that still changes the performance). My first thought was also to make it a modifier on the impl Trait syntax to avoid having to repeat it on each return (impl sum Trait for the insiders pun of the some Trait proposed keyword instead of impl). But, as you mention this is an implementation detail, so that should not be exposed in the signature (could just have rustdoc hide it I suppose).

I might be wrong, but wouldn't the |...| cause parsing ambiguities, since it's already used for closures?

@Pauan Oh indeed, I was thinking EXPR { } was not valid syntax, but that's not the case in e.g. if. The |...| syntax could simply be disallowed in if conditions, but that would complicate the parser for no good reason.

@Nemo157, @alexreg The issue with putting this as a modifier in the type signature is the fact it wouldn't work well inside a function:

fn bar() -> Option<LinkedList<char>> { /* ... */ }
// This is allowed
fn foo() -> impl enum Iterator<Item = char> {
    match bar() {
        Some(x) => x.into_iter(), // owning iterator, yields char
        None => "".chars(),
    }
}
// Either this is not allowed, or the loss in performance is not explicit
fn foo() -> impl enum Iterator<Item = char> {
    let mut tmp = match bar() {
        Some(x) => x.into_iter(),
        None => "".chars(),
    };
    let n = tmp.next();
    match n {
        Some(_) => tmp,
        None => "foo bar".chars(),
    }
}

Haven't you just invented dynamic dispatch??

Yeah, fair point about the performance hit. It’s a small one, but it wouldn’t be in the spirit of Rust to hide it from the user syntactically.

@Ekleog you can use the exact same syntax for inside a function, somewhere you need to mention the trait that you're generating a sum type for anyway:

fn foo() -> impl enum Iterator<Item = char> {
    let mut tmp: impl enum Iterator<Item = char> = match bar() {
        Some(x) => x.into_iter(), // owning iterator, yields char
        None => "".chars(),
    };
    let n = tmp.next();
    match n {
        Some(_) => tmp,
        None => "foo bar".chars(),
    }
}

@est31 a constrained form of dynamic dispatch that could potentially get statically optimized if the compiler can prove only one or the other case is hit. Or, as I briefly mentioned above it could be possible for this to be done via a union for storage + a vtable for implementation, giving the benefits of dynamic dispatch without having to use the heap. (Although, if you have wildly different sizes for the different variants then you pay the cost in always using the size of the largest.)

One thing that I think might be important is to benchmark this versus just boxing and potentially have a lint recommending switching to a box if you have a large number of variants (I'm almost certain that a 200 variant switch would be a lot slower than dynamically dispatching to one of 200 implementors of a trait, but I couldn't begin to guess at what point the two crossover in call overhead, and there's the overhead of allocating the box in the first place).

@Nemo157 Thanks for explaining things better than I could!

I'd just have a small remark about your statement: I don't think a 200-variant switch would be a lot slower than dynamic dispatch. The switch should be codegen'd as a jump table, which would give something like (it has been a while since I last wrote assembly, so I'm not sure about the exact syntax) mov rax, 0xBASE_JUMP_TABLE(ENUM_DISCRIMINANT,3); jmp rax, while the dynamic dispatch would look like mov rax, METHOD_INDEX(VTABLE); jmp rax. As BASE_JUMP_TABLE and METHOD_INDEX are constants, they're hardcoded in the assembly, and so both cases end up being 1/ a memory load (either in JUMP_TABLE or VTABLE, so there could be cache impact here depending on whether you always call different methods on the same object or whether you always call the same method on different objects), and 2/ a jump (with a dependency between these instructions).

So the mere number of implementors shouldn't matter much in evaluating the performance of this dispatch vs. a box. The way of using them does have an impact, but this will likely be hard to evaluate from the compiler's perspective.
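To make the two options being compared concrete, here is a small sketch of both. It makes no claim about which one is faster; that still needs measuring. The trait method id and the names FooBar, pick_enum and pick_box are made up for the example.

```rust
// Hypothetical trait with one method, so there is something to dispatch on.
trait Trait {
    fn id(&self) -> u32;
}

struct Foo;
struct Bar;

impl Trait for Foo {
    fn id(&self) -> u32 { 1 }
}
impl Trait for Bar {
    fn id(&self) -> u32 { 2 }
}

// Option 1: a sum type. Dispatch is a `match` on the discriminant,
// which the backend is free to lower to a branch or a jump table.
enum FooBar {
    F(Foo),
    B(Bar),
}

impl Trait for FooBar {
    fn id(&self) -> u32 {
        match self {
            FooBar::F(f) => f.id(),
            FooBar::B(b) => b.id(),
        }
    }
}

fn pick_enum(x: bool) -> FooBar {
    if x { FooBar::F(Foo) } else { FooBar::B(Bar) }
}

// Option 2: a boxed trait object. Dispatch goes through the vtable,
// and the value is stored behind a Box.
fn pick_box(x: bool) -> Box<dyn Trait> {
    if x { Box::new(Foo) } else { Box::new(Bar) }
}
```

Both functions are used the same way by the caller, e.g. pick_enum(true).id() and pick_box(true).id().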

However, what may raise an issue about performance is nesting of such sum types: if you have a sum type of a sum type of etc., then you're going to lose quite a bit of time going through all these jump tables. But the compiler may detect that one member of the sum type is another sum type and just flatten the result, so I guess that's more a matter of implementation than specification? :)

a constrained form of dynamic dispatch that could potentially get statically optimized if the compiler can prove only one or the other case is hit.

LLVM is capable of doing devirtualisation.

as I briefly mentioned above it could be possible for this to be done via a union for storage + a vtable for implementation, giving the benefits of dynamic dispatch without having to use the heap

That's a point admittedly. Dynamically sized stack objects are a possibility but they have certain performance disadvantages.

Is there anything wrong with the bike shed color -> enum Trait? There are still folk who want impl Trait to be replaced by some Trait and any Trait so maybe just enum Trait catches the "make this for me" better.

I think procedural macros could generate this right now. If coercions develop further, then doing so might even become quite simple. Right now, there are Either types that do this for specific traits like Iterator and Future. I'm unsure why they do not even implement From.

So if we are going to start painting the bike shed (there seems to be little opposition right now, even though it has only been like a day since initial posting), I think there is a first question to answer:

Should the indication of the fact the return is an anonymous sum type lie in the return type or at the return sites?

i.e.

fn foo(x: T) -> MARKER Trait {
    match x {
        Bar => bar(),
        Baz => baz(),
        Quux => quux(),
    }
}
// vs.
fn foo(x: T) -> impl Trait {
    match x {
        Bar => MARKER bar(),
        Baz => MARKER baz(),
        Quux => MARKER quux(),
    }
}

Once this question has been answered, we'll be able to think about the specifics of what MARKER should be.

So, now, my opinion: I think the fact the return is an anonymous sum type should lie at the return site, for two reasons:

  • mostly because the caller has no need of knowing that the return type is an anonymous sum type, only that it's some type that implements Trait
  • but also because it will have a nicer syntax when used inside a function with something that is not actually a return, but eg. let x = match or similar (depending on how effective type inference of the result will be, I think taking the intersection of the traits matched by all the return branches should do it, but…)

On the other hand, the only argument I could think of in favor of putting the marker in the return type is that it makes for less boilerplate, but I'm not really convinced, so I'm maybe not pushing forward the best arguments.

@Ekleog Yeah, I'm with you on return site actually, even though I proposed the return type syntax above. As you say, it reflects the fact that it's more of an implementation detail that consumers of the function don't need to (shouldn't) care about. Also, I think the analogy to box syntax is a good one, and it would be used in a similar way superficially.

I think procedural macros could generate this right now.

For a closed set of traits, sure, but to allow this to be used for any trait requires compiler support for getting the methods of the traits. Delegation plus some of its extensions might enable this to be fully implemented as a procedural macro.

I'm tempted to try and write a library or procedural macro version of this, I am currently manually doing this for io::{Read, Write} so even if it only supports those traits it would be useful for me. If it works that could be a good way to get some experience with using it in real code to inform the RFC.
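As an illustration of the kind of manual delegation described here, a hand-written delegating enum for io::Read can look like the sketch below. The enum Input, its variants and the function open are hypothetical names, and only the one required method of Read is forwarded.

```rust
use std::io::{self, Cursor, Read};

// Hypothetical delegating enum over two concrete readers.
enum Input {
    Mem(Cursor<Vec<u8>>),
    Empty(io::Empty),
}

// `read` is the only required method of `Read`; the provided methods
// (read_to_string, read_exact, ...) are built on top of it.
impl Read for Input {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self {
            Input::Mem(r) => r.read(buf),
            Input::Empty(r) => r.read(buf),
        }
    }
}

// The choice of reader is made at runtime, without a heap-allocated
// trait object.
fn open(use_mem: bool) -> impl Read {
    if use_mem {
        Input::Mem(Cursor::new(b"abc".to_vec()))
    } else {
        Input::Empty(io::empty())
    }
}
```

A macro restricted to a known set of traits would essentially have to generate the impl block above for each supported trait.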

Various forms of enum impl Trait sugar for autogenerated anonymous enums have been discussed a few times in the past. For some reason I can't find a centralized discussion of them now, but I think the points that came up before but haven't been fully raised yet are:

  • The original motivation in past discussions was to make it possible to introduce a new error type to a function without it "virally infecting"/requiring code changes up the entire call stack. I still believe that's a far more compelling benefit than anything to do with performance (after all, this is just sugar over defining your own enum).
  • It's probably better to put the marker for this feature on the signature, i.e. fn foo(x: T) -> MARKER Trait
    • As far as I know, the only options here are one marker in the signature, and one marker at every return point. A marker for every return point is just too noisy (especially when the return point is just a ?), and seems to go against the motivation of making it trivial to introduce a new error type.
    • The main argument against having it in the signature is that it shouldn't and doesn't affect the function's API/contract, so it's not something clients should know about. While a good rule of thumb, it's been pointed out in the past that this is already not an ironclad rule (I believe mut on arguments was the counterexample that's stable today), so rustdoc having to hide the marker isn't a new layer of complexity.
  • The syntax probably should include impl Trait because "returning an impl Trait" is the function's API/contract as far as clients are concerned. Hence, I slightly prefer enum impl Trait over enum Trait or impl enum Trait.

I believe the main thing preventing these discussions from going anywhere is that making it even easier to use impl Trait further accentuates the only serious problem with impl Trait: that it encourages making types unnameable. But now that the abstract type proposal exists, I don't personally think that's a big concern anymore.
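To illustrate the error-type motivation, this is the boilerplate that people write by hand today so that ? can return two unrelated error types from one function. It is only a sketch: ParseError and parse are made-up names, and the function returns the named enum rather than impl Error, since the whole point of the proposal is that this enum and its From impls would be generated.

```rust
use std::fmt;
use std::num::ParseIntError;
use std::str::Utf8Error;

// Hypothetical hand-written sum of the two error types used below.
#[derive(Debug)]
enum ParseError {
    Utf8(Utf8Error),
    Int(ParseIntError),
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::Utf8(e) => write!(f, "invalid UTF-8: {}", e),
            ParseError::Int(e) => write!(f, "invalid integer: {}", e),
        }
    }
}

impl std::error::Error for ParseError {}

// One From impl per variant: this is what lets `?` pick the variant.
impl From<Utf8Error> for ParseError {
    fn from(e: Utf8Error) -> Self {
        ParseError::Utf8(e)
    }
}

impl From<ParseIntError> for ParseError {
    fn from(e: ParseIntError) -> Self {
        ParseError::Int(e)
    }
}

fn parse(bytes: &[u8]) -> Result<i32, ParseError> {
    let text = std::str::from_utf8(bytes)?; // Utf8Error -> ParseError
    let n = text.trim().parse::<i32>()?; // ParseIntError -> ParseError
    Ok(n)
}
```

Adding a third error source to parse means touching the enum, the Display impl and a new From impl, which is the friction the sugar is meant to remove.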


@Nemo157 Hmm, what delegation extensions do you think we'd need? The "desugaring" I've always imagined is just a delegate *, so we'd need enum delegation (technically an extension iirc), and then whatever it takes for a delegate * to "just work" on most traits, which might mean delegation of associated types/constants, and I think that's it.

Though I think we should probably not block this on delegation, since it could just be compiler magic.

From the current delegation RFC the extensions required are "delegating for an enum where every variant's data type implements the same trait" (or "Getter Methods" + "Delegate block") and "Delegating 'multiple Self arguments' for traits like PartialOrd" (although, this could be implemented without it and this feature would be in the same state as normal delegation until it's supported).

One thing I just realised is that delegation won't help with unbound associated types, required to support use cases like:

#[enumified]
fn foo() -> impl IntoIterator<Item = u32> {
    if true {
        vec![1, 2]
    } else {
        static values: &[u32] = &[3, 4];
        values.iter().cloned()
    }
}

would need to generate something like

enum Enumified_foo_IntoIterator {
    A(Vec<u32>),
    B(iter::Cloned<slice::Iter<'static, u32>>),
}

enum Enumified_foo_IntoIterator_IntoIter_Iterator {
    A(vec::IntoIter<u32>),
    B(iter::Cloned<slice::Iter<'static, u32>>),
}

impl IntoIterator for Enumified_foo_IntoIterator {
    type Item = u32;
    type IntoIter = Enumified_foo_IntoIterator_IntoIter_Iterator;

    fn into_iter(self) -> Self::IntoIter {
        match self {
            Enumified_foo_IntoIterator::A(a)
                => Enumified_foo_IntoIterator_IntoIter_Iterator::A(a.into_iter()),
            Enumified_foo_IntoIterator::B(b)
                => Enumified_foo_IntoIterator_IntoIter_Iterator::B(b.into_iter()),
        }
    }
}

impl Iterator for Enumified_foo_IntoIterator_IntoIter_Iterator {
    ...
}
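For anyone who wants to play with it, here is a complete version of that expansion written out by hand, with shorter names. It is a sketch of one possible expansion, not what an #[enumified] macro would necessarily produce; FooIntoIter and FooIter are illustrative names.

```rust
use std::{iter, slice, vec};

// Hypothetical first-level enum: one variant per returned type.
enum FooIntoIter {
    A(Vec<u32>),
    B(iter::Cloned<slice::Iter<'static, u32>>),
}

// Hypothetical second-level enum: one variant per associated IntoIter type.
enum FooIter {
    A(vec::IntoIter<u32>),
    B(iter::Cloned<slice::Iter<'static, u32>>),
}

impl IntoIterator for FooIntoIter {
    type Item = u32;
    type IntoIter = FooIter;

    fn into_iter(self) -> FooIter {
        match self {
            FooIntoIter::A(a) => FooIter::A(a.into_iter()),
            // An iterator's into_iter() returns the iterator itself.
            FooIntoIter::B(b) => FooIter::B(b.into_iter()),
        }
    }
}

impl Iterator for FooIter {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        match self {
            FooIter::A(a) => a.next(),
            FooIter::B(b) => b.next(),
        }
    }
}

fn foo(flag: bool) -> impl IntoIterator<Item = u32> {
    static VALUES: &[u32] = &[3, 4];
    if flag {
        FooIntoIter::A(vec![1, 2])
    } else {
        FooIntoIter::B(VALUES.iter().cloned())
    }
}
```

The second enum is the part plain delegation cannot produce, since the associated type IntoIter itself has to become a sum type.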

@Ixrec Oh indeed, I didn't think of the Result<Foo, impl ErrorTrait> use case (my use case being really futures and wanting not to allocate all the time), that would make a syntax with the marker at return site much less convenient, with ?.

However, this could be “fixed” by piping ?'s definition through sed 's/return/return MARKER/'.

This wouldn't change the behaviour for existing code (as adding MARKER when there is a single variant would be a no-op). One could object that this reduces performance without an explicit marker, but for ? that ship has sailed, as it is already using From magically.

So I still think that the ease of using MARKER through type-inferred branches (like let x = match { ... }), and not only at typed boundaries (like function-level return or let x: MARKER Trait = match { ... }), is a net enough win in favor of MARKER at the return site, with an implicit MARKER for ?.

However, as a downside of MARKER at return site, I must add that having it at return site will likely make implementation harder: what is the type of { MARKER x }? I'd say it could be handled by having a “TypeInProgress” type that would slowly be intersected with all the other types, and MARKER basically lifts Type to TypeInProgress(Type), and then typing rules follow, eg.

fn bar() -> bool {}
struct Baz {} fn baz() -> Baz {}
struct Quux {} fn quux() -> Quux {}
struct More {} fn more() -> More {}

trait Trait1 {} impl Trait1 for Baz {} impl Trait1 for Quux {} impl Trait1 for More {}
trait Trait2 {} impl Trait2 for Baz {} impl Trait2 for Quux {}

fn foo() -> impl Trait1 {
    let x = match bar() {
        true => MARKER baz(), //: TypeInProgress(Baz)
        false => MARKER quux(), //: TypeInProgress(Quux)
    }; //: TypeInProgress(Baz, Baz) & TypeInProgress(Quux, Quux)
    // = TypeInProgress(Trait1 + Trait2, enum { Baz, Quux })
    // (all the traits implemented by both)
    if bar() {
        MARKER x //: TypeInProgress(Trait1 + Trait2, enum { Baz, Quux })
    } else {
        MARKER more() //: TypeInProgress(More, More)
    } // TypeInProgress(Trait1 + Trait2, enum { Baz, Quux }) & TypeInProgress(More, More)
    // = TypeInProgress(Trait1, enum { Baz, Quux, More })
    // (all the traits implemented by both)
}

And, once this forward-running phase has been performed, the actual type can be observed (ie. here enum { Baz, Quux, More }), generated, and backwards-filled into all the TypeInProgress placeholders.
Obviously, this requires that at the time of MARKER, the type is already known.

On the other hand, for a return-type-level MARKER setup, the scheme I can think of is the following:

// (skipping the same boilerplate)

fn foo() -> MARKER Trait1 {
    let x: MARKER Trait1 = match bar() {
        true => baz(),
        false => quux(),
    }; // Here we need to infer MARKER Trait1.
    // Observing all the values that come in, we see it must be enum { Baz, Quux }
    if bar() {
        x
    } else {
        more()
    } // Here we need to infer MARKER Trait1.
    // Observing all the values that come in, it must be enum { enum { Baz, Quux }, More }
}

I personally find the syntax of the second example less convenient (it forces writing down exactly which trait(s) we want to have, not letting type inference do its job) and the end-result less clean (two nested enums will be harder to optimize). It also will likely not detect common subtrees, eg. if the more() of the else branch was replaced by if bar() { more() } else { baz() }, the first type inference algorithm would infer enum { Baz, Quux, More }, because TypeInProgress(Trait1, enum { Baz, Quux, More }) is the intersection of TypeInProgress(Trait1 + Trait2, enum { Baz, Quux }) and TypeInProgress(Trait1, enum { More, Baz }) ; while the second type inference algorithm would be forced to complete the typing at the let x: MARKER Trait1, and would thus have to infer enum { enum { Baz, Quux }, enum { More, Baz } } (or maybe enum { enum { Baz, Quux }, More, Baz }, and in either case hitting exactly the performance issue of nesting sum types discussed before).

What do you think about the idea of having ? return MARKER x.unwrap_err()? (I agree with you that without it it's most likely best to have MARKER at the return type)

I personally find the syntax of the second example less convenient (it forces writing down exactly which trait(s) we want to have, not letting type inference do its job)

For me, the primary use case is returning an enum impl Trait from a function, in which case you already need to write down the traits you want in the signature. So I never saw this as a disadvantage. In fact, it didn't even occur to me to apply enum impl Trait on a regular let binding until today.

I'm not sure I understand the sentiment behind "not letting type inference do its job". Both variations of this idea involve us writing an explicit MARKER to say we want an autogenerated anonymous enum type. In both cases, type inference is only gathering up all the variants for us, and never inferring the need for an anon enum type in the first place. In both cases, the variable x needs to have a type of some kind, at least for type-checking purposes, and in both cases that type may very well disappear during compilation/optimization. So, what's the job that type inference doesn't do in the MARKER-on-signature case?

and the end-result less clean (two nested enums will be harder to optimize). It also will likely not detect common subtrees

I'm not sure I buy either of these claims. To the extent that "detecting common subtrees" is important, I would expect the existing enum layout optimizations to effectively take care of that for free. We probably need an actual compiler dev to comment here, but my expectation would be that the actual optimization-inhibiting difficulties would come from having "all traits" implemented by the anon enums, instead of just the traits you need.

And to me, the autogenerated anonymous enum type implementing more traits than I need/want it to is "less clean". I guess that's one of those loaded terms that's not super helpful.

I'm not seeing the significance of the TypeInProgress stuff; that seems like machinery you could use in implementing either variation of the syntax, and I don't see what it buys you other than perhaps guaranteeing the type of x goes away. This is probably another thing we need compiler people to comment on, but trying to make some variables not have a type sounds to me like it would be a non-starter, its motivation better addressed by optimizations after type-checking, and entirely orthogonal to the question of what the surface syntax should be anyway.

What do you think about the idea of having ? return MARKER x.unwrap_err()? (I agree with you that without it it's most likely best to have MARKER at the return type)

I think "the idea of having ? return MARKER x.unwrap_err()" is also strictly an implementation detail that's not really relevant to the surface syntax debate, especially since ? is already more than just sugar over a macro.


To clarify, I believe the real, interesting issue here is whether we want these anonymous enum types to implement only the traits we explicitly ask for, or all the traits they possibly could implement. Now that this question has been raised, I believe it's the only outstanding issue that really needs to get debated to make a decision on whether MARKER goes at every return site or only once in the signature/binding.

My preference is of course for the traits to be listed explicitly, since I believe the primary use case to be function signatures where you have to list them explicitly anyway, and I also suspect that auto-implementing every possible trait could lead to unexpected type inference nuisances, or runtime behavior, though I haven't thought about that much.

Let's make the type inference nuisance thing concrete. Say Trait1 and Trait2 both have a foo method, and types A and B both implement both traits. Then you want to write a function that, as in your last two examples, returns enum impl Trait1 and has a let binding on a match with two branches. If we go with your variation, the let binding infers the equivalent of enum impl Trait1+Trait2 and a foo() call later in the function becomes ambiguous, while in my variation you have to explicitly write enum impl Trait1 so a call to foo() just works. That's a real disadvantage of auto-implementing all possible traits, right?

I think "the idea of having ? return MARKER x.unwrap_err()" is also strictly an implementation detail that's not really relevant to the surface syntax debate, especially since ? is already more than just sugar over a macro.

Well, I added it to answer your concern that it would be painful to have to add MARKER at return sites like ? :)

Let's make the type inference nuisance thing concrete. Say Trait1 and Trait2 both have a foo method, and types A and B both implement both traits. Then you want to write a function that, as in your last two examples, returns enum impl Trait1 and has a let binding on a match with two branches. If we go with your variation, the let binding infers the equivalent of enum impl Trait1+Trait2 and a foo() call later in the function becomes ambiguous, while in my variation you have to explicitly write enum impl Trait1 so a call to foo() just works. That's a real disadvantage of auto-implementing all possible traits, right?

That's true. However, the same could be said with regular types: if I return a single value (so no MARKER anywhere), that implements Trait1 + Trait2 and put it in an un-typed variable, then calls to foo() later will be ambiguous. So that's consistent with what we currently have, and I don't think that's a real disadvantage: it's still possible to explicitly type with return-site marker, if you want to implement only a single trait and/or type inference fails: let x: impl Trait1 = if foo() { MARKER bar() } else { MARKER baz() } (the marking of a specific type would “close” the TypeInProgress type and realize it)
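The ambiguity under discussion can be reproduced with today's Rust, without any sum types. In the sketch below (all names are illustrative), a value whose concrete type is known implements both traits, so a plain .foo() call on it is rejected as ambiguous, while a value typed as impl Trait1 only exposes Trait1.

```rust
// Two hypothetical traits that both declare a method named `foo`.
trait Trait1 {
    fn foo(&self) -> &'static str;
}
trait Trait2 {
    fn foo(&self) -> &'static str;
}

struct A;

impl Trait1 for A {
    fn foo(&self) -> &'static str { "Trait1" }
}
impl Trait2 for A {
    fn foo(&self) -> &'static str { "Trait2" }
}

// Through `impl Trait1`, only Trait1's methods are visible to the caller,
// so `only_trait1().foo()` resolves without any annotation.
fn only_trait1() -> impl Trait1 {
    A
}

// With the concrete type in hand, `A.foo()` does not compile because both
// traits apply; the call has to name the trait explicitly.
fn both_traits() -> (&'static str, &'static str) {
    (Trait1::foo(&A), Trait2::foo(&A))
}
```

So an explicit impl Trait1 annotation narrows the method set in the same way for an ordinary value as it would for an auto-generated enum.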

I'm not seeing the significance of the TypeInProgress stuff; that seems like machinery you could use in implementing either variation of the syntax, and I don't see what it buys you other than perhaps guaranteeing the type of x goes away.

Well, apart from the end-result being cleaner (and I don't think enum layout optimizations could optimize enum { enum { Foo, Bar }, Bar, Quux } into enum { Foo, Bar, Quux }, at least with named enums, as the tag could have significance), I don't know about rustc specifically, but typing is usually done on the AST. And on an AST, I think it'd be easier to go forward and slowly complete the type of a variable than to go backwards from the return point to the return sites, and from there check all the possible types that could be returned.

Actually, I'd guess that's how rustc currently does type inference:

fn foo() -> Vec<u8> {
    let res = Vec::new(); //: TypeInProgress(Vec<_>)
    bar();
    res // Here we know it must be Vec<u8>, so the _ from above is turned into u8
}

This is probably another thing we need compiler people to comment on, […]

Completely agree with you on this point :)

Would it be practical to use a procedural macro to derive a specialized iterator for each word? (It seems possible, but a little verbose)

#[derive(IntoLetterIter)]
#[IntoLetterIterString="foo"]
struct Foo;

#[derive(IntoLetterIter)]
#[IntoLetterIterString="hello"]
struct Hello;

fn foo(x: bool) -> impl IntoIterator<Item = u8> {
    if x {  
        Foo 
    } else {
        Hello
    }
}

I'm concerned with the degree to which this seems to combine the implementation details of this specific optimization with the code wanting to use that optimization. It seems like, despite impl Trait itself being a relatively new feature, we're talking about extending it to include a form of reified vtables as an optimization, and exposing that particular choice of optimization with new syntax. And we're doing that without any performance numbers to evaluate that optimization.

I also wonder to what degree we could detect the cases where this makes sense (e.g. cases where we can know statically which impl gets returned) and handle those without needing the hint. If the compiler is already considering inlining a function, and it can see that the call to the function will always result in the same type implementing the Trait, then what prevents it from devirtualizing already?

I'd suggest, if we want to go this route, that we need 1) an implementation of this that doesn't require compiler changes, such as via a macro, 2) benchmarks, and 3) some clear indication that we can't already do this with automatic optimization. And even if we do end up deciding to do this, I'd expect it to look less like a marker on the return type or on the return expressions, and more like an #[optimization_hint] of some kind, similar to #[inline]

Just to add my thoughts to this without clutter, here is my version of the optimization: https://internals.rust-lang.org/t/allowing-multiple-disparate-return-types-in-impl-trait-using-unions/7439
Automatically generating an enum is one way to devirtualize, but without inlining a lot of redundant match statements would be generated.
I'm interested in seeing what performance gains can be gleaned from this, if any.

I think that automatic sum type generation should be left to procedural macros

@joshtriplett I don’t believe the only reason to want this is as an optimisation. One of the major reasons I want this is to support returning different implementations of an interface based on runtime decisions without requiring heap allocation, for use on embedded devices. I have been able to avoid needing this by sticking to compile time decisions (via generics) and having a few manually implemented delegating enums, but if this were supported via the language/a macro somehow that would really expand the possible design space.

I do agree that experimenting with a macro (limited to a supported set of traits, since it’s impossible for the macro to get the trait method list) would be the way to start. I’ve been meaning to try and throw something together myself, but haven’t found the time yet.

@joshtriplett to address part of your comment, i.e. benchmarks, I created a repository that uses my method and benchmarks it against Box. Although I only have one test case and it is somewhat naive, it seems that my method is about twice as fast as Box. Repo here: https://github.com/DataAnalysisCosby/impl-trait-opt

@Nemo157 I don't think you need heap allocation to use -> impl Trait, with or without this optimization.

But in any case, I would hope that if it's available as an optimization hint, it would have an always version just like inline does.

@joshtriplett Let's look at this example (here showing what we want to do):

trait Trait {}
struct Foo {} impl Trait for Foo {}
struct Bar {} impl Trait for Bar {}

fn foo(x: bool) -> impl Trait {
    if x {
        Foo {}
    } else {
        Bar {}
    }
}

(playground)

This doesn't build. In order to make it build, I have a choice: either make it a heap-allocated object:

fn foo(x: bool) -> Box<Trait> {
    if x {
        Box::new(Foo {})
    } else {
        Box::new(Bar {})
    }
}

(playground)

Or I do it with an enum:

enum FooBar { F(Foo), B(Bar) }
impl Trait for FooBar {}
fn foo(x: bool) -> impl Trait {
    if x {
        FooBar::F(Foo {})
    } else {
        FooBar::B(Bar {})
    }
}

(playground)

The aim of this idea is to make the enum solution actually usable without a lot of boilerplate.

Is there another way to do this without heap allocation that I'd have missed?

As for the idea of making it an optimization, do you mean “just return a Box and have the compiler optimize-box-away(always)”? If so, how would it handle no_std systems, that don't (IIRC, my last use of such a system was ~a year ago) actually have Box::new?

@Ekleog Ah, thank you for the clarification; I see what you're getting at now.

Regarding the third playground example, you can use derive_more to derive Foo.into(), or alternatively you can use derive-new to derive a constructor for FooBar. These libraries do not solve the complete problem in the RFC, but they may help a little.

AFAICS a procedural macro of the following form could potentially solve the complete problem

#[derive(IntoLetterIter)]
enum FooBar {
    #[format="foo"]
    Foo,
    #[format="hello"]
    Hello,
}

Quick question: What does this proposal look like at the call site?

fn foo(x: bool) -> impl Iterator<Item = u8> { ... } // Uses what is proposed here

fn main() {
    foo(true).next(); // Usage like this?
}

And an idea. What about:

fn foo(x: bool) -> Box<dyn Trait>; // Rust 2018 version of `Box<Trait>`
fn foo(x: bool) -> dyn Trait; // Possible syntax for this proposal
fn foo(x: bool) -> dyn impl Trait; // Both keywords.
                                   // impl suggests that the actual type is unnamed
                                   // dyn suggests that there is dynamic dispatch

dyn would make sense to me because there is dynamic dispatch involved unless the compiler can infer that it is not required in a particular scenario. (Maybe this is nonsense. I'm just suggesting it in case it's not 😄)

@MajorBreakfast The caller doesn't know (or care) whether the function is using auto-generated enums or not: everything works normally. So your example will work.

As for the syntax, my understanding is that dyn Trait is already used for trait objects, e.g. impl dyn Trait { ... }

And the performance characteristics (and behavior) of auto-generated enums is different from trait objects, so I'm not sure if it's a good idea to try and associate them together.

As for the syntax, my understanding is that dyn Trait is already used for trait objects, e.g. impl dyn Trait { ... }

Isn't this effectively a trait object on the stack instead of the heap? If not, where is the difference?

Edit: The difference is the size of course, duh o_O Wasn't thinking right when I wrote this. The question is: Is it close enough to call it dyn?

  • I agree with @Ixrec that the marker should be on the signature for convenience, but dropped in rustdoc because it's irrelevant for API compat (comment by @Ixrec)
  • I don't quite like enum as additional keyword
    • It's only half the story (data layout). The other half is dynamic dispatch.
    • The value does not behave like an enum. It's all hidden

@MajorBreakfast Aside from the performance, there's also the fact that trait objects have type erasure: a Box<dyn Trait> can be anything that implements that trait. Whereas an auto-generated enum has a very specific and known set of types.

As for the syntax, my point is that the dyn Trait syntax is already being used, so it might not be feasible or desirable to use it for auto-generated enums.

It's only half the story (data layout). The other half is dynamic dispatch.

The "dynamic dispatch" is simply a match, which is the normal way of using enum. There's nothing special about it.

The value does not behave like an enum. It's all hidden

But it does behave exactly like an enum. The fact that it is an unnameable type (just like closures) doesn't change its behavior or how the programmer thinks about it.

Just like how programmers can reason about closures, even though their exact layout is unspecified (and they are unnameable), the same is true with auto-generated enums.

Aside from the performance, there's also the fact that trait objects have type erasure: a Box can be anything that implements that trait. Whereas an auto-generated enum has a very specific and known set of types.

From the user's perspective this is also type erasure. The types are only known to the compiler.

The "dynamic dispatch" is simply a match, which is the normal way of using enum.

The match that @Nemo157 mentions here only exists in generated code. I think the example he gives is more for illustration and it actually simulates how a trait object would redirect the call to the correct implementation.

But it does behave exactly like an enum.

No, you can't match on it.

From the user's perspective this is also type erasure. The types are only known to the compiler.

Sure, it is a form of type erasure, but it still feels qualitatively different from Box<dyn Trait>. I can't quite articulate why it feels different for me.

The match that @Nemo157 mentions here only exists in generated code. [...] No, you can't match on it.

Of course that's a natural consequence of it being unnameable, but the performance and behavior should still be the same as an enum.

@Pauan

but it still feels qualitatively different from Box. I can't quite articulate why it feels different for me.

Differences:

  • As you said, dyn Trait can be a lot of types. This one can only be one of a few types mentioned inside the function.
  • A dyn Trait is unsized. At runtime it has a size and it's as big as it needs to be. This one is an enum, so its size is known at compile time and it's at least as big as the largest of its variants.
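
A small sketch of the size difference, assuming a hand-written enum as a stand-in for the generated one. Small, Large and SmallOrLarge are hypothetical names, and the exact enum layout is left to the compiler, so only lower bounds are stated here.

```rust
use std::mem::size_of;

trait Trait {}

#[allow(dead_code)]
struct Small(u8);
#[allow(dead_code)]
struct Large([u64; 4]);

impl Trait for Small {}
impl Trait for Large {}

// Stand-in for the enum the compiler would generate.
#[allow(dead_code)]
enum SmallOrLarge {
    S(Small),
    L(Large),
}

// Known at compile time; has to fit the largest variant.
fn enum_size() -> usize {
    size_of::<SmallOrLarge>()
}

// A boxed trait object is a fat pointer (data pointer + vtable pointer),
// regardless of how large the value behind it is.
fn boxed_size() -> usize {
    size_of::<Box<dyn Trait>>()
}
```

So the enum pays for its largest variant inline, while the Box stays pointer-sized and pays with a heap allocation instead.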

Although I think the two are quite similar, I also think you're right for not wanting to call it a dyn.

but the performance and behavior should still be the same as an enum.

Performance, yes. But, all enum-ish behaviour isn't visible to the user. That's why I suggest not calling it an enum. If we can come up with something better that is ^^' (Making sum a keyword is a bad idea, because it'll break a lot of code for certain)


BTW the Unsized Rvalue RFC introduces unsized types on the stack. It doesn't allow functions to return an unsized value, but this might one day be possible in Rust. Consequently a solution other than an enum might be possible in the future. I still like the solution proposed here, because AFAIK async functions won't be able to support unsized types on the stack because they compile to a state machine.

Yes, it does indeed feel very different from Box, because at the end of the day the type is statically known. This should be reason enough.

I took the evening to throw together an experimental proc-macro based implementation: https://github.com/Nemo157/impl_sum

There are some big limitations documented in the readme, with probably other stuff I forgot/didn't notice, but if anyone else wants to experiment with this there's something to work with there now. (If you have any implementation comments/issues feel free to open issues on that repo to avoid cluttering this issue).

Re: syntax, what about an attribute in the type signature (not actually sure if attributes are allowed here but w/e)

fn do_something() -> Vec<#[auto_enum] impl Trait> {
    ...
}

Attributes are not typically considered part of the type signature anyway, so there's no problem with it being in the return type position.

An attribute in the type sig? That’s some super-ugly syntax. Plus there’s no precedent for it. The enum keyword makes more sense to me.

Out of all the proposals here something like #[marker] on the function itself makes most sense to me. In particular there are too many macros that just return so that a marker on the return position makes no sense.

@mitsuhiko The thing is, this functionality can't be properly replicated by a (procedural) macro. So making it look like it is a macro is just deceptive at best.

@mitsuhiko what macros are you thinking of? The only one I can think of is try!/? but wanting the error type to be an auto-generated sum type seems unlikely to me.

One extra difficulty might be supporting closure transforms, would it be possible to support a function like this where the sum type for impl Display happens inside an inner closure:

fn load(input: Option<&str>, number: bool) -> Option<impl Display> {
    input.map(|v| {
        if number {
            v.parse::<i32>().unwrap()
        } else {
            v.to_owned()
        }
    })
}

This example could also be extended to have 2 of the branches inside the closure, and an additional branch or 2 outside it.
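
For reference, a sketch of what this function has to look like today with a named enum built inside the closure. IntOrString is a hypothetical name; the auto-generated version would have to produce something equivalent from within the closure body.

```rust
use std::fmt;

// Hand-written stand-in for the sum type hidden behind `impl Display`.
enum IntOrString {
    Int(i32),
    Text(String),
}

impl fmt::Display for IntOrString {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Forward formatting to whichever variant is stored.
        match self {
            IntOrString::Int(n) => fmt::Display::fmt(n, f),
            IntOrString::Text(s) => fmt::Display::fmt(s, f),
        }
    }
}

fn load(input: Option<&str>, number: bool) -> Option<impl fmt::Display> {
    input.map(|v| {
        // Both branches of the closure now have the same type.
        if number {
            IntOrString::Int(v.parse::<i32>().unwrap())
        } else {
            IntOrString::Text(v.to_owned())
        }
    })
}
```

Here the wrapping happens in the closure, but the type only becomes visible in the outer function's signature, which is the part that makes the automatic version harder.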

@Nemo157 I can't judge how likely it is that errors might not be sum types here as we cannot predict what will happen in the future. I also think that modifiers on return are significantly harder to understand for users (and make the language more complex) than an attribute. Let alone that there are implied returns.

About which macros it affects: a lot of Rust projects have their own try macros. Almost all of mine have some custom try!/unwrap! type macros. The failure crate has custom "error throwing" macros etc.

@alexreg why can a procedural macro not replicate it? But regardless there are lots of compiler internals that are implemented as special macros or attributes so this would not be completely new to the language.

@mitsuhiko With a proposal like #[marker] on the function itself (as opposed to return type), how would you type things like this? (here using marker on return type for clarity)

let foo: impl Display = if bar { "foo".to_owned() } else { 5 };
println!("{}", foo);

I can understand the idea of having a marker on return type (and then the #[marker] syntax looks ugly to me, having -> Option<#[marker] impl Display> for @Nemo157's example, and I think another syntax would be better), but I don't really get the idea of having a marker on the function itself.

In my mind this is more a debate of how we want to say to Rust “Please wrap this value in an anonymous enum for me” and/or “Please make an anonymous enum out of these values”.

I prefer the first option (in part because I don't see a clear way for the user to understand from which values exactly the compiler will infer the type). And so I think the most intuitive is marker-on-return-site, but marker-on-return-type might make sense too.

Actually, to understand my reason given in parenthesis above, here is an example of why I feel uneasy about the return-type marker option:

fn foo() -> marker impl Trait {
    let bar = if test() { something() } else { somethingelse() };
    if othertest() { bar } else { stillotherstuff() }
}

Assuming something, somethingelse and stillotherstuff all return different types implementing Trait, not knowing how the compiler is implemented I can't really guess whether this will build or not. Is the type forced at the let bar boundary? Is it left “in progress”?

The advantage of the return-site marker option is that it makes things “explicitly implicit”: when encountering the marker, the value is wrapped in an anonymous enum ready to be extended, and when the being-built anonymous enum hits a type, it is realized. While with the return-type marker, the question is “which are the paths considered by the compiler as leading to the return-type marker?”, which I think can't be answered without a clear understanding of the internals.

About the issue of macros that return, they could just add the marker on each return site: if a single type is ever encountered by an anonymous enum, it will be an anonymous enum with a single member, which could (should?) actually be returned as the said member -- thus being a noop when there is no need for anonymous enums, and automatically adding anonymous enum capability when asked for.

@Ekleog

@mitsuhiko With a proposal like #[marker] on the function itself (as opposed to return type), how would you type things like this? (here using marker on return type for clarity)

let foo: impl Display = #[marker] {
    if bar { "foo".to_owned() } else { 5 }
};
println!("{}", foo);

Also I do wonder if the marker could not just go entirely. If the impact of that generated type is not too big then it might just be that this could be an acceptable solution to begin with. Hard to tell though.

@mitsuhiko So between

fn foo() -> marker impl Trait {
    let foo: marker impl Trait = if bar() { baz() } else { quux() };
    if x() { foo } else { y() }
}

and

#[marker]
fn foo() -> impl Trait {
    let foo: impl Trait = #[marker] {
        if bar() { baz() } else { quux() }
    };
    if x() { foo } else { y() }
}

you'd rather have the second one? (comparing to return-site as that's the closest to your proposal, with the smallest non-trivial example I could manage)

If so I think we can only agree to disagree :)

I just don't see a reason why this should become syntax in the first place. If it's such a great feature and the performance impact is a massive deal then it can still migrate from an attribute to real syntax later.

@mitsuhiko

Also I do wonder if the marker could not just go entirely.

I think there should be a marker:

  • Rust likes to make things explicit. If you've got two types with different sizes they get combined into an enum with the size of the bigger type plus discriminant. Should this really be hidden?
  • An enum is not the only way to solve this. Currently Rust does not support dynamically sized rvalues. It is however likely that this is going to change in the future.

Also to further add to my stance on attributes: even async/await started out with not introducing new syntax. This is a fringe feature in comparison.

I'm personally fine with using an attribute-like syntax for this, but I will note that it is 100% impossible to implement as a proc-macro (even looking at other proposed extensions to the type system like delegation, I'm certain that this will still not be possible anytime in the near future).

If there were a marker at return sites then it may be possible to implement this as some sort of syntax extension, or a limited proc-macro that only supports a pre-registered set of enums. Having a marker is not unprecedented as this is similar to a non-allocating, constrained version of boxing, which uses Box::new to wrap the return values:

#[marker]
fn foo() -> impl Trait {
    let foo: impl Trait = {
        if bar() { marker!(baz()) } else { marker!(quux()) }
    };
    if x() { marker!(foo) } else { marker!(y()) }
}
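
For comparison, a minimal sketch of the boxing version that exists today, where every return site is wrapped by hand with Box::new. Baz, Quux and the describe method are placeholders, not part of the proposal.

```rust
trait Trait {
    fn describe(&self) -> String;
}

struct Baz;
struct Quux;

impl Trait for Baz {
    fn describe(&self) -> String {
        "baz".to_string()
    }
}

impl Trait for Quux {
    fn describe(&self) -> String {
        "quux".to_string()
    }
}

// `Box::new` plays the role that `marker!(...)` plays above: each branch
// is wrapped explicitly and then coerced to the common return type.
fn foo(flag: bool) -> Box<dyn Trait> {
    if flag {
        Box::new(Baz)
    } else {
        Box::new(Quux)
    }
}
```

The marker version would keep the same shape at the return sites but avoid the heap allocation.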

The versions that use either just a marker on the function, or a marker on the return type, are probably not implementable even as a syntax extension. These would need to tie in to type inference in order to detect where in the function the returned values do not unify and inject the necessary wrapping code to make it work.

@Nemo157 Would the following be possible with a compiler built-in?

fn foo() {
    let x: impl Trait = {
        if bar() { marker!(baz()) } else { marker!(quux()) }
    };
}

I believe the intention is to eventually allow impl Trait in more places, eg.

type X = impl Debug;

fn foo() -> X {
    "Hi!"
}

So you could use a syntax where the "automatic enum" is defined separately:

enum X = impl Debug;

fn foo(a: bool) -> X {
    if a { "Hi!".into() } else { 42.into() }
}
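
A rough sketch of what such a separately declared enum would have to expand to if written by hand today. The variant names Text and Number are assumptions; the From impls are what make the .into() calls work.

```rust
use std::fmt;

// Hand-written equivalent of the hypothetical `enum X = impl Debug;`.
enum X {
    Text(&'static str),
    Number(i32),
}

impl From<&'static str> for X {
    fn from(s: &'static str) -> Self {
        X::Text(s)
    }
}

impl From<i32> for X {
    fn from(n: i32) -> Self {
        X::Number(n)
    }
}

impl fmt::Debug for X {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Delegate to the Debug impl of the stored value.
        match self {
            X::Text(s) => fmt::Debug::fmt(s, f),
            X::Number(n) => fmt::Debug::fmt(n, f),
        }
    }
}

fn foo(a: bool) -> X {
    if a { "Hi!".into() } else { 42_i32.into() }
}
```

With the proposed syntax the variants and From impls would be inferred from the .into() call sites instead of being spelled out.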

@MajorBreakfast yes, I believe so.

@mitsuhiko

@alexreg why can a procedural macro not replicate it? But regardless there are lots of compiler internals that are implemented as special macros or attributes so this would not be completely new to the language.

Ask @Nemo157, since he prototyped the implementation, but I believe it would be very difficult at best, if not downright impossible under the current proc_macro2 implementation, due to having to mess with the actual AST at a fine-grained level. I could be wrong, but I'll let him answer that.

Anyway, not sure what you mean by "compiler internals that are implemented as special macros or attributes", but actually the macros defined by Rust itself are not special-cased... they could be implemented by declarative or proc macros in a separate crate, if you wanted to.

@alexreg as an example the await! macro is not a macro but a compiler builtin.

@mitsuhiko Sure, but it might as well be implemented as a macro. enum is a different sort of beast.

@alexreg i really don't want to derail this topic any further but the current proposal for await! cannot be implemented as a plugin as far as I understand the RFC. In any case it's not exactly relevant to the point I was making.

OK, I would like to enter this discussion. As @Ekleog showed, this feature can already be easily implemented manually by the programmer by creating a new enum type to hold all the different return types. So this feature doesn't add any new capabilities to the language. That being said, I think this feature is pretty cool. It makes the language more accessible for two main reasons: it makes this use case of impl Trait more ergonomic, and it cuts a lot of boilerplate.

So if the goal of this feature is cutting boilerplate and making the language more ergonomic, it would make sense to only have to use the marker once in the function declaration instead of at each return site. Note that the marker also has to be added in let expressions, and again, in the spirit of cutting boilerplate and making things more ergonomic, it makes more sense to use the marker only once in the type instead of multiple times in the return statements.

Using the logic stated above, this leaves us with two options, since both of these options use the marker only once.

fn foo() -> marker impl Trait {
    let foo: marker impl Trait = if bar() { baz() } else { quux() };
    if x() { foo } else { y() }
}

and

#[marker]
fn foo() -> impl Trait {
    let foo: impl Trait = #[marker] {
        if bar() { baz() } else { quux() }
    };
    if x() { foo } else { y() }
}

Both of these syntaxes use the marker only once (per let or per function) and therefore are in the spirit of this RFC. If you use the marker on each return site, then there is less of an incentive for this RFC to exist. After all, the only code that the feature would save you is declaring the enum by hand. Introducing a new syntax just to avoid declaring an enum seems a little excessive. I mean, it could still be done, but we would have less of a win on our hands.

If you are still not convinced about where to put the marker, I have one more argument to try and convince you. This other argument not only says that we should use the marker only once, it also says what that marker should be and why it should be that way. In the following I will make a case for this particular syntax:

fn foo() -> enum impl Trait {
    let foo: enum impl Trait = if bar() { baz() } else { quux() };
    if x() { foo } else { y() }
}

My argument is about teaching and learning Rust. Rust is a fairly complicated language. New programmers are constantly fighting with the compiler. In order to mitigate this fighting, the compiler often suggests changes to your code. These suggestions make the learning experience much less frustrating. Add a keyword somewhere and suddenly your code not only compiles but also works as expected (assuming the logic is correct). This experience is sort of magic and very satisfying when it works. The syntax proposed above can have this property. The compiler can show you the error of the type mismatch, but can also suggest that you add a single enum in the appropriate place to solve the problem. Once the new rustacean inserts the suggested enum keyword in his function declaration or let statement, his code will magically work. He might not understand exactly why it works, but it will work. Once his code compiles he might try to find some documentation and find out what is happening. So he will do a search for something like "rust enum impl". He will then find a blog post, or reddit post or the Book or whatever that contains the appropriate explanation. He will then learn that enum impl Trait means exactly what it says on the tin, i.e. the compiler is creating an anonymous enum of the return types of your function or let statement, and all members of that enum have impl Trait. Basically the compiler is creating an enum in which all members implement a particular Trait. Hence enum impl Trait.

I just want to add that I think an annotation above the function is unintuitive. Such an annotation makes sense if it affects the whole function, e.g. like the #[test] annotation. In contrast to that, this marker just affects the return type and therefore should be near it or at the return sites.

After all, the only code that the feature would save you is declaring the enum by hand.

@Paluth Not really correct. As discussed above the enum is just the data structure. It doesn't act like an enum: You can't match on it. Instead you can call all the methods of the Trait(s). The code @Ekleog shows here requires the user to match on the enum. The code that @Nemo157 shows here is impractical to write by hand.

I agree with the things that you say about teachability.

@Paluth Just, for the impracticability of writing enums by hand, here is a real-life example of where it is a pain to maintain, especially every time I add a return site to the function I must come back to this file and change everything.

About teachability, I mostly agree with you, but I think the compiler could suggest adding markers at return site too? That said it'd likely be a mess to see, as the compiler would have to point to the two places where the markers would have to be added, and ascii art can only do so much.

@Ekleog Damn. This is some real spaghetti code! Descriptive file name, though 😄

To add to what @MajorBreakfast just said about annotating the function, it also doesn't make sense for all use cases. Given a function signature like

fn foo() -> Result<impl Read, impl Error + Debug>

you may want to return multiple possible readers, but have a specific error type in mind that you just don't want to publicly name yet.

This sort of usecase is pushing me towards the marker on the return type syntax, either an attribute like @Diggsey suggested above or a keyword, that would allow writing this signature like:

fn foo() -> Result<#[marker] impl io::Read, impl Error + Debug>

and get the auto-generated sum type for only one of the existential types.

It also seems easier to extend to named existential types, the same marker could be used when declaring the type:

existential type Foo: #[marker] io::Read;

fn foo() -> Result<Foo, impl Error + Debug>;

The other form I am currently considering as being a relatively strong contender is having just a marker on each return value, in contrast to what @Paluth says above I believe the overhead of writing the boilerplate to do the delegation (here's what it looks like for an enum over io::Read + io::Write for a single variant) vs the overhead of adding a single annotation at each return site (which you would have to do when boxing anyway) makes any kind of sugar for this worth it.

One downside of this form is that it is relatively easy to do on a case by case basis as a purely library implementation, re-using an example from earlier you could imagine taking the existing either crate and adding delegating trait implementations to it:

fn foo() -> impl Trait {
    match x() {
        Some(foo) => Either::A(foo),
        None => Either::B(y()),
    }
}
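
To show what "adding delegating trait implementations" would involve, here is a self-contained sketch with a locally defined Either (so it does not depend on the real either crate, whose variants are named differently). Trait, its label method, Foo, Bar, x and y are all placeholders.

```rust
trait Trait {
    fn label(&self) -> &'static str;
}

// Local two-variant sum type, named after the example above.
enum Either<L, R> {
    A(L),
    B(R),
}

// The delegating impl: it has to be written once per trait that
// `Either` is meant to support.
impl<L: Trait, R: Trait> Trait for Either<L, R> {
    fn label(&self) -> &'static str {
        match self {
            Either::A(l) => l.label(),
            Either::B(r) => r.label(),
        }
    }
}

struct Foo;
struct Bar;

impl Trait for Foo {
    fn label(&self) -> &'static str { "foo" }
}

impl Trait for Bar {
    fn label(&self) -> &'static str { "bar" }
}

fn x(flag: bool) -> Option<Foo> {
    if flag { Some(Foo) } else { None }
}

fn y() -> Bar {
    Bar
}

fn foo(flag: bool) -> impl Trait {
    match x(flag) {
        Some(foo) => Either::A(foo),
        None => Either::B(y()),
    }
}
```

The function body stays short, but the impl block is tied to one specific trait and one specific number of variants.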

I still believe that providing builtin support is better than this for a couple of reasons:

  1. This suffers from the same issue a proc-macro based implementation does: it requires someone to pre-declare all traits that it works with, which requires either a lot of boilerplate [1], a rather heinous proc-macro to generate the boilerplate, or a more powerful delegation than has been proposed as an RFC yet.

  2. Changing this method suddenly adds a lot more churn, say x() changed to return a ternary value, now you would have to switch from Either to some other Either3 form:

    fn foo() -> impl Trait {
        match x() {
            First(foo) => Either3::A(foo),
            Second(bar) => Either3::B(bar),
            None => Either3::C(y()),
        }
    }

    (pre-post edit: and @Ekleog links to a representation of just this sort of churn 😄)

1: this is only for a single number of variants, you would need to repeat this for all 1..n enums to support up to n variants

@mitsuhiko I'm pretty sure it can be... but I'll let others more knowledgeable confirm or deny.

@MajorBreakfast you are right about not being able to match on the return value of a fn foo() -> enum impl Trait, and therefore you could argue that the return type of foo doesn't really represent an enum since it doesn't behave like one. But it would hardly make sense to try to match against an anonymous enum. Since the enum is anonymous you don't know what it looks like and therefore you can't provide a pattern that would make sense, unless the pattern was generic like match x { a => ... } or match x { _ => ... }, but that type of match doesn't do anything. So one could argue that by definition, an anonymous enum is unmatchable. But all this is kind of off-topic, and even a bit pedantic. What really matters to the user of Rust is that the return type of foo automatically implements Trait, and that it auto-generates the match expression needed to delegate the calls of the Trait methods to the return values of foo.

As @Ekleog showed and @Nemo157 reinforced, the auto-generated match statement to delegate the method calls of Trait can save a lot of boilerplate code and therefore would easily justify a new syntax even if it meant you have to add it to every return site. I underestimated the amount of code that the auto match saves the user.

That being said, I still fail to see any reason why adding an annotation to each return site is better than adding a single annotation on the function return type, or the let type. If the user is going to have to write more code to get the same result, then we need a good reason to make it that way. Could you guys elaborate on what those reasons are? By that I mean, what feature does annotating at each return site provide over annotating once on the type?

I think I have a new desideratum to add to the pile: consistency with possible syntax for anonymous enums (not autogenerated enums).

fn foo() -> #[marker] impl Debug {
    if(...) { A::new() } else { B::new() }
}

fn bar() -> #[marker] (A | B) {
    if(...) { A::new() } else { B::new() }
}

So pretend for a moment that we want (A | B) to be the syntax for an enum type with no name and variants of types A and B. Despite being nameless, this type is not hidden by impl Trait so bar()'s callers could match on it. Presumably, if we ever added this, we'd also like bar() to compile more or less the way I wrote it, rather than requiring something like (A|B)::A(A::new()) to explicitly create a value of that anonymous enum type (we'd probably need that syntax somewhere, but imo we shouldn't need it for this).

If we'd want some kind of marker on anonymous enum return types to opt-in to this implicit wrapping behavior, I assume we'd want it to be the same marker that we use for autogenerated enum return types that also do this sort of implicit wrapping (albeit with a hidden autogenerated type you couldn't explicitly refer to anyway). This gives us an argument against using enum as the marker: enum (A|B) looks pretty redundant when (A|B) is already an enum type. Of course, it's also conceivable that we'd want no marker at all in the anonymous enum case, or no implicit wrapping for anonymous enums, or no anonymous enums at all (I have no strong opinions here yet). Thoughts?

@Paluth My reasoning is mostly the one put forward at #2414 (comment) plus what I completely failed to explain clearly at #2414 (comment).

I'll try to explain it another way: I think the advantage of marker-at-return-site is demonstrated by the following code:

fn foo() -> impl Trait {
    let a = if b() { marker c() } else { marker d() };
    if e() { a } else { marker f() }
}

That is, being able to have variables that are still-lifted-in-not-completed-enum-type-yet.

On the other hand, with marker-at-return-type, the code would have to look like (in order not to be dependent on compiler internals, optimization level and the like):

fn foo() -> marker impl Trait {
    let a: marker impl Trait = if b() { c() } else { d() };
    if e() { a } else { f() }
}

I prefer the first syntax, because marker would mean “Please use this value to build whatever enum I'll want later on,” and because I feel it'd make for easier refactoring (as the marker can be basically anywhere and is just a noop when not actually used by a merge point anywhere). OTOH, the second syntax requires explicit type annotation in the let binding (which I try to minimize in the code I write), and even requires annotating at every point where a type conflict could appear.

There is also a question about the marker-at-return-type option: what about this?

let a: marker impl Trait = match foo() {
    Foo1 => if bar() { baz() } else { quux() },
    Foo2 => iwantmorenames(),
}

Should the compiler be able to infer that the baz and quux calls must be lifted in an anonymous enum? Should it just lift in the anonymous enum the match?

Actually, writing this I think I understand better why I prefer the marker-at-return-site option:

  • Marker-at-return-type means “take all the places that point to here and merge them into an anonymous enum.” Which is, IMHO, problematic as what the compiler will consider as “places that point to here” will likely be highly implementation-dependent, or even optimization-dependent (see eg. rust-lang/rust#42974 for a case where the syntax changes depending on optimizations -- I don't think we'd want that on stable)
  • Marker-at-return-site means “from here until the next point where the type is forced, this variable is an anonymous enum into which any other anonymous enum can be merged.” Which has the nice property of having a beginning and an end, and so of being easily understandable, and there is no second-guessing what the compiler would consider a “path that points to here.” :)

@Ixrec I think the question of “what should marker be” is a bit early, like painting drawings on the bikeshed when the background color is not picked yet :)

That said, the question of anonymous enums you raised is interesting indeed. And I'd argue it's a (not very strong at all) argument in favor of marker-at-return-site: with marker-at-return-site, the marker is decorrelated from the return type, thus it makes for a consistent syntax and straightforward path to supporting anonymous enums. It'd just require allowing forcing the type to an anonymous enum from a marker'd enum.

Basically, your example would look like:

fn bar() -> (A | B) {
    if(...) { marker A::new() } else { marker B::new() }
}

Where marker would implicitly be (A|B)::A().

@Ixrec There's been no discussion of untagged enums outside of FFI, that I know of. I'm not against the proposal of anonymous enums or structs though, in general.

I'd imagine untagged enums as return types would work something like this:

fn bar() -> enum { A(i32), B } {
    if(...) { A(123) } else { B }
}

So while we don't have to use the enum keyword for auto-generated sum types, I see nothing precluding it.

@Ekleog

fn foo() -> impl Trait {
    let a = if b() { marker c() } else { marker d() };
    if e() { a } else { marker f() }
}

What type is a?

Why doesn't this require let a: impl Trait?

Does this create two different enums (one for a and one for the return type of foo)?

If so, why doesn't this require if e() { marker a } else { marker f() }?

While I'm not necessarily advocating this (I would prefer to avoid going down this road), as an alternative to generating a type solely for the return value, has anyone considered the idea of OCaml-style sum types, of the kind that use constructors starting with a backquote?

Personally, though, I'd prefer to avoid putting this in the language, and instead provide a mechanism to simplify the creation of sum types on the fly.

@Ekleog thanks for the more detailed explanation. I might have misunderstood what you are trying to say, and if that is the case, I apologize. However, if I understood what you were trying to say correctly, then that means you might be a little confused about how this feature will work. I will try to explain it more clearly.

Let's say that we have some code like this:

let a: impl Trait = expression

What can we infer from the expression? Well, a mathematician or type theorist might come up with all sorts of conclusions, but I'm neither of those, so my conclusions will be limited. I can infer two things from expression, according to how Rust currently works.

1 - The type of every value that expression can return will have to implement Trait
2 - The type of every value that expression can return will have to be the same

Unless there is a bug in the compiler, the compiler can already guarantee points 1 and 2. How does the compiler know that? Well, for every valid expression the compiler can determine all the return sites with 100% precision. Not only that, the compiler can also determine the type of the values that are returned at all of the return sites. These abilities of the compiler are not implementation-dependent. They are deterministic, and every implementation of Rust, regardless of who makes it or how it is made, will have to achieve that. If they don't achieve points 1 and 2, it either means they are incompatible with rustc or that they have a bug.

OK, so what does all of that have to do with the current discussion? What the proposed feature is trying to do is eliminate point 2. So let's look at some code examples to see how all of this applies.
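
As a small illustration of points 1 and 2 as they are enforced today (a sketch; same_type and different_types are made-up names), the first function below compiles, while the commented-out one is rejected because its branches have different types even though both implement Display.

```rust
use std::fmt::Display;

// Accepted today: both branches have type `i32` (point 2), and
// `i32` implements `Display` (point 1).
fn same_type(flag: bool) -> impl Display {
    if flag { 1_i32 } else { 2_i32 }
}

// Rejected today with error E0308 (`if` and `else` have incompatible
// types): point 1 holds for each branch, but point 2 does not.
//
// fn different_types(flag: bool) -> impl Display {
//     if flag { 1_i32 } else { "two" }
// }
```

The second function is the case the proposed marker is meant to make legal.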

let a: marker impl Trait = match foo() {
    Foo1 => if bar() { baz() } else { quux() },
    Foo2 => iwantmorenames(),
}

Should the compiler be able to infer that the baz and quux calls must be lifted in an anonymous enum?
Should it just lift in the anonymous enum the match?

Absolutely! Not only should the compiler be able to infer that in the future (should this feature ever make it into Rust), but it kind of already does. If you were to write such code today (without the marker, of course), the current compiler (rustc 1.26) would be able to determine all three return points with 100% deterministic precision. It would also be able to determine the types of the values returned by the baz, quux and iwantmorenames functions. Then it would proceed to check points 1 and 2. The only difference this RFC would introduce is the following: if point 2 fails but point 1 still stands, then it will "lift" those values into the anonymous enum. Notice that in this particular example, it would not "lift" the function calls; it would "lift" whatever the return values of those calls are.

@joshtriplett as was said earlier in the thread, the problem at hand is not so much about generating an 'enum'. The main problem is having to manually implement the match expression on said 'enum'. As the 'enum' grows (because you have more return types), the matching gets worse. This feature proposal would eliminate the need for manually writing the match.
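To make the boilerplate concrete, here is a minimal sketch of what one has to write by hand today for the foo example from the top of the issue. The names EitherIter, Left and Right are made up for illustration; nothing here is part of the proposal itself.

```rust
// Hypothetical hand-written sum type; `EitherIter` is an illustrative name only.
enum EitherIter<A, B> {
    Left(A),
    Right(B),
}

impl<A, B> Iterator for EitherIter<A, B>
where
    A: Iterator,
    B: Iterator<Item = A::Item>,
{
    type Item = A::Item;

    fn next(&mut self) -> Option<Self::Item> {
        // This delegating `match` is the part that must be repeated for every
        // trait method and extended for every additional variant.
        match self {
            EitherIter::Left(a) => a.next(),
            EitherIter::Right(b) => b.next(),
        }
    }
}

// Both branches have different concrete iterator types, so each one is
// wrapped explicitly in a variant of the hand-written enum.
fn foo(x: bool) -> impl Iterator<Item = u8> {
    if x {
        EitherIter::Left(b"foo".iter().cloned())
    } else {
        EitherIter::Right(b"hello".iter().map(|c| c + 1))
    }
}
```

With only two variants and one method this is tolerable; a third return type means either nesting EitherIter or writing a new three-variant enum with its own impl.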

@joshtriplett That's requiring an awful lot of boilerplate though, as @Paluth is pointing out. I don't know anything about OCaml sum types. How do they work?

@Ekleog

Has already been pointed out by @Paluth:

fn foo() -> impl Trait {
    let a = if b() { marker c() } else { marker d() };
    if e() { a } else { marker f() } // <-- `a` needs marker in front
}

If we'd want some kind of marker on anonymous enum return types to opt-in to this implicit wrapping behavior, I assume we'd want it to be the same marker that we use for autogenerated enum return types

@Ixrec I highly discourage this. The anonymous enum feature you're mentioning produces an actual enum that can be used in match expressions. This feature OTOH does not. That's why I don't even recommend using the enum keyword. We should not strive for similarity between these two features.


About the discussion whether to put the marker in the type or at return sites:

fn foo() -> impl Trait {
    let a: Result<marker impl Trait, String> = if cond1() { Ok(f1()) } else { f2() };
    let b: marker impl Trait = if cond2() { a.unwrap() } else { f3() };
    b
}

I'd prefer to add it to the type:

  • This is where the action is: This is where all the variants are combined into the single sum type. Markers at the return sites, OTOH, apply to only one variant, whereas the type is determined by all variants
  • Convenience: A single place means less typing

Maybe add the marker at the trait declaration site? It cannot be made to work for all traits anyway; there are certain requirements for the trait, which are similar, but weaker than, object-safety; meanwhile, explicit annotations for the latter have been proposed on internals, so there is some precedent here.

My suggestion:

  • Allow the user to mark trait Trait {} blocks as #[additive]. If this attribute is present, the compiler checks that:
    • All the trait's methods have Self in the argument (i.e. contravariant) position;
    • All its associated types have additive traits as bounds and appear only in the return (i.e. covariant) position of its methods;
    • The trait has no associated constants.
  • If Trait is additive, methods with a declared impl Trait type are allowed to pass values of different types as return values at different exit points. The actual underlying return type will be an anonymous coproduct type as proposed here.
  • If Rust ever gets proper anonymous enum types, the same marker would also mean that (T0|T1|T2|...) implements the trait whenever each of T0, T1, etc. implement it, by the same mechanism.
  • By the same token, marking Trait as additive ensures that an impl Trait for ! is available. (Bringing some resolution to another thorny issue.)

Oh wait, the associated types part won't be so easy. After all, there's impl Iterator<Item=T> where T should be treated like a concrete type. But something along the above lines.
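As a rough illustration of the rules above (not a definitive design): the trait below would qualify as additive because every method takes Self in argument position and there are no associated constants, so an impl for a two-variant coproduct can be written mechanically. Shape, Square, Rect and Sum2 are hypothetical names, and Sum2 stands in for whatever the compiler would generate.

```rust
// Hypothetical trait meeting the proposed `#[additive]` conditions:
// `Self` only appears in argument position, and there are no associated consts.
// A method such as `fn make() -> Self` would break this, because the sum type
// would not know which variant to construct.
trait Shape {
    fn area(&self) -> f64;
    fn name(&self) -> &'static str;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 { self.0 * self.0 }
    fn name(&self) -> &'static str { "square" }
}

impl Shape for Rect {
    fn area(&self) -> f64 { self.0 * self.1 }
    fn name(&self) -> &'static str { "rect" }
}

// Stand-in for the anonymous coproduct (T0|T1).
enum Sum2<A, B> {
    V0(A),
    V1(B),
}

// The impl is purely mechanical: dispatch each method to the active variant.
impl<A: Shape, B: Shape> Shape for Sum2<A, B> {
    fn area(&self) -> f64 {
        match self {
            Sum2::V0(a) => a.area(),
            Sum2::V1(b) => b.area(),
        }
    }
    fn name(&self) -> &'static str {
        match self {
            Sum2::V0(a) => a.name(),
            Sum2::V1(b) => b.name(),
        }
    }
}

fn make(square: bool) -> impl Shape {
    if square { Sum2::V0(Square(2.0)) } else { Sum2::V1(Rect(2.0, 3.0)) }
}
```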

@Ixrec I wouldn't want automatic injection into (A|B) types any more than I'd want automatic projection from (A, B). (Or, for that matter, than automatic injection into Result<A, B> or Option<A>.)

Off-topic

A previously proposed construction/deconstruction syntax was (some_a|!) resp. (!|some_b), which is "shape-based" like tuples are; another possibility would be taking inspiration from tuples' numeric field access, and doing something like 0(some_a) and 1(some_b), although that's a bit weird (and I'm not sure if it's syntactically unambiguous). Anyway, I think this has been discussed in the RFC PRs and issues about it.

I think that From/Into handle injection fine, but if those prove ambiguous then the macro might generate .into_enum()/.into_sum() methods. Also, injecting non-explicitly might simply happen for other reasons, assuming folks do not rush into this.

I could imagine eventually optimizing trait objects into enums, or at least not being DSTs, when their size can be determined at compile time. If so, then roughly this works:

fn foo() -> impl Trait {  // dyn Trait : Trait
    // Some auto_enum!{Trait} macro generates the following replacing $t:
    trait Summand$t { }
    impl Summand$t for Foo {}
    impl Summand$t for Bar {}
    type AutoSum$t = dyn Trait+Summand$t;  // Not a DST because Summand$t is not exported
    // regular code:
    ...  return x;  ...  // Conversion from T: Trait to dyn Trait is automatic.
}

@burdges Trait objects can be a lot more efficient than enum types when the number of variants is large, so they probably won't be going anywhere.

@Pauan (from here)

The reason why this doesn't require : impl Trait on the let binding is what I've tried to explain in #2414 (comment), with the TypeInProgress type (esp. the first code example -- actually I just noticed I added an unnecessary marker in the if branch there, as x already had TypeInProgress(…) type). I'm not completely sure it could be implemented in the compiler, but can't see a reason why it couldn't.

Basically, the idea is that so long as no type is enforced by a type annotation or passing to a function call, the compiler builds an ever-growing enum, and when a type is enforced (eg. with : impl Trait or : (A|B) if we have anonymous enums with this syntax, or foo(x) if foo imposes type restrictions on the value, or even unification with another if/match branch), then the type of the variable “retroactively” becomes this type (if it matches). Don't be scared by the “retroactively” word, it's just like the current behaviour of {integer}, except with custom traits instead of just integers.
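For readers unfamiliar with the {integer} behaviour being referred to, here is a small sketch of it in today's Rust. The function names are made up for the example; the point is only that a later use decides the type of an earlier binding.

```rust
use std::mem::size_of_val;

fn takes_u8(v: u8) -> u8 {
    v
}

/// Returns the sizes in bytes of two integer bindings that were both
/// written as the literal `7` without a type annotation.
fn integer_inference() -> (usize, usize) {
    let x = 7; // type is still `{integer}` here
    let y = 7; // never constrained, so it falls back to i32
    let _ = takes_u8(x); // this later use "retroactively" makes `x` a u8
    (size_of_val(&x), size_of_val(&y))
}
```

Calling integer_inference() yields (1, 4): x ended up as u8 because of the call that came after its declaration, while y took the default.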

I'm not a functional developer, so am not sure this is the right way to put it, but in my mind marker would “lift” the value into a TypeInProgress “monad”, and forcing the type of such a variable would execute the “monad” and recover its result.

Now, things become harder when considering types like Result<u8, impl Fail>, and that's why I'm not completely sure it is possible to implement (even though it appears to work not-so-bad with {integer} currently): something like this should compile:

fn foo() -> Result<u8, impl Fail> {
    if a() { Ok(0) } //: Result<{integer}, _>
    else if b() { Err(marker c()) } //: Result<{integer}, TypeInProgress(C)>
    else { Err(marker d()) } //: Result<{integer}, TypeInProgress(Fail, enum { C, D })>
}

And here, hoisting TypeInProgress into other template structures may hide pitfalls for properly implementing this (or maybe not, at least that's what I'm hoping for).


@joshtriplett (from here)

This sounds like the return-site-marker approach, adding in that the markers are named. But then I'm curious, do you think this should compile?

fn foo() -> u8 {
    if bar() { `a 0 } else { `b 1 }
}

If it should compile, then it's going pretty far away from OCaml-style anonymous enums. If it shouldn't, then it makes try!-like macros hard to write, while with the return-site-marker approach the try!-like macros can just add marker everywhere and it becomes a noop if it's unused.


@Paluth (from here)

Unless there is a bug in the compiler, the compiler can already guarantee points 1 and 2. How does the compiler know that? Well, for every valid expression the compiler can determine all the return sites with 100% precision. Not only that, the compiler can also determine the type of the values that are returned at all of the return sites. These abilities are not implementation-dependent. They are deterministic, and every implementation of Rust, regardless of who made it or how, will have to achieve that. If they don't achieve points 1 and 2, it either means they are incompatible with rustc or that they have a bug.

Not necessarily. Taking back the match-if example (let's assume for now all these functions return Foo):

let a: Foo = match foo() {
    Foo1 => if bar() { baz() } else { quux() },
    Foo2 => iwantmorenames(),
}

I don't know the Rust compiler specifically, but most compilers are built like this:

  1. Parse the file into an AST like (simplified here):
let-binding "a": Foo
 → match (foo())
    → case Foo1: if (bar())
       → true: baz()
       → false: quux()
    → case Foo2: iwantmorenames()
  2. Typecheck the result:
let-binding "a": Foo
 → match (foo()): Foo
    → case Foo1: if (bar()): Foo
       → true: baz(): Foo
       → false: quux(): Foo
    → case Foo2: iwantmorenames(): Foo

The important thing here is that typing occurs on all AST nodes. Which means that the value of the if/else must have a type.

With the return-type-marker approach, this type is undefined, because baz and quux have different return types, and the if is not constrained by a marker impl Trait type boundary. (that said, you're right in that this shouldn't be dependent on optimizations, I was considering typing occurring later, which would be surprising indeed, even if technically possible)

So actually I'm worried that the return-site-marker approach would

  • either not work across the match-if example and require an explicit marker, like:
let a: marker impl Trait = match foo() {
    Foo1 => {
        let x: marker impl Trait = if bar() { baz() } else { quux() };
        x
    },
    Foo2 => iwantmorenames(),
}

which would be a big drawback for both usability and learnability.

  • or work across the match-if example, but then every unannotated merge point between two expressions becomes an anonymous enum, and error reporting for the whole rust compiler becomes a mess when an actual typing error occurs
  • or, last solution, come from the : marker impl Trait and implicitly expand the marker down in the AST. Which would work for let bindings, but would be harder to make work for function returns (because there can be multiple return points) and would have unexpected effects on refactoring:
trait Trait { fn with_set_something(self, b: bool) -> Self; }
fn f2() -> impl Trait {}
fn f3() -> impl Trait {}
// typechecks
fn foo() -> marker impl Trait {
    if bar() {
        (if f1() { f2() } else { f3() }).with_set_something(true)
    } else {
        (if f1() { f2() } else { f3() }).with_set_something(false)
    }
}
// no longer typechecks, requires adding `: marker impl Trait`
fn foo() -> marker impl Trait {
    let x = if f1() { f2() } else { f3() };
    if bar() { x.with_set_something(true) }
    else { x.with_set_something(false) }
}

@MajorBreakfast (from here)

I'd have written your example like:

fn foo() -> impl Trait {
    let a = if cond1() { Ok(marker f1()) } else { f2().map(|x| marker x) };
    if cond2() { a.unwrap() } else { f3().map(|x| marker x) }
}

with the return-site approach. (see my reply to @Pauan above as to why it should be possible to make this work)

It's true that the .map(|x| marker x) is a bit painful to write, but I'm not seeing it as being worse than : Result<marker impl Trait, String> :) (esp. with the drawbacks of the return-site-marker approach raised in my reply to @Paluth above)


@fstirlitz (from here)

I'd think your point on which traits should be auto-derived is mostly orthogonal to the discussion on where to place the marker? I don't think having explicitly-additive enums could avoid markers: we would very likely want Copy to be additive, but then code like:

let x = if foo() { 0u32 } else { 0f32 };
function_expecting_u32(x);

would have an error message like “x is of type AnonymousEnum(Copy, enum { u32, f32 }), expected u32” at the call of function_expecting_u32, which would be very unexpected as the error would be at the 0f32 place.


So to sum up my opinion (and I'm more and more feeling like I'm alone in this opinion, so won't bother you much longer with it :)):

  • I think a marker, either at return-site or at return-type, is necessary
  • I think the return-type-marker approach has the drawback of not explicitly saying where types start being enum-ified, and as such will likely generate surprising behaviour that'll implicitly leak the internals of the compiler (eg. “typing occurs at the AST phase”), thus hindering learnability
  • I think the return-site-marker approach has the advantage of being explicitly “from the marker up to the next type enforcement, the type is enum-ifiable with other enum-ifiable types”
  • The return-type-marker approach is roughly as verbose as the return-site-marker approach as soon as we start considering non-trivial examples where everything is not just coming from a single match (I'd like to say “more verbose” but would likely be contradicted with an example I didn't think of)

And to be fair, the advantage of the return-type approach is that it's closer to the place where type merging is actually done.

Also, just as a last point: I'm sorry for the confusion the “return-site-marker” expression may have spread, but I can't think of any better term, and “marker-lifting-values-to-enum-ifiable-monad” is way too scary to be usable 😁

Unrelatedly, @Nadrieril pointed me to a potential pitfall of this proposal:

fn foo<T: Iterator<Item = u8>>(iter: T, n: usize) -> impl Iterator<Item=u8> {
    let mut iter = marker iter;
    for _ in 0..n {
        iter = marker iter.enumerate();
    }
    iter
}
// or
fn foo<T: Iterator<Item = u8>>(iter: T, n: usize) -> impl Iterator<Item = u8> {
    let mut iter: marker impl Iterator<Item = u8> = iter;
    for _ in 0..n {
        iter = iter.enumerate();
    }
    iter
}

So I think the easiest solution for now in order to reject this program that'd require realization of a 2⁶⁴-elements enum is to outright forbid enum-ification in assignment to mut variables for now, and maybe in a follow-up RFC relax this requirement to allow some cases of assignment to mut, just like what is happening with const fn.

A proposal:

fn foo_agst(a: bool) -> impl ::std::fmt::Debug
{
    let b: dyn ::std::fmt::Debug = if a {
        7
    } else {
        "Foo"
    };
    
    b
}

Rationale:

Auto-generated sum types (AGSTs) are pure syntax sugar

It is always possible to manually write the sum type. It's extremely verbose, and if some of your variants are impl Traited themselves it would require writing new type wrappers for those variants as well, but it is possible. So if you really do need to avoid boxing everything for performance reasons it can be done today.

This suggests that any proposal for AGSTs should be judged heavily on the syntax and ergonomic benefits provided.
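For reference, a hand-written version of what foo_agst is meant to express could look roughly like the sketch below. The enum name IntOrStr and the function name foo_manual are invented for this example.

```rust
use std::fmt;

// Hypothetical hand-written equivalent of the sum type an AGST would generate.
enum IntOrStr {
    Int(i32),
    Str(&'static str),
}

// Debug has to be forwarded by hand to whichever variant is active.
impl fmt::Debug for IntOrStr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            IntOrStr::Int(i) => fmt::Debug::fmt(i, f),
            IntOrStr::Str(s) => fmt::Debug::fmt(s, f),
        }
    }
}

// No boxing and no allocation: the caller just sees `impl Debug`.
fn foo_manual(a: bool) -> impl fmt::Debug {
    if a {
        IntOrStr::Int(7)
    } else {
        IntOrStr::Str("Foo")
    }
}
```

That is about twenty lines for two variants and one trait method, which is the verbosity the rationale is talking about.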

impl Trait in return-type position makes no guarantees to the caller beyond what it says

A function with impl Trait in return-type position today can return boxed trait objects, things that use dynamic dispatch, etc. Adding a marker to the return type only in cases where said type happens to be an AGST provides no additional value to the caller.

A monomorphizing let binding is already required to return boxed trait objects of differing concrete types through an impl Trait

In other words, this compiles:

fn foo(a: bool) -> Box<::std::fmt::Debug>
{
    if a {
        Box::new(7)
    } else {
        Box::new("Foo")
    }
}

But this doesn't:

fn foo(a: bool) -> impl ::std::fmt::Debug
{
    if a {
        Box::new(7)
    } else {
        Box::new("Foo")
    }
}

But this does:

fn foo(a: bool) -> impl ::std::fmt::Debug
{
    let b: Box<::std::fmt::Debug> = if a {
        Box::new(7)
    } else {
        Box::new("Foo")
    };

    b
}

Since impl Trait as a return type already requires an explicit monomorphizing binding to return different boxed trait objects it makes sense to reuse that spot for an AGST marking.

AGSTs should be seen as a foil to boxed trait objects, not to a monomorphized impl Trait

As shown above, boxed trait objects can already be returned through an impl Trait in return-type position (if there exists an impl<T: Trait> Trait for Box<T>, which there usually does). An AGST is not a special case of impl Trait, it's another thing you can shove through an impl Trait-sized hole. But there's no reason they should be solely limited to impl Trait.

Put another way, I should also be able to write

struct DisplayForDebugWrapper<T>(T);

impl<T: ::std::fmt::Debug> ::std::fmt::Display for DisplayForDebugWrapper<T> {
  fn fmt(&self, f: &mut ::std::fmt::Formatter) -> ::std::fmt::Result {
    write!(f, "{:#?}", self.0)
  }
}

fn foo(a: bool) -> impl ::std::fmt::Display {
  let b: dyn ::std::fmt::Debug = if a {
    7
  } else {
    "Foo"
  };

  DisplayForDebugWrapper(b)
}

This will be very important for futures, where authors will want to have parts of a processing pipeline that are conditional and other parts that are not. i.e. my database backend might vary but my code to display values from the database is shared.

The salient performance tradeoff in choosing a sum type vs a boxed trait object is potentially over-allocating space vs having a separate but minimal heap allocation

Both sum types and boxed trait objects use dynamic dispatch, and there's no a priori reason that a sufficiently smart compiler cannot transform a sum type's matches into a vtable-style dispatch if that makes sense. A sum type does require reserving space for its largest variant up front though, while a boxed trait object allows a minimal allocation.

If Rust did grow AGSTs I expect the primary performance pitfall would be programmers auto-summing a "frequent and small" variant with an "infrequent but enormous" variant.
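A quick way to see the space tradeoff is to compare sizes directly. This is only a sketch with a made-up enum; the 4096-byte payload is an arbitrary stand-in for an "enormous" variant.

```rust
use std::fmt::Debug;
use std::mem::size_of;

// Hypothetical stand-in for an auto-generated sum of a "frequent and small"
// variant and an "infrequent but enormous" one.
#[allow(dead_code)]
enum Summed {
    Small(u8),
    Huge([u8; 4096]),
}

/// Returns (size of the sum type, size of a boxed trait object handle).
fn sizes() -> (usize, usize) {
    (size_of::<Summed>(), size_of::<Box<dyn Debug>>())
}
```

Every Summed value occupies more than 4096 bytes even when it holds the one-byte variant, whereas the Box<dyn Debug> handle is two pointers wide and the heap allocation behind it is sized to the actual value.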

This further suggests that the difference in syntax vis-a-vis boxed trait objects should highlight the boxing.

The "should I box it or just pass it around unboxed" question already exists throughout Rust

This is not new cognitive overhead for a Rust programmer concerned about performance.

The reason people don't like boxed trait objects is not usually performance, it's that boxing is annoying.

The syntax overhead of boxing things gets rather annoying. Most programs won't notice the overhead of dynamic dispatch or boxing, but their authors will notice the overhead of typing Box::new(x) all over the place.

Boxed trait objects are already becoming Box<dyn Trait>, so AGSTs as dyn Trait are a clear unboxed analog

i.e. we'll eventually have

fn foo_box(a: bool) -> impl ::std::fmt::Debug
{
    let b: Box<dyn ::std::fmt::Debug> = if a {
        Box::new(7)
    } else {
        Box::new("Foo")
    };

    b
}

fn foo_agst(a: bool) -> impl ::std::fmt::Debug
{
    let b: dyn ::std::fmt::Debug = if a {
        7
    } else {
        "Foo"
    };

    b
}

Both use dynamic dispatch, and the salient difference on the heap allocation is precisely reflected in the syntax differences.

This looks interesting, even though this would add some sized and usable-without-fat-pointers dyn Trait, which may be unexpected. That said, I have a question.

With boxing, we can do things like:

fn foo(a: bool) -> impl Debug
{
    if a {
        Box::new(7) as Box<Debug>
    } else {
        Box::new("Foo")
    }
}

(ie. without a separate let).

This is the equivalent of return-site marker, and the separate-let solution you raised is the equivalent of return-type marker.

Do you think

fn foo(a: bool) -> impl Debug
{
    if a {
        7 as dyn Debug
    } else {
        "Foo"
    }
}

should be made to compile?

If so, then we get both return-type-marker and return-site-marker (although here it is actually return-site-marker and not marker-lifting-values-to-enum-ifiable-monad), as one prefers.

Also, then there is the question of how to put it in the return-type position for use with ?, as we would likely not want every use of ? to generate an enum. (that, or saying that one-type autogenerated enums can be converted to the said one type)

That said, @Nadrieril pointed out on IRC a potential unexpected behaviour of this proposal:

fn foo(a: bool) -> Box<dyn ::std::fmt::Debug> {
    let b: dyn ::std::fmt::Debug = if a {
        7
    } else {
        "Foo"
    };

    Box::new(b)
}

Here, the user would expect a single dynamic dispatch, but (apart from potential compiler optimizations) two would occur: one at the Box level, and then another one for the enum. Then, that's likely an optimization question. :)

@Ekleog

So I think the easiest solution for now in order to reject this program that'd require realization of a 2⁶⁴-elements enum is to outright forbid enum-ification in assignment to mut variables for now

I don't think the issue there is that it's assigning to a mut variable. The issue is that it's generating recursive types. When initially assigning let mut iter = marker iter; you start the sum type off with typeof(iter) = { T | ... } (to pick some arbitrary syntax for unfinished AGST I hope is relatively easy to understand). Then when assigning iter = marker iter.enumerate() you extend this to typeof(iter) = { T | Enumerate<typeof(iter)> | ... }. You now have a recursive type, which cannot be supported by Rust.

This probably results in a very similar limitation in practice, but I believe is a more correct way to look at it.
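For comparison, this is how the same loop shape can be written today with a boxed trait object, which hides the unbounded nesting behind an indirection. It is a sketch under two assumptions: the function name bump_n is made up, and map is used instead of enumerate so that the item type stays u8.

```rust
// Assumption: `map` replaces the `enumerate` of the example above, since
// `enumerate` would change the item type to `(usize, u8)`.
fn bump_n<T>(iter: T, n: usize) -> Box<dyn Iterator<Item = u8>>
where
    T: Iterator<Item = u8> + 'static,
{
    let mut iter: Box<dyn Iterator<Item = u8>> = Box::new(iter);
    for _ in 0..n {
        // Each pass wraps the previous iterator in another adapter. The static
        // type of `iter` never changes because the nesting lives on the heap.
        iter = Box::new(iter.map(|b| b.wrapping_add(1)));
    }
    iter
}
```

An unboxed sum type has no such indirection, so the set of variants would have to include an adapter around the sum type itself, which is the recursive type described above.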


@khuey

So if you really do need to avoid boxing everything for performance reasons it can be done today.

This is not only necessary for performance reasons. The main reason I want a feature like this is for alloc-less embedded development.

I do agree with a lot of the rest of your points. There's a reason this isn't auto-generated enums, as I mentioned a long time ago, and as @DataAnalysisCosby linked to, this can be alternatively implemented via a union + vtable.

I don't think dyn Trait fits as a syntax though. dyn Trait already has a meaning, it's an unsized dynamic trait object. Just because they cannot today be used without hiding them behind something that can store the associated size data, like Box or &mut, doesn't mean they will never exist as a directly usable type.

@Ekleog

I actually wasn't aware of the "as at the return site" syntax. Maybe we would want that to work. But, unlike Box<(dyn) Debug>, dyn Debug isn't actually a named type. The difference there is worth thinking about at least; there's not currently any as <not-a-type-name> that I'm aware of.

I'm not sure I understand the question about interaction with ?.

It would be sort of confusing for boxing a dyn Trait to not be identical to Box<dyn Trait>. It seems straightforward for an AGST to roll out into a boxed trait object by pulling out its individual variants though. i.e. auto-implement Into<Box<dyn Trait>> for the AGST and then warn if it's explicitly boxed. Or it could even be rolled out implicitly if people are comfortable with it.
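To illustrate the "roll out into a boxed trait object" idea with a hand-written stand-in for an AGST: the conversion boxes the payload of the active variant directly, so the result is an ordinary Box<dyn Debug> with a single level of dynamic dispatch. IntOrStr and boxed are hypothetical names used only for this sketch.

```rust
use std::fmt::Debug;

// Hypothetical hand-written stand-in for an auto-generated sum type.
enum IntOrStr {
    Int(i32),
    Str(&'static str),
}

// "Pulling out the individual variants": box the payload, not the enum,
// so no enum-level dispatch remains inside the box.
impl From<IntOrStr> for Box<dyn Debug> {
    fn from(sum: IntOrStr) -> Self {
        match sum {
            IntOrStr::Int(i) => Box::new(i),
            IntOrStr::Str(s) => Box::new(s),
        }
    }
}

fn boxed(a: bool) -> Box<dyn Debug> {
    let sum = if a { IntOrStr::Int(7) } else { IntOrStr::Str("Foo") };
    sum.into()
}
```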

@Nemo157

This is not only necessary for performance reasons. The main reason I want a feature like this is for alloc-less embedded development.

Noted, but I don't think this actually changes anything I said. You can still avoid the boxing today. And if you don't have an allocator it's even more essential!

I don't think dyn Trait fits as a syntax though. dyn Trait already has a meaning, it's an unsized dynamic trait object. Just because they cannot today be used without hiding them behind something that can store the associated size data, like Box or &mut, doesn't mean they will never exist as a directly usable type.

Maybe. It's not clear to me what you'd ever use dyn Trait for as an actual type. Has anyone proposed anything that would make sense?

@khuey I like the 7 as dyn Debug syntax a lot. I've already proposed usage of the dyn keyword above but had to conclude that dyn Trait is something different than the sum types that are discussed here. It is however very likely that Rust will one day support the syntax you're proposing, not with sum types, but with real unboxed, dynamically sized dyn Trait values on the stack. It will be without heap allocation. It will require less or equal memory compared to sum types. But, it won't work inside async functions across yield points because all types of values that are stored in the state machine need to be statically sized.

@khuey

Maybe. It's not clear to me what you'd ever use dyn Trait for as an actual type. Has anyone proposed anything that would make sense?

rust-lang/rust#48055 has been in the works for a while.

But even without that, I would consider it far too confusing to have dyn Trait mean both "trait object" and "autogenerated sum type". There are proposals for using the dyn keyword in other ways, such as [x; dyn y] for an array allocated on the stack with a fixed but not-known-until-runtime size y, but that doesn't involve putting a trait name after dyn so it's clearly something different.


Regarding the rest of #2414 (comment), @khuey most of that looks identical to what I thought was the dominant proposal already; the only difference I can see is the use of dyn as the marker. Were there meant to be any other new/different suggestions in there, or just a summary of what we've come up with so far?


@Ekleog

So to sum up my opinion (and I'm more and more feeling like I'm alone in this opinion, so won't bother you much longer with it :)):

  • I think a marker, either at return-site or at return-type, is necessary

I think everyone agrees with this... or at least I really hope everyone does.

  • I think the return-type-marker approach has the drawback of not explicitly saying where types start being enum-ified, and as such will likely generate surprising behaviour that'll implicitly leak the internals of the compiler (eg. “typing occurs at the AST phase”), thus hindering learnability

Now this part truly baffles me, because this is exactly why I've been advocating the opposite: putting the marker on the types makes it pretty obvious where the enum-ified types are. They're where the markers are. Putting them anywhere else immediately makes it less than obvious. I'm... not sure what else could be said about this.

  • I think the return-site-marker approach has the advantage of being explicitly “from the marker up to the next type enforcement, the type is enum-ifiable with other enum-ifiable types”

I still don't see what the problem is with the surface syntax implying that these autogenerated enums are nested rather than flattened. Nested enums aren't evil. I don't think anything that's been proposed so far would prevent the compiler from flattening some autogenerated enums as an optimization (assuming that even makes a difference; I don't recall seeing any evidence that it would), and if you're using this feature in the first place you shouldn't care that much about the precise layout of the enums getting generated. When the precise layout is a big deal, just write the type by hand.

  • The return-type-marker approach is roughly as verbose as the return-site-marker approach as soon as we start considering non-trivial examples where everything is not just coming from a single match (I'd like to say “more verbose” but would likely be contradicted with an example I didn't think of)

I assume what you're referring to is the subset of your comment where you argue that a match arm with an if else expression would need to be transformed into a block with a let statement just so the marker could be applied to it. I do agree that this would be an ergonomic showstopper, but that seems only slightly worse to me than your proposal that we mark the return sites rather than the types, and I'm not seeing what's wrong with simply making the feature "work across the match-if example". I don't buy that "but then ... error reporting for the whole rust compiler becomes a mess when an actual typing error occurs" because error reporting with unnamed types is going to be a challenge no matter what rules we choose for the syntax, and I really don't think that challenge is intractable unless you have several layers of autogenerated enums within a single function, in which case your function is probably far too long anyway.

rust-lang/rust#48055 has been in the works for a while.

I don't see dyn in the unsized values rfc at all but maybe I'm missing something ...

Anyways, if Rust does eventually support unboxed, dynamically sized trait objects on the stack, isn't an auto-generated sum type just a Sized reification of that?

Anyways, if Rust does eventually support unboxed, dynamically sized trait objects on the stack, isn't an auto-generated sum type just a Sized reification of that?

Probably not. The deep, fundamental difference between trait objects and enums is that the set of concrete types that a trait object might be wrapping is open, and not necessarily known to the compiler, while enums must have all variants known to the compiler. In theory, the surface syntax of trait objects could be compiled down to enums as an optimization if the compiler happened to know all the concrete types, though I have no idea if that would be practically useful. Is that optimization what you're trying to propose?

I don't see dyn in the unsized values rfc at all but maybe I'm missing something ...

I think the only reason dyn doesn't show up in that RFC is because it predates the use of dyn syntax for trait objects (judging by the "Alternatives" section, it also predates the suggestion of [x; dyn y] syntax for alloca'd arrays). But trait objects are unsized, so it would apply to them, unless I'm deeply misunderstanding something.

@Ixrec

I think the return-site-marker approach has the advantage of being explicitly “from the marker up to the next type enforcement, the type is enum-ifiable with other enum-ifiable types”

I still don't see what the problem is with the surface syntax implying that these autogenerated enums are nested rather than flattened. Nested enums aren't evil. I don't think anything that's been proposed so far would prevent the compiler from flattening some autogenerated enums as an optimization (assuming that even makes a difference; I don't recall seeing any evidence that it would), and if you're using this feature in the first place you shouldn't care that much about the precise layout of the enums getting generated. When the precise layout is a big deal, just write the type by hand.

I'm not seeing issues with nested enums (well, I am, but that's not the reason why I put it here because I know they can be fixed in other ways). The reason why I'm saying this here is to balance the previous point about learnability of the marker. (and I'm replying to your reply about it just below)

Now this part truly baffles me, because this is exactly why I've been advocating the opposite: putting the marker on the types makes it pretty obvious where the enum-ified types are. They're where the markers are. Putting them anywhere else immediately makes it less than obvious. I'm... not sure what else could be said about this.

I must be misunderstanding something in your learnability argument.

The main argument I'm trying to make is this:

let a: marker impl Trait = match{}

This variable a is a variable that will be an auto-generated enum. But from what? I can't know until I look down into the match.

And now, if I look into the match, and see this:

let a: marker impl Trait = match f() {
    A => if g() { foo() } else { bar() },
    B => quux(),
}

Then I need to know whether foo() and bar() are being enum-ified or not.

And the answer to this question leads to the issues about having to know the compiler internals I was trying to put forward towards the end of my reply to @Paluth:

  • If they are not being enum-ified, then it's a pain to write (see my example introducing another let binding just to fix that), and it would make more sense to put the marker on the match than on the return type anyway (which is indeed another possibility we forgot to consider)
  • If they are being enumified, there are two possibilities:
    • Either the rust compiler enum-ifies at every match/if/etc. This sounds stupid, as e.g.
      let foo = if f() { bar() } else { baz() };
      function_expecting_bar(foo);
      would fail with an error at the function_expecting_bar call while it should have failed at the if
    • Or the rust compiler enum-ifies only the match/if/etc. that are downwards in the AST from the : marker impl Trait, and:
      • Some things start to make no sense:
        // Does not compile
        fn foo() -> marker impl Trait {
            let a = if foo() { bar() } else { baz() };
            a
        }
        // Compiles
        fn foo() -> marker impl Trait {
            if foo() { bar() } else { baz() }
        }
      • Some error messages become weird:
        fn foo() -> marker impl Trait {
            match f() {
                Foo => {
                    let bar = if x() { y() } else { z() };
                    function_expecting_Y(bar)
                }
                Bar => bar(),
            }
        }
        encounters the same issue as the function_expecting_bar above
      • Code refactoring becomes weird, as things no longer have the same meaning in the AST “below a return point” and everywhere else (see the last code example in my reply to @Paluth from my last comment for an example)

Anyway, this requires some additional knowledge of how the compiler works to know the answer to “from what point until the marker are types enum-ified?”

(BTW, at least in my Future-based use case for this, I'm returning from nested matches and if/then/else, so I can say that it's a real-world use case)

On the other hand, the return-site approach has an easy answer to that: the types are enum-ified from the marker until the next point where a type is forced.
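To make that rule concrete, here is a sketch in today's Rust of what the nested match/if example could be lowered to if every leaf carried a marker. The enum `Generated`, its variants and the function `pick` are hypothetical stand-ins for whatever the compiler would generate; the leaf values are arbitrary.

```rust
use std::fmt;

// Hypothetical stand-in for the compiler-generated type; every name here is
// made up for illustration.
enum Generated {
    V0(u8),
    V1(&'static str),
    V2(f64),
}

// The generated type implements the trait by delegating to the active variant.
impl fmt::Display for Generated {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Generated::V0(x) => fmt::Display::fmt(x, f),
            Generated::V1(x) => fmt::Display::fmt(x, f),
            Generated::V2(x) => fmt::Display::fmt(x, f),
        }
    }
}

// Each place where `marker` would be written becomes an explicit variant
// constructor. The function's return type is the point where the type is
// forced, so the enum is closed there with exactly three variants.
fn pick(outer: bool, inner: bool) -> impl fmt::Display {
    match outer {
        true => {
            if inner {
                Generated::V0(7)
            } else {
                Generated::V1("bar")
            }
        }
        false => Generated::V2(1.5),
    }
}
```

In this reading the nesting of `if` inside `match` does not matter: the set of variants is simply the set of marked expressions that flow into the same forcing point.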

Let's discuss verbosity once this discussion about the exact semantics of the marker-at-return-type (and the drawbacks of it) is closed :)

@Ekleog One big problem I see with the "TypeInProgress" you're proposing is that its finalization is not explicitly marked. What happens if it should be passed as an argument to a function call instead of being returned?

fn foo() -> impl Trait {
    let a = if cond1() { marker f1() } else { marker f2() };
    f3(&a); // Used in function call, finalize type here?
}

To me it feels very different than the constraint based type inference that the compiler does today.

@MajorBreakfast Yes, I think it should finalize if passed to a function call, in most cases. Exactly like with integers, actually:

fn foo(_: usize) {}
fn bar<T>(_: T) {}
// ...
let a = 3;
foo(a);
// a has type `usize`
let b = 3;
bar(b);
// b still has type `{integer}`

See https://play.rust-lang.org/?gist=97ad7a96d24e85f8f097cd513dcda3c5&version=stable&mode=debug
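The same behaviour can be observed without the playground link. The sketch below uses `std::any::type_name`, which is meant for diagnostics only and whose exact output is not guaranteed, to print what the literal ends up as. The helper names `takes_usize`, `takes_anything` and `name_of` are made up here. Note that an unconstrained `{integer}` is not kept forever: at the end of inference it falls back to `i32`.

```rust
use std::any::type_name;

fn takes_usize(_: usize) {}
fn takes_anything<T>(_: T) {}

// Illustrative helper: reports the type that inference settled on.
fn name_of<T>(_: &T) -> &'static str {
    type_name::<T>()
}

// The call to `takes_usize` forces the literal to be a `usize`.
fn constrained() -> &'static str {
    let a = 3;
    takes_usize(a);
    name_of(&a)
}

// An unbounded generic parameter adds no constraint, so the literal stays
// `{integer}` during inference and falls back to `i32` at the end.
fn unconstrained() -> &'static str {
    let b = 3;
    takes_anything(b);
    name_of(&b)
}
```

On current compilers `constrained()` reports `usize` and `unconstrained()` reports `i32`.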

In order to simplify the thing, I think for a first draft any constraint imposed should finalize the type, eg. the bar function above would not finalize, but fn baz<T: Debug>(_: T) would. This is not exactly the same as with integers (where the type would stay {integer}), but would likely be much simpler to implement.

That said, it would likely be possible to not finalize the enum but just add a constraint on it (ie. that it must be Debug). Actually, that's similar to what rustc does for integers, and it seems to even partially succeed at it:

trait MyTrait {}
impl MyTrait for usize {}

fn bar<T: MyTrait>(_: T) {}

fn main() {
    let a = 3;
    bar(a);
    // Here a has type `usize`
}

See https://play.rust-lang.org/?gist=2226c34c8b98ee8ac936b7190f779961&version=stable&mode=debug

But not perfectly:

trait MyTrait {}
impl MyTrait for usize {}
impl MyTrait for isize {}

trait MyTrait2 {}
impl MyTrait2 for usize {}
impl MyTrait2 for u32 {}

fn bar<T: MyTrait + MyTrait2>(_: T) {}

fn main() {
    let a = 3;
    bar(a);
    // Here `a` is still {integer}
}

See https://play.rust-lang.org/?gist=194af36fb8e18a53b3e8a1874975ec24&version=stable&mode=debug

(well, actually I guess rustc internally has enough information to enforce that a has type usize, but it's not displayed)

Basically, the marker at return-site would be approximately like a generalized {integer} from a compiler point of view, or so I think. :)

@alexreg I'm observing that dyn Trait consists of a vtable pointer and a T: Trait, so it need not be a DST if we do not export Trait and thus only have a few fixed T: Trait.

At that point, we likely have an implicit conversion from a T: Trait to a dyn Trait anyways, so almost by necessity the syntax takes the form mentioned by @Ekleog

fn foo(a: bool) -> impl Debug {
    if a {
        7 as dyn Debug
    } else {
        "Foo"
    }
}

Avoiding this syntax would require tweaking trait objects, à la dyn 7 and dyn "Foo", which sounds unrealistic. We do have markers here in the as dyn Debug, but only at the type level, so not at each site.

There is one important caveat however: Is foo::Output : Sized? In principle yes. But what about dyn MySummand in

trait MySummand { }
impl MySummand for X {}
impl MySummand for Y {}
fn foo(a: bool) -> impl Debug+MySummand { ... }

Again presumably yes but stuff could get weird.

@Ekleog The following code is from your comment here:

fn foo() -> impl Trait {
    let a = if cond1() { Ok(marker f1()) } else { f2().map(|x| marker x) };
    if cond2() { a.unwrap() } else { marker f3() }
}

(Edit: @Ekleog mentions in his next comment that he inadvertently introduced a little mistake in this code. I've now changed f3().map(|x| marker x) to just marker f3() to fix it.)

Why does neither map() nor the closure finalize marker x?

Anyways, if Rust does eventually support unboxed, dynamically sized trait objects on the stack, isn't an auto-generated sum type just a Sized reification of that?

Probably not. The deep, fundamental difference between trait objects and enums is that the set of concrete types that a trait object might be wrapping is open, and not necessarily known to the compiler, while enums must have all variants known to the compiler. In theory, the surface syntax of trait objects could be compiled down to enums as an optimization if the compiler happened to know all the concrete types. Is that what you're trying to propose?

I'm trying to salvage my proposal :)

I read and I think I understand the RFC now. AIUI, the intended use case for dyn Trait is roughly:

fn passed_by_value_without_boxing_or_monomorphization(f: dyn FnOnce()) {
  f()
}

let x = || {
  // Stuff that makes this FnOnce.
};

passed_by_value_without_boxing_or_monomorphization(x);

Ok, fair enough. Let's think out loud for a bit. So I can also do this:

fn foo(a: bool) {
  let x = || {
    // Stuff that makes this FnOnce.
  };

  let y = || {
    // Different stuff that makes this FnOnce.
  };

  if a {
    passed_by_value_without_boxing_or_monomorphization(x);
  } else {
    passed_by_value_without_boxing_or_monomorphization(y);
  }
}

But if I can do that, I should probably be able to do this too.

fn foo(a: bool) {
  let x = if a {
    || {
      // Stuff that makes this FnOnce.
    }
  } else {
    || {
      // Different stuff that makes this FnOnce.
    }
  };

  passed_by_value_without_boxing_or_monomorphization(x);
}

Presumably x here will need the same monomorphizing (type erasing? not entirely sure what to call this) annotation either on the let binding or the as blah syntax that is necessary for everything else. So that's really:

fn foo(a: bool) {
  let x: dyn FnOnce() = if a {
    || {
      // Stuff that makes this FnOnce.
    }
  } else {
    || {
      // Different stuff that makes this FnOnce.
    }
  };

  passed_by_value_without_boxing_or_monomorphization(x);
}

Notably, x is not Sized. It doesn't have a constant size known at compile time. But the compiler does know all possible types and sizes it can have. For the moment, let's call that set of knowledge AllVariantsKnown. Contrast that with f inside passed_by_value_without_boxing_or_monomorphization. That function can be called with any dyn FnOnce() trait object, so we don't know all variants there.
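By-value `dyn FnOnce()` locals like the one above depend on the unsized values RFC. As a point of comparison, the closest thing that works in stable Rust today is to box the closure, which is exactly the allocation this discussion is trying to avoid. A sketch, with `call_boxed` as a made-up name and a `String` return value added so the result can be observed:

```rust
// Takes the closure by value, but behind a `Box`, so the argument is `Sized`.
fn call_boxed(f: Box<dyn FnOnce() -> String>) -> String {
    f()
}

fn foo(a: bool) -> String {
    let owned = String::from("moved");
    // Two different closure types, unified behind one trait object thanks to
    // the type annotation on the binding.
    let x: Box<dyn FnOnce() -> String> = if a {
        // Moves `owned` out when called, which makes this closure `FnOnce`.
        Box::new(move || owned)
    } else {
        Box::new(|| String::from("other"))
    };
    call_boxed(x)
}
```

Here the compiler also sees every closure type that can end up in `x`, which is the knowledge called AllVariantsKnown above; the boxed version just does not make use of it.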

Constructing an auto-generated sum type now is equivalent to having a compiler builtin

fn build_auto_generated_sum_type<X: Trait + AllVariantsKnown>(x: X) -> impl Trait + Sized

So perhaps at the end my original example looks like

fn foo_agst(a: bool) -> impl ::std::fmt::Debug
{
    let b: dyn ::std::fmt::Debug = if a {
        7
    } else {
        "Foo"
    };

    build_auto_generated_sum_type(b)
}

(Obviously we can bikeshed what build_auto_generated_sum_type looks like :) )
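For comparison, here is roughly what `foo_agst` has to look like in today's Rust when the sum type is written out by hand. The enum `IntOrStr` is a hypothetical stand-in for whatever `build_auto_generated_sum_type` would produce; the literal `7` is assumed to be an `i32`.

```rust
use std::fmt;

// Hypothetical stand-in for the output of `build_auto_generated_sum_type`.
enum IntOrStr {
    Int(i32),
    Str(&'static str),
}

// `Debug` is implemented by delegating to the active variant, so the wrapper
// is invisible in the formatted output.
impl fmt::Debug for IntOrStr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            IntOrStr::Int(i) => fmt::Debug::fmt(i, f),
            IntOrStr::Str(s) => fmt::Debug::fmt(s, f),
        }
    }
}

// Unlike a bare `dyn Debug`, the returned enum is `Sized`, so it fits the
// existing `impl Trait` return rules.
fn foo_agst(a: bool) -> impl fmt::Debug {
    if a {
        IntOrStr::Int(7)
    } else {
        IntOrStr::Str("Foo")
    }
}
```

Formatting the result with `{:?}` gives `7` for `foo_agst(true)` and `"Foo"` (with quotes) for `foo_agst(false)`, the same as formatting the inner values directly.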

Also note that we can solve things like @Ekleog's loop example by being conservative in what we mark as AllVariantsKnown.

@MajorBreakfast

fn foo() -> impl Trait {
    let a = if cond1() { Ok(marker f1()) } else { f2().map(|x| marker x) };
    if cond2() { a.unwrap() } else { f3().map(|x| marker x) }
}

Why does neither map() nor the closure finalize marker x?

Well, for the map and unwrap, these come under the case of the fn bar<T>(_: T) function from my previous message: they impose no bound on the type, thus do not finalize the enum.

Here, if I type-annotate, we'd have something like (hope it's understandable):

trait Trait {}
fn f1() -> T1 {} impl Trait for T1 {}
fn f2() -> Result<T2, E> {} impl Trait for T2 {}
fn f3() -> T3 {} impl Trait for T3 {}

fn foo() -> impl Trait {
    let a = if cond1() {
        Ok((marker (f1(): T1)): TypeInProgress(T1)): Result<TypeInProgress(T1), _>
    } else {
        (f2(): Result<T2, E>).map(
            (|x: T2| {
                (marker (x: T2)): TypeInProgress(T2)
            }): Fn(T2) -> TypeInProgress(T2)
        ): Result<TypeInProgress(T2), E>
        // Here, map() takes a Fn(T2) -> Whatever and returns Result<Whatever, E>,
        // without any bounds on Whatever, so we're saved by the
        // no-bound-means-no-finalization rule
    }: Result<TypeInProgress(T1&T2), E>;
    if cond2() {
        (a: Result<TypeInProgress(T1&T2), E>).unwrap(): TypeInProgress(T1&T2)
        // Same here, unwrap() imposes no bound whatsoever on the type, so TypeInProgress stays
    } else {
        (marker (f3(): T3)): TypeInProgress(T3)
    }: TypeInProgress(T1&T2&T3)
}

(Actually I've changed the code around f3 a bit, as it made no sense and I had misread your example)

For the closure, it's a harder question. I think that, as the closure has no return type, then it should be typed following the no-bound-means-no-finalization rule. Actually, that's also what rustc already does with {integer}:

fn main() {
    let c = || 3; // Fn() -> {integer} ; even if I can't get rust to display this type
    let () = c(); // expected type `{integer}`, found type `()`
}

See https://play.rust-lang.org/?gist=e55518d52ce9154962259e970f266754&version=stable&mode=debug

So I don't think these should be big issues 🙂

@Ekleog Another question: You've inserted the marker into the Result by calling the map method. What if such a method does not exist?


@Ekleog You've previously mentioned that the marker-at-return-type syntax has a problem with nested if or match expressions:

let a: marker impl Trait = match foo() {
    Foo1 => if bar() { baz() } else { quux() },
    Foo2 => iwantmorenames(),
}

It's good that you mention this case. I, however, do not agree with your conclusion. Even today the compiler ensures that all three are of the same type. This means the compiler is able to handle nested expressions. It follows that it can be made smart enough to create an enum with the appropriate number of variants. No additional markers needed.

@MajorBreakfast

First, the answer about if-nested-in-match, as it's shorter: See here for a better explanation of why the marker-at-return-type syntax is a problem for syntactic reasons, and my reply to @Paluth from here for why I use the AST as a basic building block :) (this blog post also appears to confirm that typing is done at HIR ~= AST level currently)

Now, the “what if there is no map method?” question. Actually, I'd argue it's better not to be able to auto-enumify in this situation, and marker-at-return-site handles this situation better.

For instance, let's consider Vec (that turns out to have a map equivalent by .iter().map().collect()).

With marker-at-return-type, I'd write things like:

fn f1() -> Vec<A> {}
fn f2() -> Vec<B> {}

fn foo(x: bool) -> Vec<marker impl Trait> {
    if x { f1() } else { f2() }
}

I… do not want this to compile. Because rustc has no way to know how to properly map the auto-generated enum into the Vec: it requires re-allocation because the size will change, etc. Same thing for HashMap, where especially marker-ing the key would in addition require re-keying as the hash would change. For Ref, it's just not possible to wrap the inner type in an enum without also owning the RefCell.

So this argument would actually almost rule out marker-at-return-type (well, it could be specified to not handle cases where the type is hidden in an *, but I'm not sure that'd solve all the issues, and then mapping the marker into it becomes even more painful than with marker-at-return-site as you have to type-annotate your mapping function).

On the other hand, marker-at-return-site would handle it pretty well, using the .iter().map().collect() idiom, which is the lowest-possible-overhead way to tell the Rust compiler how to map the enum over the Vec.

Also, if there is no .map()-like method on the type, then it means that the type is not meant to be mapped over. So implicitly mapping an enum into such a type that I got from a function sounds like a Bad Idea™ to me, and if it's not a type I got from a function, I could just do as with Ok(marker foo()), i.e. add the marker at object creation time :) (that said, I may be missing some cases for which implicit enum-ization without a map()-like function would make sense)

On the other hand, marker-at-return-site would handle it pretty well, using the .iter().map().collect() idiom

Can you give an example?


About learnability and readability: We disagree there. You say that it's hard to see what types are enumified because there are no markers at the return points. I, however, don't agree with that assessment because in normal if and match expressions without enumification returning values looks exactly the same. Learnability of marker-at-return-type is IMO better because we can clearly see where the enumification happens.

@MajorBreakfast

Sure:

fn f1() -> Vec<A> {}
fn f2() -> Vec<B> {}

fn foo() -> Vec<impl Trait> {
    if bar() {
        f1().into_iter().map(|x| marker x).collect()
        // works because neither map() nor collect() impose any bound on the TypeInProgress type
    } else {
        f2().into_iter().map(|x| marker x).collect()
    }
}

This is incidentally exactly the same syntax when going from Vec<Box<T>> to Vec<Box<Trait>>.
BTW, @Nadrieril proposed on IRC that marker x be written x as enum impl Trait or x as enum _ so that this parallel becomes obvious. (let's try not to discuss that right now, it was just an on-the-fly comment)
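Here is a sketch of how this looks in today's Rust with a hand-written enum in place of `marker`, next to the boxed version it parallels. The types `A`, `B`, `AOrB`, the `name` method and the bodies of `f1` and `f2` are placeholders invented for the example. An explicit `collect::<Vec<AOrB>>()` is used so the sketch does not rely on inference through the opaque return type.

```rust
trait Trait {
    fn name(&self) -> &'static str;
}

struct A;
struct B(u64);

impl Trait for A {
    fn name(&self) -> &'static str { "A" }
}
impl Trait for B {
    fn name(&self) -> &'static str { "B" }
}

// Hypothetical stand-in for the auto-generated sum type.
enum AOrB {
    A(A),
    B(B),
}

impl Trait for AOrB {
    fn name(&self) -> &'static str {
        match self {
            AOrB::A(a) => a.name(),
            AOrB::B(b) => b.name(),
        }
    }
}

fn f1() -> Vec<A> { vec![A, A] }
fn f2() -> Vec<B> { vec![B(1)] }

// Each element is wrapped individually and collected into a new `Vec`,
// because the elements of the new `Vec` have a different size.
fn foo(x: bool) -> Vec<impl Trait> {
    if x {
        f1().into_iter().map(AOrB::A).collect::<Vec<AOrB>>()
    } else {
        f2().into_iter().map(AOrB::B).collect::<Vec<AOrB>>()
    }
}

// The boxed equivalent uses the same map-and-collect shape.
fn foo_boxed(x: bool) -> Vec<Box<dyn Trait>> {
    if x {
        f1().into_iter().map(|a| Box::new(a) as Box<dyn Trait>).collect()
    } else {
        f2().into_iter().map(|b| Box::new(b) as Box<dyn Trait>).collect()
    }
}
```

Since `A` is zero-sized and `AOrB` is not, a `Vec<A>` cannot simply be reinterpreted as a `Vec<AOrB>`, which is the re-allocation concern raised above for the marker-at-return-type variant.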

How would you handle such an enum-ification with marker-at-return-type?


About learnability and readability, can I just know which of the three options I gave in #2414 (comment) for marker-at-return-type you consider easy-to-learn? Or maybe another one I haven't thought of?

How would you handle such an enum-ification with marker-at-return-type?

Not sure if it can be done. Your notation uses .map(|x| marker x) to convert from the vector item type into the sum type. Seems like a clean way to do this. I can't think of a way to integrate this step into the marker-at-return-type system.

I'm beginning to like this solution. I'd be really interested if someone from the compiler team could comment on whether the "TypeInProgress" system is technically possible.

BTW the name "TypeInProgress" kinda suggests that the type could dynamically change. But this is not the case, right? It's a static type known at compile time. Maybe there's a better name for it.