carbon-language / carbon-lang

Carbon Language's main repository: documents, design, implementation, and related tools. (NOTE: Carbon Language is experimental; see README)

Home Page: http://docs.carbon-lang.dev/

Is `.base` permitted as a struct member? Is it the same as `.r#base`?

zygoloid opened this issue

Summary of issue:

We support initializing the base class of a derived class using .base:

base class A { var a: i32; }
class B {
  extend base: A;
  var b: i32;
}
var b: B = {.base = {.a = 1}, .b = 2};
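
For comparison, here is a rough C++ analogue of the example above. It is only a sketch: it assumes C++17 or later, where aggregates may have public base classes, and the helper name `MakeB` is made up for illustration. C++ initializes the base positionally with a nested brace list and has no designator that names the base, which is the role `.base` plays in Carbon.

```cpp
#include <cassert>

// C++ counterparts of the Carbon classes A and B above.
struct A {
  int a;
};
struct B : A {
  int b;
};

// Hypothetical helper. The first nested brace list initializes the base-class
// subobject, roughly like `.base = {.a = 1}`; the trailing 2 initializes `b`.
inline B MakeB() { return B{{1}, 2}; }

// Members of the base are found through the derived object.
inline int SumFields(const B& value) { return value.a + value.b; }
```

With these definitions, `MakeB().a` is 1 and `MakeB().b` is 2.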

But what does it mean to use the keyword base as a field name in a struct?

Details:

Some options:

  1. .base = is special syntax that can only be used in a struct literal that is converted to a class.
// Error, struct value cannot have field named `base`.
var a: auto = {.base = 5};

This means that we can't forward such values into a function that will construct an instance of a class:

fn F[U:! type](T:! type where U is ImplicitAs(T), x: U) -> T { return x as T; }
// Error!
var b: B = F(B, {.base = {.a = 1}, .b = 2});
  2. .base means the same thing as .r#base -- base is treated as a non-keyword in this context, despite being a keyword in other contexts. This would be problematic for interop with C++ classes with a base member:
var it: Cpp.std.reverse_iterator<It>;
// Ambiguous: is this the base class of the iterator, or is it the `base` member function?
var a = it.base;
  3. .base is a different name from .r#base, and structs can have a field with that special name. That field is not special in a struct, it's just another field name.
// OK, two fields with different names.
var s: {.base: i32, .r#base: f64} = {.base = 5, .r#base = 7.0};
// Error, field notakeyword specified multiple times.
var t: auto = {.notakeyword = 1, .r#notakeyword = 2};
  4. .base is a different name from .r#base, and structs can have a field with that special name. A struct implicitly extends its base field, if present, just like a class explicitly extends it.
var xy: {.x: i32, .y: i32} = {.x = 1, .y = 2};
// OK, xyz is of type {.base: {.x: i32, .y: i32}, .z: i32}
var xyz: auto = {.base = xy, .z = 3};
// xyz is {.base = {.x = 1, .y = 2}, .z = 3}
// OK, found in base.
var x = xyz.x;

This makes the struct behave a bit more like the class that it converts into.

  5. We can extend (4) to get a struct update syntax:
// We can allow flattening if all field names match, when the destination is a struct with no `.base`.
var xyz: {.x: i32, .y: i32, .z: i32} = {.base = xy, .z = 3};
// xyz is {.x = 1, .y = 2, .z = 3}

// If we allow there to be unused fields in the source (perhaps only if they're shadowed),
// we can then perform struct updates via `.base` too.
var new_xyz: {.x: i32, .y: i32, .z: i32} = {.base = xyz, .y = 4};
// new_xyz is {.x = 1, .y = 4, .z = 3}

However, it's not clear to me how this would work if the destination also has a .base. One possible approach would be to initialize the base-most destination from the base-most source, and so on until we reach a level where only the source is still a base, and then flatten. (And reject if the destination has deeper inheritance than the source.) If we want this kind of flattening, perhaps a different syntax would be preferable:

// (Placeholder syntax.)
var xyz: auto = xy with {.z = 3};
// xyz is {.x = 1, .y = 2, .z = 3}

var new_xyz: auto = xyz with {.y = 4};
// new_xyz is {.x = 1, .y = 4, .z = 3}
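
To make the update case of the placeholder `with` syntax concrete, here is a minimal C++ sketch of "copy, then overwrite selected fields". The names `Xyz`, `With`, and `UpdateY` are hypothetical and not part of any proposal. The sketch covers only `xyz with {.y = 4}`; adding a new field, as in `xy with {.z = 3}`, changes the type and is not modeled here.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for the Carbon struct {.x: i32, .y: i32, .z: i32}.
struct Xyz {
  int x;
  int y;
  int z;
};

// Hypothetical helper: takes `source` by value (a copy), applies `update` to
// that copy, and returns it. The caller's original value is left untouched.
template <typename T, typename F>
T With(T source, F&& update) {
  std::forward<F>(update)(source);
  return source;
}

// Mirrors `xyz with {.y = new_y}` from the placeholder syntax.
inline Xyz UpdateY(const Xyz& xyz, int new_y) {
  return With(xyz, [new_y](Xyz& copy) { copy.y = new_y; });
}
```

Calling `UpdateY(xyz, 4)` on a value holding 1, 2, 3 yields a new value holding 1, 4, 3 and leaves `xyz` unchanged. The update is spelled out explicitly at the call site.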

Options 4 and 5 are the most tempting to me.

Option 4 seems good.

Regarding option 5, a few things I'd suggest considering before adopting the noted implicit flattening/nesting conversions via struct update syntax:

  1. For background, compare with C++: I think this would be a change in behavior that may lead to different results. .base gives a syntax for updating the parent without confusion about whether the field is on the child, which C++ lacks and which would likely be helpful; still, there's a distinction between base and child structs. Maybe there's some way to make struct initialization work, but the naive way doesn't:

    struct A { int x; };
    struct B : public A { int y; };
    // Error: field designator 'x' does not refer to any field in type 'B'
    B c = {.x = 0, .y = 1};
    
    struct D : public A { int x; int y; };
    // Valid C++ code.
    D e = {.x = 0, .y = 1};
    

    https://cpp.compiler-explorer.com/z/evT88P8qT

  2. In most cases of name ambiguity, I would expect ambiguities to be resolved by adding qualifiers. In this case though, ambiguities are resolved by specifying more things: the original designated name remains. This feels a little unusual to me.

    // An error due to ambiguity in the `.x` initialization?
    var a: {.base: {.x: i32}, .x: i32} = {.x = 0};
    // Now unambiguous.
    var b: {.base: {.x: i32}, .x: i32} = {.base = {.x = 0}, .x = 1};
    
  3. What about classes and tuples? In classes, making base optional seems like it would be similar, although it may raise concerns about type safety. In tuples, I think it'd already been discussed and the decision was to treat nesting as requiring explicit flattening, not an implicit conversion.

    For example:

    // The struct literal example:
    var a_struct: {.base: {.x: i32, .y: i32}, .z: i32} = {.x = 1, .y = 2, .z = 3};
    var b_struct: {.x: i32, .y: i32, .z: i32} = a_struct;
    
    // Similar in classes:
    class BaseT { var x: i32; var y: i32; }
    class ChildT { extend base: BaseT; var z: i32; }
    var a_class: ChildT = {.x = 1, .y = 2, .z = 3};
    var b_class: {.x: i32, .y: i32, .z: i32} = a_class;
    
    class FlatT { var x: i32; var y: i32; var z: i32; }
    var c_class: FlatT = a_class;
    
    // Similar in tuples:
    var a_tuple: ((i32, i32), i32) = (1, 2, 3);
    var b_tuple: (i32, i32, i32) = a_tuple;
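
As a point of comparison for the tuple case, C++ standard tuples behave the "explicit flattening" way: a nested tuple does not implicitly convert to a flat one, and flattening has to be written out. The sketch below is only an analogy to the Carbon question, and the names `Nested`, `Flat`, and `Flatten` are made up for illustration.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

using Nested = std::tuple<std::tuple<int, int>, int>;
using Flat = std::tuple<int, int, int>;

// The nested and flat shapes have different element counts, so no implicit
// conversion exists between them.
static_assert(!std::is_convertible<Nested, Flat>::value,
              "nested tuples do not implicitly flatten");

// Explicit flattening: splice the inner tuple together with the trailing
// element, wrapped in a one-element tuple so that tuple_cat accepts it.
inline Flat Flatten(const Nested& nested) {
  return std::tuple_cat(std::get<0>(nested),
                        std::make_tuple(std::get<1>(nested)));
}
```

Here `Flatten` turns `((1, 2), 3)` into `(1, 2, 3)`, but only because the caller asked for it by name.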
    

Regarding the with keyword in var xyz: auto = xy with {.z = 3};, I think that is more explicit, and so doesn't have the same concerns as implicit casts, although name ambiguities may still be odd to resolve [e.g. var a: {.base: {.x: i32}, .x: i32} = val with {.base = {}, .x = 3};].

I think we are only really supporting struct -> class conversions, not the other direction.

FWIW, I also like option 4.

Options 1-3 seem likely to end up with some amount of friction, whereas option 4 seems to really elegantly express the range of things we want in struct literals and provide important functionality for initializing base classes.

We can always revisit option 5 if/when we have motivation and some ways to deal with both the issues raised in the original description and by Jon.

I think option 4 is my preference here -- it gives a consistent behavior for the name base in classes and structs, seems (at least a little) useful for structs independent of the class initialization use case, and fits nicely into the class initialization use case.

Option 5 seems a little too implicit and do-what-I-mean-ish to me, and a more explicit struct update syntax seems like a better fit for Carbon's design aesthetic.

Let's call this decided with option 4 -- seems we have enough consensus among leads and no strong objections. And the comments above have lots of good points of rationale, including from leads.