Is `.base` permitted as a struct member? Is it the same as `.r#base`?

Question

zygoloid opened this issue a year ago

Summary of issue:

We support initializing the base class of a derived class using .base:

base class A { var a: i32; }
class B {
  extend base: A;
  var b: i32;
}
var b: B = {.base = {.a = 1}, .b = 2};

But what does it mean to use the keyword base as a field name in a struct?

Details:

Some options:

1. .base = is special syntax that can only be used in a struct literal that is converted to a class.

// Error, struct value cannot have field named `base`.
var a: auto = {.base = 5};

This means that we can't forward such values into a function that will construct an instance of a class:

fn F[U:! type](T:! type where U is ImplicitAs(T), x: U) -> T { return x as T; }
// Error!
var b: B = F(B, {.base = {.a = 1}, .b = 2});

2. .base means the same thing as .r#base -- base is treated as a non-keyword in this context, despite being a keyword in other contexts. This would be problematic for interop with C++ classes with a base member:

var it: Cpp.std.reverse_iterator<It>;
// Ambiguous: is this the base class of the iterator, or is it the `base` member function?
var a = it.base;

3. .base is a different name from .r#base, and structs can have a field with that special name. That field is not special in a struct, it's just another field name.

// OK, two fields with different names.
var s: {.base: i32, .r#base: f64} = {.base = 5, .r#base = 7.0};
// Error, field notakeyword specified multiple times.
var t: auto = {.notakeyword = 1, .r#notakeyword = 2};

4. .base is a different name from .r#base, and structs can have a field with that special name. A struct implicitly extends its base field, if present, just like a class explicitly extends it.

var xy: {.x: i32, .y: i32} = {.x = 1, .y = 2};
// OK, xyz is of type {.base: {.x: i32, .y: i32}, .z: i32}
var xyz: auto = {.base = xy, .z = 3};
// xyz is {.base = {.x = 1, .y = 2}, .z = 3}
// OK, found in base.
var x = xyz.x;

This makes the struct behave a bit more like the class that it converts into.

5. We can extend (4) to get a struct update syntax:

// We can allow flattening if all field names match, when the destination is a struct with no `.base`.
var xyz: {.x: i32, .y: i32, .z: i32} = {.base = xy, .z = 3};
// xyz is {.x = 1, .y = 2, .z = 3}

// If we allow there to be unused fields in the source (perhaps only if they're shadowed),
// we can then perform struct updates via `.base` too.
var new_xyz: {.x: i32, .y: i32, .z: i32} = {.base = xyz, .y = 4};
// new_xyz is {.x = 1, .y = 4, .z = 3}

However, it's not clear to me how this would work if the destination also has a .base. One possible approach would be to initialize the base-most destination from the base-most source, and so on until we reach a level where only the source is still a base, and then flatten. (And reject if the destination has deeper inheritance than the source.) If we want this kind of flattening, perhaps a different syntax would be preferable:

// (Placeholder syntax.)
var xyz: auto = xy with {.z = 3};
// xyz is {.x = 1, .y = 2, .z = 3}

var new_xyz: auto = xyz with {.y = 4};
// new_xyz is {.x = 1, .y = 4, .z = 3}
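To make the interop concern in option 2 concrete: in C++, `std::reverse_iterator` has an ordinary member function named `base()` that returns the wrapped iterator and has nothing to do with inheritance. The following is a minimal C++ sketch of that existing behavior; `UnderlyingIterator` and `RbeginWrapsEnd` are hypothetical helper names introduced only for illustration.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// `base` here is an ordinary member function of std::reverse_iterator; it
// returns the wrapped forward iterator and is unrelated to any base class.
template <typename It>
It UnderlyingIterator(std::reverse_iterator<It> rit) {
  return rit.base();
}

// Hypothetical helper for illustration: rbegin() wraps end(), so calling
// base() on it hands back end().
inline bool RbeginWrapsEnd(const std::vector<int>& v) {
  return UnderlyingIterator(v.rbegin()) == v.end();
}
```

If `.base` and `.r#base` were the same name, a Carbon expression such as `it.base` on an imported type like this would have two plausible meanings, which is the ambiguity option 2 runs into.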

Any other information that you want to share?

No response

josh11b · Answer 1 · Thu Nov 16 2023 00:03:30 GMT+0800 (China Standard Time)

Options 4 and 5 are the most tempting to me.

Jon Ross-Perkins · Answer 2 · Thu Nov 16 2023 01:40:48 GMT+0800 (China Standard Time)

Option 4 seems good.

Regarding option 5, a few things I'd suggest considering before adopting the noted implicit flattening/nesting conversions via struct update syntax:

For background: relative to C++, I think this is a change in behavior that may lead to different results. .base gives a syntax for updating the parent without confusion about whether the field is on the child; C++ has no equivalent, and it would likely be helpful. Still, there's a distinction between base and child structs. Maybe there's some way to make struct initialization work, but the naive way doesn't:
```
struct A { int x; };
struct B : public A { int y; };
// Error: field designator 'x' does not refer to any field in type 'B'
B c = {.x = 0, .y = 1};

struct D : public A { int x; int y; };
// Valid C++ code.
D e = {.x = 0, .y = 1};
```
https://cpp.compiler-explorer.com/z/evT88P8qT
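For comparison, C++ does have a way to initialize the base here, just not by name: since C++17 a class with a public base is still an aggregate, and the base subobject is initialized positionally by its own nested brace list. A minimal sketch, assuming C++17 or later; `MakeB` is a hypothetical helper added for illustration.

```cpp
#include <cassert>

struct A { int x; };
struct B : A { int y; };

// The base subobject A is the first aggregate element, so it takes the
// first (nested) brace list; `y` takes the next initializer.
inline B MakeB() {
  return B{{1}, 2};
}
```

The nested braces play roughly the role that `.base = {...}` plays in the Carbon examples, except that the base is identified by position rather than by a designated name.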
In most cases of name ambiguity, I would expect ambiguities to be resolved by adding qualifiers. In this case, though, the ambiguity is resolved by specifying more fields, while the original designated name stays the same. This feels a little unusual to me.
```
// An error due to ambiguity in the `.x` initialization?
var a: {.base: {.x: i32}, .x: i32} = {.x = 0};
// Now unambiguous.
var b: {.base: {.x: i32}, .x: i32} = {.base = {.x = 0}, .x = 1};
```

What about classes and tuples? In classes, making base optional seems like it would be similar, although it may raise concerns about type safety. In tuples, I think it'd already been discussed and the decision was to treat nesting as requiring explicit flattening, not an implicit conversion.

For example:

// The struct literal example:
var a_struct: {.base: {.x: i32, .y: i32}, .z: i32} = {.x = 1, .y = 2, .z = 3};
var b_struct: {.x: i32, .y: i32, .z: i32} = a_struct;

// Similar in classes:
class BaseT { var x: i32; var y: i32; }
class ChildT { extend base: BaseT; var z: i32; }
var a_class: ChildT = {.x = 1, .y = 2, .z = 3};
var b_class: {.x: i32, .y: i32, .z: i32} = a_class;

class FlatT { var x: i32; var y: i32; var z: i32; }
var c_class: FlatT = a_class;

// Similar in tuples:
var a_tuple: ((i32, i32), i32) = (1, 2, 3);
var b_tuple: (i32, i32, i32) = a_tuple;

Regarding the with keyword in var xyz: auto = xy with {.z = 3};, I think that is more explicit, and so doesn't have the same concerns as implicit casts, although name ambiguities may still be odd to resolve [e.g. var a: {.base: {.x: i32}, .x: i32} = val with {.base = {}, .x = 3};].

josh11b · Answer 3 · Thu Nov 16 2023 03:18:38 GMT+0800 (China Standard Time)

I think we are only really supporting struct -> class conversions, not the other direction.

Chandler Carruth · Answer 4 · Fri Dec 22 2023 10:47:14 GMT+0800 (China Standard Time)

FWIW, I also like option 4.

Options 1-3 seem likely to end up with some amount of friction, whereas option 4 seems to really elegantly express the range of things we want in struct literals and provide important functionality for initializing base classes.

We can always revisit option 5 if/when we have motivation and some ways to deal with both the issues raised in the original description and by Jon.

Richard Smith · Answer 5 · Mon Dec 25 2023 04:37:26 GMT+0800 (China Standard Time)

I think option 4 is my preference here -- it gives a consistent behavior for the name base in classes and structs, seems (at least a little) useful for structs independent of the class initialization use case, and fits nicely into the class initialization use case.

Option 5 seems a little too implicit and do-what-I-mean-ish to me, and a more explicit struct update syntax seems like a better fit for Carbon's design aesthetic.

Chandler Carruth · Answer 6 · Mon Dec 25 2023 04:43:32 GMT+0800 (China Standard Time)

Let's call this decided with option 4 -- seems we have enough consensus among leads and no strong objections. And the comments above have lots of good points of rationale, including from leads.