Definition of memory allocation for array and struct & improved initialization of variables

Question

avitkauskas opened this issue 5 years ago · comments

Alvydas Vitkauskas commented 5 years ago

Note: This may not be a very well-defined RFC.
Taking a chance while the template is not provided yet :)

Location in memory

We want some constructs to be located on the stack and others located on the heap.
We want to have control. Language should make some guarantees about that.

Some languages have array and vector (sometimes also a slice) to distinguish between these two.

array is always allocated on the stack and vector is usually allocated on the heap (unless compiler decides to optimise? I'm not sure).
array is fixed size and cannot grow, vector can grow dynamically.
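
As a point of comparison, Rust is one language that draws this line in its type system. The sketch below is only an illustration with made-up function names, not a statement about how V does or should work:

```rust
// Fixed-size array: the length is part of the type and the elements are
// stored inline, so a local array typically lives in the stack frame.
fn fixed_array() -> [i32; 3] {
    [0; 3] // three zeros, similar in spirit to `[0].repeat(3)`
}

// Vector: the elements live in a heap buffer that can grow.
fn growable_vector() -> Vec<i32> {
    let mut v = vec![1, 2, 3];
    v.push(4); // may reallocate the heap buffer
    v
}
```

Calling `fixed_array()` yields `[0, 0, 0]`, and `growable_vector()` yields a vector of four elements; there is no `push` on the array type at all.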

We usually want struct allocated on the stack. But sometimes we also want it on the heap.

Proposal for V syntax

A struct is usually allocated on the stack.
Currently you can get it allocated on the heap when declaring a variable:

struct S {
    x int
}
s := &S{}

& also indicates a reference in V and is a bit confusing here.
What if we use @ instead to denote the heap allocation?

s := @S{}
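
For comparison, some languages request heap allocation through a dedicated type rather than the reference operator. A minimal Rust sketch of that idea follows; `on_stack` and `on_heap` are illustrative names only:

```rust
#[derive(Debug, PartialEq)]
struct S {
    x: i32,
}

// Plain value: stored inline, so a local `S` typically sits in the
// stack frame of the function that owns it.
fn on_stack() -> S {
    S { x: 0 }
}

// `Box::new` moves the value into a heap allocation. The marker is the
// `Box` type, not the reference operator `&`.
fn on_heap() -> Box<S> {
    Box::new(S { x: 0 })
}
```

This plays the role that `@S{}` would play in the proposal: the heap request is visible at the point of construction and cannot be confused with taking a reference.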

This would also allow us to have the following syntax for array and vector:

array - fixed size, located on the stack

a := [0].repeat(3)
a := [1, 2, 3]

vector - can grow, located on the heap

a := @[0].repeat(3)
a := @[1, 2, 3]
a := @[] // empty vector on the heap

This contradicts the current syntax, where only the vector is documented in the language (though it is called array in the documentation), so this would have to change. A counter-argument could be that vectors are much more popular, so their syntax should be simpler, but then it does not match the memory allocation idea for struct (most structs we want on the stack and most arrays we want on the heap?).

This syntax allows us to avoid constructs like a := [3]int(1, 2, 3) or even worse a := [3]int which compiles now but leaves a uninitialized.

The idea of requiring the developer to provide an initial value for every variable is nice. But it is not strictly followed for struct. Currently we can write:

struct S {
    x int
    y int
    n string
}
s := S{n: 'whatever'}

and struct members x and y are initialized with default values (decided by the compiler, not the developer). This is not consistent. If struct members can be initialized with default values, why not stand-alone variables? Why can't we write x := int and let it be initialized to 0, as is done in the struct?

To be consistent, we should be declaring the struct as follows:

struct S {
    x := 0
    y := 1
    n := ''
}

Then it would make perfect sense and all default values would be explicitly provided by developer.

But the current struct syntax looks clearer: you can see the types of the members more easily. Then again, if we are comfortable inferring the types of all variables from their initial values, we should also be comfortable inferring the types of struct members from their initial values.

Proposal for compromise

What if we join explicit types with optional initial values and clearly defined default initial values?

We would get the full definition syntax like that:

a : int = 0

with type or initial value optional.
If you omit the type, you provide your initial value:

a := 1

If you omit the value, you get a clearly defined default value:

a : int
assert(a == 0)

Then you could define a struct with the default values like this:

struct S {
    x : int
    y : int
    n : string
}

It would also be consistent with the usage of : when initializing a variable:

s := S{y: 2}

We could also allow this:

s1 := S{}
s2 := s1{y:2}

I don't know how it relates to "one way of doing things", but overall this approach is very clear and consistent.
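
For reference, the two forms above (`S{y: 2}` and `s1{y:2}`) resemble what Rust calls struct update syntax combined with a `Default` implementation. A minimal sketch, assuming all-zero defaults as in the struct above; `with_y` and `override_y` are hypothetical helper names:

```rust
// Deriving `Default` gives every field its type's default value:
// 0 for the integers and an empty string for `n`.
#[derive(Debug, Clone, Default, PartialEq)]
struct S {
    x: i32,
    y: i32,
    n: String,
}

// Rough counterpart of `s := S{y: 2}`: name the field you care about
// and take the rest from the defaults.
fn with_y(y: i32) -> S {
    S { y, ..S::default() }
}

// Rough counterpart of `s2 := s1{y: 2}`: copy `s1` but override `y`.
fn override_y(s1: &S, y: i32) -> S {
    S { y, ..s1.clone() }
}
```

The design point is that the defaults are defined once per type, and every partial initialization states explicitly where the remaining fields come from.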

This would also urge us to clearly define default values for all the types:

Default values

bool -> false
string -> ''

i8    i16  int  i64  i128
byte  u16  u32  u64  u128
-> 0

rune // represents a Unicode code point  
-> ` ` // How should we write Unicode literals in V?
// what should be the default value? u\0000?
// we do not have `char`, should we? is `byte` for that?

f32 f64 -> 0.0

byteptr -> null // should it be allowed?
voidptr -> null // should it be allowed?
// why we have 2 types of pointers?
// should it be just `ptr` and the exact type inferred from initial value?
// like: `p : &byte = &'some string'`

If we say V is a safe language, can we allow null? But if we do not allow it, how do we live without it? Definitely by using Option, but this should also be looked into seriously. Option handling today still faces some open questions.
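
To make the two ideas above concrete, here is a minimal sketch of how one existing language, Rust, answers them: each built-in type has a documented default (the character type defaults to U+0000), and absence is expressed with an option type instead of null. The function names are made up for illustration and nothing here describes V's actual behaviour:

```rust
// Default values that the standard library defines for a few built-in
// types, collected in one tuple.
fn builtin_defaults() -> (bool, String, i64, u8, char, f64) {
    Default::default()
}

// Absence is spelled `None` and is part of the return type, so the
// caller has to handle it before touching the value.
fn find(items: &[i32], needle: i32) -> Option<usize> {
    items.iter().position(|&v| v == needle)
}

// Typical call site: supply a fallback when nothing was found.
fn find_or_zero(items: &[i32], needle: i32) -> usize {
    find(items, needle).unwrap_or(0)
}
```

With this shape, a missing result cannot be dereferenced by accident, because the compiler will not treat an `Option<usize>` as a `usize`.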

Delyan Angelov · Answer 1 · Mon Oct 07 2019 01:32:37 GMT+0800 (China Standard Time)

// why we have 2 types of pointers?
I think the answer for this (byteptr and voidptr) is easy interoperability with existing C code.

Delyan Angelov · Answer 2 · Mon Oct 07 2019 01:44:21 GMT+0800 (China Standard Time)

The language already implements so-called fixed arrays, for example
a := [5]int
will get a allocated on the stack, and it will not be able to grow dynamically. Currently, the elements of such a fixed array a are not initialized to 0 (but they probably should be).

In the future, you would be able to do:
(as @medvednikov commented below vlang/v#2241)
a := [3]int([1,2,3])
... which will also allocate a on the stack, and initialize its elements with the numbers 1 2 3.

The current syntax for this fixed array initialization is:
a := [1, 2, 3]!!

Delyan Angelov · Answer 3 · Mon Oct 07 2019 01:50:14 GMT+0800 (China Standard Time)

Other than fixed arrays, which are not initialized yet, V variables already get initialized to their default values, which are 0 (the implementation is such that zero-filling the memory of a struct, an array, or a string, for example, 'initializes' everything correctly).

Alvydas Vitkauskas · Answer 4 · Mon Oct 07 2019 02:24:18 GMT+0800 (China Standard Time)

The language already implements so-called fixed arrays, for example
a := [5]int
will get a allocated on the stack, and it will not be able to grow dynamically. Currently, the elements of such a fixed array a are not initialized to 0 (but they probably should be).

In the future, you would be able to do:
(as @medvednikov commented below vlang/v#2241)
a := [3]int([1,2,3])
... which will also allocate a on the stack, and initialize its elements with the numbers 1 2 3.

The current syntax for this fixed array initialization is:
a := [1, 2, 3]!!

I know that, and that's exactly what I am not happy about:
1.- We have two very different syntaxes:

a := [1, 2, 3] // dynamically allocated on the heap

and

a := [3]int([1,2,3]) // fixed size, on the stack, and quite ugly :(

2.- Also, the syntax for fixed size is inconsistent with other definitions: from the docs, "Array type is determined by the first element: [1, 2, 3] is an array of ints ([]int)." All the types are inferred from the values, except fixed arrays? Why do we need int in the fixed array definition? That's why I propose a := @[1, 2, 3] - it's consistent with dynamic arrays.

I only propose to switch them ([1, 2, 3] - fixed, @[1, 2, 3] - dynamic), and that is only to be consistent with struct, since we usually want a struct on the stack.

Alvydas Vitkauskas · Answer 5 · Mon Oct 07 2019 02:35:04 GMT+0800 (China Standard Time)

// why we have 2 types of pointers?
I think the answer for this (byteptr and voidptr) is easy interoperability with existing C code.

Yes, then we could have:

p := null // voidptr in C, same as just ptr in V
p : ptr    // same as above
p : &byte // byteptr initialized with default `null`
a := 'abc'
p := &a  // byteptr with the address of `a`
b := 0.5
p := &b // &f32, same as *float in C
p : &f64  // *double

Wouldn't this give us all the pointer types we need, both for V and for C interop?
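
A similar scheme exists in Rust, where the pointee type is a parameter of a single raw pointer type and the untyped case is a pointer to `c_void`. The sketch below only illustrates that design; the helper names are hypothetical and it says nothing about how V represents pointers:

```rust
use std::ffi::c_void;
use std::ptr;

// Typed pointer to the first byte of a string, comparable to a byteptr.
fn first_byte(s: &str) -> *const u8 {
    s.as_ptr()
}

// Untyped pointer, comparable to C's `void *` or a voidptr.
fn erase_type(p: *const u8) -> *const c_void {
    p as *const c_void
}

// A null pointer still carries a concrete pointee type.
fn null_bytes() -> *const u8 {
    ptr::null()
}
```

Reading through such a pointer requires an `unsafe` block, which keeps null and C interop available without letting them leak into ordinary code.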

Alexander Medvednikov · Answer 6 · Mon Oct 07 2019 13:23:23 GMT+0800 (China Standard Time)

Thanks @avitkauskas

1.- We have two very different syntaxes:

a := [3]int([1,2,3]) is just a := T(val), like any declaration. We simply specify that it's a fixed size array.

null is for C code only.

I'll cover the rest of the points after I wake up :)

gslicer · Answer 7 · Fri Oct 25 2019 18:51:05 GMT+0800 (China Standard Time)

I'll cover the rest of the points after I wake up :)
🥇