Proposal to make go_marshal API safer

Question

ayushr2 opened this issue 3 years ago

Description

Strict Preconditions in Safe Methods

Currently, the marshal.Marshallable interface has fairly strong preconditions even for the "safe" versions of the marshal and unmarshal methods. In the gVisor codebase, it is common to unmarshal bytes coming from a potentially malicious party, or from a trusted yet potentially compromised one. In that case, the auto-generated implementation panics when those preconditions are not met. Users might want to handle these situations gracefully rather than panic.

One possibility for statically sized types is to:

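var s someMarshallable
if len(src) < s.SizeBytes() {
	// src cannot hold s; report an error instead of letting the unmarshal panic.
	return errors.New("src is too short for someMarshallable")
}
s.UnmarshalUnsafe(src)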

But this has two flaws:

- Having to add these checks manually is an inconvenience for the user. It is also error-prone and can lead to inconsistent checks across the codebase.
- It is inherently infeasible for dynamically sized types, because the value SizeBytes() returns is only known while unmarshalling.

Error prone buffer shifting logic

This is especially common in manual implementations. Users end up with something like:

// UnmarshalBytes implements marshal.Marshallable.UnmarshalBytes.
func (s *someMarshallable) UnmarshalBytes(src []byte) {
	s.A.UnmarshalUnsafe(src)
	src = src[s.A.SizeBytes():]
	s.B.UnmarshalUnsafe(src)
	src = src[s.B.SizeBytes():]
	...
}

Such hand-written shifting sequences are error-prone and also an inconvenience to users.

Is this feature related to a specific bug?

#5465

Do you have a specific solution in mind?

The proposal is to modify the API in the following way:

- Change the safe method signatures to MarshalBytes(dst []byte) ([]byte, bool) and UnmarshalBytes(src []byte) ([]byte, bool), and drop the preconditions.
  - The returned bool indicates whether the operation succeeded. For gVisor's use cases, this bool is not really needed for MarshalBytes(). It is mainly useful for UnmarshalBytes(), because while marshalling we have more control over what is being marshalled and, for dynamic types, SizeBytes() is valid while marshalling. However, IMO we should still change MarshalBytes() too for consistency.
  - The returned byte slice is the buffer shifted by the type's size. Subsequent operations can pass it directly to the next Marshallable method.
- Change the unsafe method signatures to MarshalUnsafe(dst []byte) []byte and UnmarshalUnsafe(src []byte) []byte.
  - Keep the precondition, since it is better suited to an "unsafe" method. This is the more frequently used method in hot paths, and it would be marginally faster without the size check.
  - The returned buffer should be used the same way as described above.
- The slice API should likewise return the shifted buffer.
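
A minimal sketch of what the proposed safe signatures could look like for a statically sized type. The type `pair`, its field layout, and the little-endian encoding are assumptions made for illustration; this is hand-written code, not go_marshal output.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// pair is a hypothetical statically sized type used only for illustration.
type pair struct {
	A uint32
	B uint16
}

// SizeBytes returns the packed wire size of pair (4 + 2 bytes).
func (p *pair) SizeBytes() int { return 6 }

// MarshalBytes follows the proposed safe signature: it never panics,
// returns the shifted buffer, and reports success with a bool.
func (p *pair) MarshalBytes(dst []byte) ([]byte, bool) {
	if len(dst) < p.SizeBytes() {
		return dst, false
	}
	binary.LittleEndian.PutUint32(dst[0:4], p.A)
	binary.LittleEndian.PutUint16(dst[4:6], p.B)
	return dst[p.SizeBytes():], true
}

// UnmarshalBytes mirrors MarshalBytes for the decoding direction.
func (p *pair) UnmarshalBytes(src []byte) ([]byte, bool) {
	if len(src) < p.SizeBytes() {
		return src, false
	}
	p.A = binary.LittleEndian.Uint32(src[0:4])
	p.B = binary.LittleEndian.Uint16(src[4:6])
	return src[p.SizeBytes():], true
}

func main() {
	buf := make([]byte, 12)
	rest, _ := (&pair{A: 1, B: 2}).MarshalBytes(buf)
	(&pair{A: 3, B: 4}).MarshalBytes(rest)

	// Consecutive values are decoded by feeding the returned buffer back in,
	// so no manual src = src[SizeBytes():] bookkeeping is needed.
	var x, y pair
	src, ok := x.UnmarshalBytes(buf)
	if ok {
		src, ok = y.UnmarshalBytes(src)
	}
	fmt.Println(x, y, len(src), ok)
}
```

On failure the sketch returns the input buffer unchanged; whether a real implementation should do that or return nil is a design choice the proposal leaves open.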

Ayush Ranjan · Answer 1 · Fri Aug 13 2021 02:36:49 GMT+0800 (China Standard Time)

cc @mrahatm

Ayush Ranjan · Answer 2 · Fri Aug 13 2021 02:43:58 GMT+0800 (China Standard Time)

Actually, we can also make the same changes to the unsafe methods as to the safe methods. The generated implementation queries the size of the type anyway, so adding a check would be trivial and would add value to the API.

Rahat Mahmood · Answer 3 · Fri Aug 13 2021 04:50:39 GMT+0800 (China Standard Time)

Gomarshal was never designed to do some of the things mentioned in this proposal:

- Handle unchecked user input. Gomarshal usually handles syscall arguments, where the memory it marshals to/from is user controlled, and passing a bad buffer results in EFAULT from the CopyIn/Out methods.
- Manual implementations of the Marshallable interface were intended as a rare fallback mechanism when automated generation wasn't feasible. The intent was to never have to write error-prone buffer shifting code.
- The marshalling intentionally doesn't do any bounds checking because the bounds are known/managed by the owner of the buffer passed to the marshal methods. In many cases, doing a bounds check inside the marshalling methods would be a duplicate check.

I think we should think of Marshal{Bytes,Unsafe}/Unmarshal{Bytes,Unsafe} as low level marshaling methods and if we want to add bound checks (i.e. dynamic types), we can wrap them in helper methods. This is how the auto-generated CopyIn/CopyOut methods work. The error handling is done when copying the buffers, and by the time the marshal methods are called we're guaranteed to have a sane buffer.
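
The pattern described here, where the buffer owner sizes the buffer and handles errors before the low-level method runs, can be sketched roughly as follows. This is not gVisor's CopyIn: `header`, `copyIn`, and `decode` are hypothetical names, and an `io.Reader` stands in for user memory.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// header is a hypothetical fixed-size type used only for illustration.
type header struct {
	Len  uint32
	Kind uint32
}

// SizeBytes returns the packed wire size of header.
func (h *header) SizeBytes() int { return 8 }

// UnmarshalUnsafe stands in for the low-level method. It performs no bounds
// check of its own and relies on the caller to pass at least SizeBytes() bytes.
func (h *header) UnmarshalUnsafe(src []byte) {
	h.Len = binary.LittleEndian.Uint32(src[0:4])
	h.Kind = binary.LittleEndian.Uint32(src[4:8])
}

// copyIn is a hypothetical wrapper. It owns the buffer, sizes it from
// SizeBytes(), and surfaces a short or failed copy as an error, so the
// unmarshal call below always sees a correctly sized buffer.
func copyIn(r io.Reader, h *header) error {
	buf := make([]byte, h.SizeBytes())
	if _, err := io.ReadFull(r, buf); err != nil {
		return err
	}
	h.UnmarshalUnsafe(buf)
	return nil
}

// decode is a small convenience around copyIn for in-memory input.
func decode(raw []byte) (header, error) {
	var h header
	err := copyIn(bytes.NewReader(raw), &h)
	return h, err
}

func main() {
	// Too little input: the error comes from the copy, not from a panic.
	_, err := decode([]byte{1, 0, 0, 0})
	fmt.Println("short input:", err)

	h, err := decode([]byte{1, 0, 0, 0, 2, 0, 0, 0})
	fmt.Println("full input:", h, err)
}
```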

Ayush Ranjan · Answer 4 · Fri Aug 13 2021 08:26:41 GMT+0800 (China Standard Time)

> I think we should think of Marshal{Bytes,Unsafe}/Unmarshal{Bytes,Unsafe} as low level marshaling methods and if we want to add bound checks (i.e. dynamic types), we can wrap them in helper methods.

Should these wrappers also be autogenerated? Or do you mean we should provide generic package methods that do such things?
I am inclined towards the former for the following reason: a generic package method would take a marshal.Marshallable parameter, which would lead to an implicit conversion from a concrete type to an interface. As of right now, this unconditionally leads to an allocation.

Questions:

- So this proposal should be amended to add a new method to marshal.Marshallable - CheckedMarshal(dst []byte) ([]byte, bool)? (I dislike the name, do you have a better suggestion?)
- Should this method be added conditionally based on some annotation like // +marshal checkbound?
- Is it reasonable that the wrapper always calls into the unsafe version (MarshalUnsafe/UnmarshalUnsafe)? Because afaik, the interface requires that the unsafe methods fall back to the safe methods if the type is not packed. So we get better performance whenever possible.

Rahat Mahmood · Answer 5 · Sat Aug 14 2021 01:48:28 GMT+0800 (China Standard Time)

> > I think we should think of Marshal{Bytes,Unsafe}/Unmarshal{Bytes,Unsafe} as low level marshaling methods and if we want to add bound checks (i.e. dynamic types), we can wrap them in helper methods.
>
> Should these wrappers also be autogenerated? Or do you mean we should provide generic package methods that do such things?
> I am inclined towards the former for the following reason: a generic package method would take a marshal.Marshallable parameter, which would lead to an implicit conversion from a concrete type to an interface. As of right now, this unconditionally leads to an allocation.

Autogenerating the wrappers sounds reasonable.

> Questions:
>
> So this proposal should be amended to add a new method to marshal.Marshallable - CheckedMarshal(dst []byte) ([]byte, bool)? (I dislike the name, do you have a better suggestion?)

I don't have a better suggestion, but Checked{Un,}Marshal sounds fine. There's precedent for using "*Checked" and "*Unchecked" to refer to bounds checks in the segment set methods.

> Should this method be added conditionally based on some annotation like // +marshal checkbound?

I think the marshallable interface is easier to understand when things are unconditionally generated: you get exactly what you see in the interface declaration. The slice API is guarded behind an annotation because it isn't used by the vast majority of marshallable types.

We should favour unconditional generation, unless it ends up generating a lot of code that is rarely called.

Ideally we should also add handling for it in //+marshal dynamic code generation so we aren't adding yet another interface method someone has to write by hand for dynamic types.

> Is it reasonable that the wrapper always calls into the unsafe version (MarshalUnsafe/UnmarshalUnsafe)? Because afaik, the interface requires that the unsafe methods fall back to the safe methods if the type is not packed. So we get better performance whenever possible.

We should always call into the unsafe version and let it handle the fallback. We don't need two variants of the checked method, just one that calls into the unsafe API.
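
A rough sketch of the single checked variant discussed here, layered on the unsafe methods. The type `timespec` and its layout are assumptions, the unsafe methods are written with encoding/binary rather than real unsafe memory copies, and they return the shifted buffer as suggested earlier in the proposal.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// timespec is a hypothetical packed type used only for illustration.
type timespec struct {
	Sec  int64
	Nsec int64
}

// SizeBytes returns the packed wire size of timespec.
func (ts *timespec) SizeBytes() int { return 16 }

// MarshalUnsafe stands in for the generated fast path.
// Precondition: len(dst) >= ts.SizeBytes(). It returns the shifted buffer.
func (ts *timespec) MarshalUnsafe(dst []byte) []byte {
	binary.LittleEndian.PutUint64(dst[0:8], uint64(ts.Sec))
	binary.LittleEndian.PutUint64(dst[8:16], uint64(ts.Nsec))
	return dst[16:]
}

// UnmarshalUnsafe stands in for the generated fast path.
// Precondition: len(src) >= ts.SizeBytes(). It returns the shifted buffer.
func (ts *timespec) UnmarshalUnsafe(src []byte) []byte {
	ts.Sec = int64(binary.LittleEndian.Uint64(src[0:8]))
	ts.Nsec = int64(binary.LittleEndian.Uint64(src[8:16]))
	return src[16:]
}

// CheckedMarshal does the bounds check once and then delegates to the
// unsafe method, which would handle any fallback for non-packed types.
func (ts *timespec) CheckedMarshal(dst []byte) ([]byte, bool) {
	if ts.SizeBytes() > len(dst) {
		return dst, false
	}
	return ts.MarshalUnsafe(dst), true
}

// CheckedUnmarshal is the decoding counterpart of CheckedMarshal.
func (ts *timespec) CheckedUnmarshal(src []byte) ([]byte, bool) {
	if ts.SizeBytes() > len(src) {
		return src, false
	}
	return ts.UnmarshalUnsafe(src), true
}

func main() {
	buf := make([]byte, 16)
	(&timespec{Sec: 7, Nsec: 9}).CheckedMarshal(buf)

	var out timespec
	_, ok := out.CheckedUnmarshal(buf[:8]) // too short, rejected without a panic
	fmt.Println(ok)
	_, ok = out.CheckedUnmarshal(buf)
	fmt.Println(out, ok)
}
```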

Ayush Ranjan · Answer 6 · Wed Aug 25 2021 06:57:38 GMT+0800 (China Standard Time)

> Ideally we should also add handling for it in //+marshal dynamic code generation so we aren't adding yet another interface method someone has to write by hand for dynamic types.

I don't think it is possible to autogenerate CheckedUnmarshal for dynamic types, because the bounds checks have to happen during unmarshalling itself.
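
To illustrate why the checks are interleaved with decoding, here is a hand-written sketch for a hypothetical length-prefixed type. The name `blob` and its wire format (a 4-byte little-endian length followed by the payload) are assumptions, not an existing gVisor type.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// blob is a hypothetical dynamically sized type: a 4-byte little-endian
// length prefix followed by that many payload bytes.
type blob struct {
	Data []byte
}

// SizeBytes is only meaningful once Data is populated, which is why a
// caller cannot pre-check the source buffer before unmarshalling.
func (bl *blob) SizeBytes() int { return 4 + len(bl.Data) }

// CheckedUnmarshal has to validate in two steps: first that the length
// prefix is present, then that the payload it announces actually fits.
func (bl *blob) CheckedUnmarshal(src []byte) ([]byte, bool) {
	if len(src) < 4 {
		return src, false
	}
	n := binary.LittleEndian.Uint32(src[:4])
	body := src[4:]
	if uint64(len(body)) < uint64(n) {
		return src, false
	}
	// Copy the payload so the decoded value does not alias the input.
	bl.Data = append([]byte(nil), body[:n]...)
	return body[n:], true
}

func main() {
	var got blob

	// The prefix claims 5 payload bytes but only 2 follow: rejected.
	_, ok := got.CheckedUnmarshal([]byte{5, 0, 0, 0, 'a', 'b'})
	fmt.Println(ok)

	// Well-formed input with one trailing byte left over.
	rest, ok := got.CheckedUnmarshal([]byte{2, 0, 0, 0, 'h', 'i', 'x'})
	fmt.Println(string(got.Data), len(rest), ok, got.SizeBytes())
}
```

The second check depends on a value read from the buffer itself, which is the part a generator cannot derive from the type's static layout alone.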