proto: Problem differentiating not-set value from default value
mxinden opened this issue · comments
Problem
The OpenMetrics Protobuf specification is using the proto3
syntax.
With the proto3
syntax all fields are optional by default. This is not to be confused with null-able. While one can choose not to specify a field it is then represented by its default value. Someone receiving a proto3
encoded payload has no way to tell the difference between a field explicitly set to its default value or not set at all.
The most prominent case where this can be problematic is for boolean fields where both the default value and the not-set value is false
. In the context of OpenMetrics, as far as I can tell, one problematic example would be the examplar
field in the CounterValue
message. A use can not tell the difference between a not-set examplar and an examplar set to 0
.
OpenMetrics/proto/openmetrics_data_model.proto
Lines 109 to 123 in 1ff04f2
Solution
To solve the above I see 3 solutions:
- The
optional
keyword, which would allow differentiating between a default value and a not-set value, was initially removed with theproto3
syntax, but has been revived with v3.12.0. In case requiring protocv3.12.0
is fine, one could just make use of the revivedoptional
keyword. - The
optional
keyword is just syntactic sugar foroneof
. Thus, in case requiring protocv3.12.0
is not an option, one can useoneof
directly. See this stackoverflow answer. - Use
proto2
instead ofproto3
(see original Prometheus proto definition).
Further reading
- General Stackoverflow discussing
optional
inproto3
. - Protoc release introducing
optional
toproto3
. - Documentation on field presence.
What do people think? Do you see alternative approaches?
Given that I don't have much expertise with Protobuf I might be missing something. If so, I would appreciate an answer pointing out my mistake.
Yeah, that's definitely a problem in general. For the exemplar, only the rare case of an exemplar with a value of 0 and without any labels and timestamp (or a default timestamp, see below) would be indistinguishable from a non-existing exemplar, but it's still a corner case that might strike eventually.
IIUC, the google.protobuf.Timestamp
type has a default value of 1970-01-01, midnight UTC, which means that all the optional timestamps are indistinguishable from that timestamp, which is a perfectly relevant timestamp for practical purposes. (The Go Time
type, for example, has a zero value 0001-01-01, midnight UTC, which is way less likely to come up in practice and is therefore officially used to mark the "unset" state. With protobuf, we apparently cannot rely on a similar trick.)
Using the new optional
keyword or oneof
creates a larger message size (again IIUC), so there might be an argument to not use it on all semantically optional fields, but only on fields where it is important to distinguish between "unset" and "set to default value". (Perhaps that's the case for all semantically optional fields?)
I don't think going back to proto2 is an acceptable option at this time.
Hey thanks for opening this to discuss. This is actually not an issue for Timestamp or Exemplars since they are of message type, and it's very easy to detect if the field was set on the wire.
This is outlined in the documentation you linked for proto3 field presence: https://github.com/protocolbuffers/protobuf/blob/v3.12.0/docs/field_presence.md#presence-in-proto3-apis.
Here's an example of this, you can see very clearly if those fields are set or not:
package main
import (
"fmt"
"github.com/golang/protobuf/proto"
"google.golang.org/protobuf/types/known/timestamppb"
)
func main() {
data, err := proto.Marshal(&CounterValue{})
if err != nil {
panic(err)
}
var msg CounterValue
if err := proto.Unmarshal(data, &msg); err != nil {
panic(err)
}
fmt.Printf("msg: %v\n", msg.String())
fmt.Printf("value.Created: %v\n", msg.Created)
fmt.Printf("value.Exemplar: %v\n", msg.Exemplar)
data, err = proto.Marshal(&CounterValue{
Created: ×tamppb.Timestamp{Seconds: 123},
})
if err != nil {
panic(err)
}
if err := proto.Unmarshal(data, &msg); err != nil {
panic(err)
}
fmt.Printf("msg: %v\n", msg.String())
fmt.Printf("value.Created: %v\n", msg.Created)
fmt.Printf("value.Exemplar: %v\n", msg.Exemplar)
data, err = proto.Marshal(&CounterValue{
Created: ×tamppb.Timestamp{Seconds: 123},
Exemplar: &Exemplar{Value: 42},
})
if err != nil {
panic(err)
}
if err := proto.Unmarshal(data, &msg); err != nil {
panic(err)
}
fmt.Printf("msg: %v\n", msg.String())
fmt.Printf("value.Created: %v\n", msg.Created)
fmt.Printf("value.Exemplar: %v\n", msg.Exemplar)
}
Output:
$ go run bin/proto/*.go
msg:
value.Created: <nil>
value.Exemplar: <nil>
msg: created:{seconds:123}
value.Created: seconds:123
value.Exemplar: <nil>
msg: created:{seconds:123} exemplar:{value:42}
value.Created: seconds:123
value.Exemplar: value:42
Good to learn about the singular message case. That will indeed defuse many cases. I guess we still need to go through all singular numeric, enum, and string fields to see if there is any issue with not-set vs. default.
I assume that we never have the necessity to distinguish between a not-set repeated field and a repeated field with zero repetitions (but if anyone has contrary evidence, please speak up).
We don't have maps or bytes in the current proto file.