OpenObservability / OpenMetrics

Evolving the Prometheus exposition format into a standard.

Home Page:https://openmetrics.io

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

proto: Problem differentiating not-set value from default value

mxinden opened this issue · comments

Problem

The OpenMetrics Protobuf specification is using the proto3 syntax.

With the proto3 syntax all fields are optional by default. This is not to be confused with null-able. While one can choose not to specify a field it is then represented by its default value. Someone receiving a proto3 encoded payload has no way to tell the difference between a field explicitly set to its default value or not set at all.

The most prominent case where this can be problematic is for boolean fields where both the default value and the not-set value is false. In the context of OpenMetrics, as far as I can tell, one problematic example would be the examplar field in the CounterValue message. A use can not tell the difference between a not-set examplar and an examplar set to 0.

// Value for COUNTER MetricPoint.
message CounterValue {
// Required.
oneof total {
double double_value = 1;
uint64 int_value = 2;
}
// The time values began being collected for this counter.
// Optional.
google.protobuf.Timestamp created = 3;
// Optional.
Exemplar exemplar = 4;
}

Solution

To solve the above I see 3 solutions:

  1. The optional keyword, which would allow differentiating between a default value and a not-set value, was initially removed with the proto3 syntax, but has been revived with v3.12.0. In case requiring protoc v3.12.0 is fine, one could just make use of the revived optional keyword.
  2. The optional keyword is just syntactic sugar for oneof. Thus, in case requiring protoc v3.12.0 is not an option, one can use oneof directly. See this stackoverflow answer.
  3. Use proto2 instead of proto3 (see original Prometheus proto definition).

Further reading


What do people think? Do you see alternative approaches?

Given that I don't have much expertise with Protobuf I might be missing something. If so, I would appreciate an answer pointing out my mistake.

Yeah, that's definitely a problem in general. For the exemplar, only the rare case of an exemplar with a value of 0 and without any labels and timestamp (or a default timestamp, see below) would be indistinguishable from a non-existing exemplar, but it's still a corner case that might strike eventually.

IIUC, the google.protobuf.Timestamp type has a default value of 1970-01-01, midnight UTC, which means that all the optional timestamps are indistinguishable from that timestamp, which is a perfectly relevant timestamp for practical purposes. (The Go Time type, for example, has a zero value 0001-01-01, midnight UTC, which is way less likely to come up in practice and is therefore officially used to mark the "unset" state. With protobuf, we apparently cannot rely on a similar trick.)

Using the new optional keyword or oneof creates a larger message size (again IIUC), so there might be an argument to not use it on all semantically optional fields, but only on fields where it is important to distinguish between "unset" and "set to default value". (Perhaps that's the case for all semantically optional fields?)

I don't think going back to proto2 is an acceptable option at this time.

Hey thanks for opening this to discuss. This is actually not an issue for Timestamp or Exemplars since they are of message type, and it's very easy to detect if the field was set on the wire.

This is outlined in the documentation you linked for proto3 field presence: https://github.com/protocolbuffers/protobuf/blob/v3.12.0/docs/field_presence.md#presence-in-proto3-apis.

Here's an example of this, you can see very clearly if those fields are set or not:

package main

import (
	"fmt"
	"github.com/golang/protobuf/proto"
	"google.golang.org/protobuf/types/known/timestamppb"
)

func main() {
	data, err := proto.Marshal(&CounterValue{})
	if err != nil {
		panic(err)
	}

	var msg CounterValue
	if err := proto.Unmarshal(data, &msg); err != nil {
		panic(err)
	}

	fmt.Printf("msg: %v\n", msg.String())
	fmt.Printf("value.Created: %v\n", msg.Created)
	fmt.Printf("value.Exemplar: %v\n", msg.Exemplar)

	data, err = proto.Marshal(&CounterValue{
		Created: &timestamppb.Timestamp{Seconds: 123},
	})
	if err != nil {
		panic(err)
	}

	if err := proto.Unmarshal(data, &msg); err != nil {
		panic(err)
	}

	fmt.Printf("msg: %v\n", msg.String())
	fmt.Printf("value.Created: %v\n", msg.Created)
	fmt.Printf("value.Exemplar: %v\n", msg.Exemplar)

	data, err = proto.Marshal(&CounterValue{
		Created:  &timestamppb.Timestamp{Seconds: 123},
		Exemplar: &Exemplar{Value: 42},
	})
	if err != nil {
		panic(err)
	}

	if err := proto.Unmarshal(data, &msg); err != nil {
		panic(err)
	}

	fmt.Printf("msg: %v\n", msg.String())
	fmt.Printf("value.Created: %v\n", msg.Created)
	fmt.Printf("value.Exemplar: %v\n", msg.Exemplar)
}

Output:

$ go run bin/proto/*.go
msg: 
value.Created: <nil>
value.Exemplar: <nil>
msg: created:{seconds:123}
value.Created: seconds:123
value.Exemplar: <nil>
msg: created:{seconds:123} exemplar:{value:42}
value.Created: seconds:123
value.Exemplar: value:42

Good to learn about the singular message case. That will indeed defuse many cases. I guess we still need to go through all singular numeric, enum, and string fields to see if there is any issue with not-set vs. default.

I assume that we never have the necessity to distinguish between a not-set repeated field and a repeated field with zero repetitions (but if anyone has contrary evidence, please speak up).

We don't have maps or bytes in the current proto file.