Make approach to defining enums consistent across the project

Question

Make approach to defining enums consistent across the project

calebbrown opened this issue a year ago · comments

#654 introduces the enum type Ecosystem implemented differently to other enums (e.g. analysis.Mode and staticanalysis.Task and analysisrun.DynamicPhase)

We should develop a consistent approach to how we define enums:

Some areas that should be covered:

should the base type be string vs numeric
how do we handle lists of enums
how do enums interact with flags
how do enums handle serialization (JSON)

Both:

require String()

Numeric based enums:

can use iota
can consume as little as 8bits (byte)
very fast to compare equality
encoding.TextMarshaler and encoding.TextUnmarshaler needed to support flag.TextVar and json serialization
requires strings to be specified for marshaling and unmarshaling (although an array of strings may make this easier)

String based enums:

consume 128 bits per enum
simpler String() function and serialization
equality is more costly (at minimum two comparisons, maximum the size of the shortest string)

Caleb Brown · Answer 1 · Fri Feb 24 2023 10:13:57 GMT+0800 (China Standard Time)

Some research:

https://www.practical-go-lessons.com/chap-27-enum-iota-and-bitmask - proposes int + iota and implementing fmt.Stringer() and serialization interfaces.
https://threedots.tech/post/safer-enums-in-go/ - documents each style, suggests strings are better for URL slugs
Go stdlib:
- https://github.com/golang/go/blob/master/src/time/time.go#L302 - uses int, and a []string to associate constant with string
- https://github.com/golang/go/blob/master/src/net/interface.go#L39 - uses uint as bitmask, and a []string to associate constant with string
GitHub:
- https://cs.github.com/TencentBlueKing/bk-cmdb/blob/53fa5996b3ddff2ec8e9dba039452ce46ce26dca/src/common/blog/glog/glog.go - uses int32 and []string to associate constant with string, implements flag.Value (origin: https://code.google.com/p/google-glog)
- https://github.com/uber-go/zap/blob/master/zapcore/level.go - uses int8 and switches to implement various interfaces.
- https://github.com/zhlhahaha/gvisor/blob/master/runsc/config/config.go - uses int and switches to implement flag.Value
- https://github.com/dt/cockroach/blob/9fd4ce118639ff09d11fffd85fbad6d42fa727c4/pkg/cli/flags_util.go#L156 - dumpMode uses int, and switches to implement flag.Value

As far as flag.TextVar usage go, the usage is currently super low since the API was only recently introduced.

The standard approaches I've seen for enums is either:

type Foo int
const (
  A Foo = iota
  B
  C
)
func (f Foo) String() string {
  switch f {
  case A:
    return "A"
  case B:
    return "B"
  case C:
    return "C"
  default:
    return "<unknown>"
  }
}

or

type Foo int
const (
  A Foo = iota
  B
  C
)
fooStrings = []string{"A","B","C"}  // this can also be map[Foo]string
func (f Foo) String() string {
  // lookup string in fooStrings and return it
}

Caleb Brown · Answer 2 · Fri Feb 24 2023 10:20:41 GMT+0800 (China Standard Time)

My proposal:

use numeric based enum types (e.g. int, int8) with iota
- rationale: this approach is almost universally used
use []string or map[x]string to associate const to string, and for parsing
- rationale: while switches are more common, this approaches reduces string duplication,
  and supports producing help text for flags
implement encoding.TextMarshaler and encoding.TextUnmarshaler rather than flag.Value and JSON marshalers
- rationale: encoding.TextMarshaler and encoding.TextUnmarshaler are both usable with the new flag.TextVar and in-place of JSON marshalers, requiring only 2 methods instead of 4.

Max Fisher · Answer 3 · Fri Feb 24 2023 12:04:51 GMT+0800 (China Standard Time)

We should definitely have a consistent set of functions to work with types used as enums.

Perhaps the decision of type could be decided case-by-case? e.g. in some cases having a string name defined as the enum could be more meaningful, as described in the reference above. But happy to restrict ourselves to int if that's what we want.

For numeric based enums, I think the idea of using a slice or map to define the string values for the enum in a single place is a very good one.

Building on Caleb's proposal, how about the following five functions/methods for type Foo which is used like an enum:

String() string implementing fmt.Stringer
FooFromString(name string) (Foo, error) implemented like this (from the same reference linked above), perhaps with the error replaced by a boolean true/false like in task.go
MarshalText()(text []byte, err error) implementing encoding.TextMarshaler for serialisation and flag usages (calls String())
UnmarshalText(text []byte) error implementing encoding.TextUnmarshaler for deserialisation and flag interfaces (calls FooFromString())
FooValues() []Foo which returns a slice containing all valid enum values.

Methods 1 and 2 are 'nicer' to use in internal code with strings, while 3 and 4 are implementing external interfaces which are designed to be very general and so are a little bit less readable.

Caleb Brown · Answer 4 · Fri Feb 24 2023 12:21:50 GMT+0800 (China Standard Time)

I agree with everything - except for naming of (2) FooFromString.

Usually methods for returning a type from a string are named Parse (i.e when the package is named foo) or ParseFoo (e.g. https://pkg.go.dev/net/url#Parse, https://pkg.go.dev/time#Parse, https://pkg.go.dev/net/mail#ParseAddress, and https://pkg.go.dev/strconv).

Oh, and for FooValues() I'd accept Values() []Foo as well, if the package is named foo (so foo.Values() instead of foo.FooValues()).

Caleb Brown · Answer 5 · Fri Feb 24 2023 12:23:41 GMT+0800 (China Standard Time)

I'd also add that implementing ParseFoo(), encoding.TextMarshaler and encoding.TextUnmarshaler could be optional.

But if an enum is used in a flag it must implement all three.

Max Fisher · Answer 6 · Fri Feb 24 2023 12:33:28 GMT+0800 (China Standard Time)

Sure, naming the 2nd method ParseFoo() or foo.Parse() depending on the containing package name and/or size sounds good to me.

Should we consistently define a none/empty value for each enum?