ossf / package-analysis

Open Source Package Analysis

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Make approach to defining enums consistent across the project

calebbrown opened this issue · comments

#654 introduces the enum type Ecosystem implemented differently to other enums (e.g. analysis.Mode and staticanalysis.Task and analysisrun.DynamicPhase)

We should develop a consistent approach to how we define enums:

Some areas that should be covered:

  • should the base type be string vs numeric
  • how do we handle lists of enums
  • how do enums interact with flags
  • how do enums handle serialization (JSON)

Both:

  • require String()

Numeric based enums:

  • can use iota
  • can consume as little as 8bits (byte)
  • very fast to compare equality
  • encoding.TextMarshaler and encoding.TextUnmarshaler needed to support flag.TextVar and json serialization
  • requires strings to be specified for marshaling and unmarshaling (although an array of strings may make this easier)

String based enums:

  • consume 128 bits per enum
  • simpler String() function and serialization
  • equality is more costly (at minimum two comparisons, maximum the size of the shortest string)

Some research:

As far as flag.TextVar usage go, the usage is currently super low since the API was only recently introduced.

The standard approaches I've seen for enums is either:

type Foo int
const (
  A Foo = iota
  B
  C
)
func (f Foo) String() string {
  switch f {
  case A:
    return "A"
  case B:
    return "B"
  case C:
    return "C"
  default:
    return "<unknown>"
  }
}

or

type Foo int
const (
  A Foo = iota
  B
  C
)
fooStrings = []string{"A","B","C"}  // this can also be map[Foo]string
func (f Foo) String() string {
  // lookup string in fooStrings and return it
}

My proposal:

  • use numeric based enum types (e.g. int, int8) with iota
    • rationale: this approach is almost universally used
  • use []string or map[x]string to associate const to string, and for parsing
    • rationale: while switches are more common, this approaches reduces string duplication,
      and supports producing help text for flags
  • implement encoding.TextMarshaler and encoding.TextUnmarshaler rather than flag.Value and JSON marshalers
    • rationale: encoding.TextMarshaler and encoding.TextUnmarshaler are both usable with the new flag.TextVar and in-place of JSON marshalers, requiring only 2 methods instead of 4.

We should definitely have a consistent set of functions to work with types used as enums.

Perhaps the decision of type could be decided case-by-case? e.g. in some cases having a string name defined as the enum could be more meaningful, as described in the reference above. But happy to restrict ourselves to int if that's what we want.

For numeric based enums, I think the idea of using a slice or map to define the string values for the enum in a single place is a very good one.

Building on Caleb's proposal, how about the following five functions/methods for type Foo which is used like an enum:

  1. String() string implementing fmt.Stringer
  2. FooFromString(name string) (Foo, error) implemented like this (from the same reference linked above), perhaps with the error replaced by a boolean true/false like in task.go
  3. MarshalText()(text []byte, err error) implementing encoding.TextMarshaler for serialisation and flag usages (calls String())
  4. UnmarshalText(text []byte) error implementing encoding.TextUnmarshaler for deserialisation and flag interfaces (calls FooFromString())
  5. FooValues() []Foo which returns a slice containing all valid enum values.

Methods 1 and 2 are 'nicer' to use in internal code with strings, while 3 and 4 are implementing external interfaces which are designed to be very general and so are a little bit less readable.

I agree with everything - except for naming of (2) FooFromString.

Usually methods for returning a type from a string are named Parse (i.e when the package is named foo) or ParseFoo (e.g. https://pkg.go.dev/net/url#Parse, https://pkg.go.dev/time#Parse, https://pkg.go.dev/net/mail#ParseAddress, and https://pkg.go.dev/strconv).

Oh, and for FooValues() I'd accept Values() []Foo as well, if the package is named foo (so foo.Values() instead of foo.FooValues()).

I'd also add that implementing ParseFoo(), encoding.TextMarshaler and encoding.TextUnmarshaler could be optional.

But if an enum is used in a flag it must implement all three.

Sure, naming the 2nd method ParseFoo() or foo.Parse() depending on the containing package name and/or size sounds good to me.

Should we consistently define a none/empty value for each enum?