golang / protobuf

Go support for Google's protocol buffers

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Parsing protoc plugin option containing ','

Clement-Jean opened this issue · comments

Is your feature request related to a problem? Please describe.

Since compiler/protogen/protogen.go splits plugin options based on commas (see https://github.com/protocolbuffers/protobuf-go/blob/ec47fd138f9221b19a2afd6570b3c39ede9df3dc/compiler/protogen/protogen.go#L166), the following plugin option does not work as expected:

--my_opt=test="^\d{1,5}$"

It assigns ^\d{1 to test and the rest is considered as another option name. Obviously, we get the following error:

protoc-gen-check: no such flag -5}$

Describe the solution you'd like

As I was not sure about the process of submitting feature requests, I already worked on a bit of parsing code on Gerrit (see https://go-review.googlesource.com/c/protobuf/+/575155). However, it is low level parsing and so error prone. If there was a way to override option parsing, the user could provide their own parser and thus not being included in Protobuf compiler.

Not entirely sure how the API would look like, but we can discuss it. Here are few ideas:

  • If an override parsing method is not described, we default to the original parsing (split at ',' and use flags.Set).
  • Otherwise, we apply the overriding function
  • I think we could use a func(params string, separator string) ([]string, []error) type for the override. Or we could also have func(params string, separator string) ([]string) and an external way to report errors. This would let us pass strings.Split directly as default.

Describe alternatives you've considered

I considered a mini parser that differentiates between strings and the rest. The link to Gerrit has my actual implementation.

Additional context

I was mostly working with regexp. I wanted to have a customizable regexp as plugin options to generate code that will validate a phone number. I was using Twilio E.164 which looks like this ^\+[1-9]\d{1,14}$ and it was splitting at the comma.

@puellanivis @lfolger

Thanks for filing this feature request.

To summarize the different alternatives from my point of view:

  1. We build a parser that supports escaping of the ',' so that values (or even keys) can contain ','.
  2. We add an overwrite option for users to provide a custom parser. This could be as simple as func (string) []KeyValuePairs where KeyValuePairs is just a pair of strings.
  3. Users could encode their parameters with something that doesn't contain ',' (e.g. base64) before passing it to protoc and decode it when their callback is invoked.

Solution 1. requires us to define the exact grammer for escaping and add the implementation (which is 175 lines in the prototype change).

Solution 3. doesn't require any changes but is suboptimal to use and certainly works around a limitation in the protogen package).

Solution 2. requires a very small change to the protogen but adds another public API.

I would like to get @neild feedback on this as well.

I believe the behavior of separating options with a comma comes from protoc: If you pass --go_opt=a --go_opt=b, protoc passes a CodeGeneratorRequest.parameter of a,b. No quoting or escaping, it just joins all --*_opt params with a comma; see https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/compiler/command_line_interface.cc#L2393.

As such, I think the right thing to do is accept that commas are not valid in option values.

If you do need some different behavior, then you can always modify the CodeGeneratorRequest before passing it to protogen.Options.New. (This requires that you use New rather than Run, but Run isn't all that complicated.)

@neild if I understand correctly, what you are recommending is parsing the CodeGeneratorRequest.Parameters and quoting/escaping it. Basically, duplicating the protogen.Options.run logic in our own plugin. Am I getting this right?

My first recommendation is that you replace , with something else in your generator parameters.

Failing that, yes, I recommend that you parse the CodeGeneratorRequest.Parameters yourself.

(As a distant third, I recommend that you find the protoc maintainers and get them to add a repeated field to CodeGeneratorRequest containing the individual parameters.) (I am not actually recommending this.) (It would be nice, though.)

Closing this issue for now. Feel free to reopen if you have further input or questions.