neoeinstein / protoc-gen-prost

Handle protobuf package name `_` in prost-crate

stefanvanburen opened this issue.

Hi! This is an admittedly contrived example but figured I'd report anyway.

// File: test.proto
syntax = "proto3";

package _;

message Test {}

# buf.gen.yaml
version: v1
plugins:
  - plugin: buf.build/community/neoeinstein-prost
    out: gen/src
  - plugin: buf.build/community/neoeinstein-prost-crate
    out: gen
    opt:
      - no_features

Run `buf generate .` and you'll end up with `gen/mod.rs`:

// @generated
// @@protoc_insertion_point(attribute:_)
pub mod  {
    include!("_.rs");
    // @@protoc_insertion_point(_)
}

The mod name is empty, which I think is invalid based on my read of the Rust Reference for modules. It looks like the generator is avoiding naming the module `_`, which would be similarly invalid, per this line of the identifiers spec:

with the additional constraint that a single underscore character is not an identifier.

I'm not really sure what to do other than to error out on a `_` package.
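A minimal sketch of the error-out option, assuming ASCII-only package segments. `is_valid_rust_ident` approximates the identifier rule quoted above (keywords and non-ASCII identifiers are deliberately not handled), and `module_name` is a hypothetical helper, not part of protoc-gen-prost, that rejects a segment instead of emitting an empty or `_` module name.

```rust
/// ASCII approximation of the Rust identifier rule: a leading letter or `_`,
/// then letters, digits, or `_`, with a lone `_` excluded.
/// Keywords and non-ASCII (XID) identifiers are not handled here.
fn is_valid_rust_ident(s: &str) -> bool {
    let mut chars = s.chars();
    let first_ok = match chars.next() {
        Some(c) => c.is_ascii_alphabetic() || c == '_',
        None => false, // the empty string is not an identifier
    };
    first_ok && chars.all(|c| c.is_ascii_alphanumeric() || c == '_') && s != "_"
}

/// Hypothetical helper: turn one package segment into a module name,
/// returning an error instead of producing an invalid `pub mod` line.
fn module_name(segment: &str) -> Result<&str, String> {
    if is_valid_rust_ident(segment) {
        Ok(segment)
    } else {
        Err(format!("package segment `{segment}` is not a valid Rust module name"))
    }
}

fn main() {
    assert!(module_name("foo_bar").is_ok());
    assert!(module_name("_").is_err()); // a single underscore is not an identifier
    assert!(module_name("").is_err()); // neither is an empty name (`pub mod  {`)
    assert!(module_name("__").is_ok()); // a double underscore is fine
}
```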

Actually, based on what Prost does with identifiers, maybe you could change the identifier to `__` (double underscore), which would at least be a valid identifier: https://github.com/tokio-rs/prost/blob/e3deaa200b3a5500bf0403325d02716973b7296a/prost-build/src/ident.rs#L23. However, `__` is also a valid Protobuf package name, so you'd potentially conflict with that package 😃.
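To make the conflict concrete, here is a hypothetical escaping rule in the spirit of the linked prost-build code. Only the `_` to `__` mapping is modeled; the real code handles other cases too. The `collisions` helper, also hypothetical, groups package names by the module identifier they would produce and reports any identifier claimed by more than one package.

```rust
use std::collections::HashMap;

/// Hypothetical escape: a lone `_` becomes `__`; everything else passes through.
fn escape_module_ident(segment: &str) -> String {
    if segment == "_" {
        "__".to_string()
    } else {
        segment.to_string()
    }
}

/// Returns (module identifier, packages) pairs where two or more packages
/// would map to the same identifier, sorted for stable output.
fn collisions<'a>(packages: &[&'a str]) -> Vec<(String, Vec<&'a str>)> {
    let mut by_ident: HashMap<String, Vec<&'a str>> = HashMap::new();
    for &pkg in packages {
        by_ident.entry(escape_module_ident(pkg)).or_default().push(pkg);
    }
    let mut clashes: Vec<_> = by_ident
        .into_iter()
        .filter(|(_, pkgs)| pkgs.len() > 1)
        .collect();
    clashes.sort();
    clashes
}

fn main() {
    let clashes = collisions(&["_", "__", "foo"]);
    // Both `_` and `__` end up as the module `__`.
    assert_eq!(clashes, vec![("__".to_string(), vec!["_", "__"])]);
    println!("{clashes:?}");
}
```

Because the mapping is not one-to-one, an escape scheme like this needs either a collision check or an outright rejection of `_`.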

Also:

// File: test1.proto
syntax = "proto3";

package _;

message Test {}

// File: test2.proto
syntax = "proto3";

message Test {}

# buf.gen.yaml
version: v1
plugins:
  - plugin: buf.build/community/neoeinstein-prost
    out: gen/src

Running `buf generate .` creates two files, named `_` and `_.rs`, both containing:

// @generated
#[allow(clippy::derive_partial_eq_without_eq)]
#[derive(Clone, PartialEq, ::prost::Message)]
pub struct Test {
}
// @@protoc_insertion_point(module)

Again, I'm not sure what the answer is, but generating a file named `_` doesn't seem desirable.
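The sketch below shows one way the two inputs could compete for the same output file. It assumes that file names are derived from the package name, and that files without a package fall back to a default stem of `_`. That fallback is assumed here to match prost-build's `Config::default_package_filename` default. `output_file_name` is hypothetical. Under that rule, `package _;` and no package at all target the same file.

```rust
/// Hypothetical naming rule: `<package>.rs`, with an assumed default stem
/// of "_" for files that declare no package.
fn output_file_name(package: &str) -> String {
    let stem = if package.is_empty() { "_" } else { package };
    format!("{stem}.rs")
}

fn main() {
    // test1.proto declares `package _;`, test2.proto declares no package.
    let from_test1 = output_file_name("_");
    let from_test2 = output_file_name("");
    assert_eq!(from_test1, from_test2); // both would target "_.rs"
    println!("{from_test1} / {from_test2}");
}
```

This sketch does not explain the extra extensionless `_` file. It only shows that the two inputs are ambiguous under a naming rule of this kind.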

I'll start off by noting that `_` is not a valid package name in the Protobuf spec either. The package statement must follow this grammar (excerpted):

package = "package" fullIdent ";"
fullIdent = ident { "." ident }
ident = letter { letter | decimalDigit | "_" }
letter = "A" ... "Z" | "a" ... "z"
decimalDigit = "0" ... "9"

Because the grammar requires an ident to start with a letter, `_` alone is not a valid Protobuf identifier. We don't do any special detection for it, but since it isn't a valid ident, I don't think it's something I'd take special care to handle.
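The excerpted grammar translates almost directly into a checker. The following is a minimal sketch with illustrative function names, following only the rules quoted above, where `letter` is ASCII letters only.

```rust
/// ident = letter { letter | decimalDigit | "_" }
/// letter = "A" ... "Z" | "a" ... "z"
fn is_proto_ident(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        // The first character must be an ASCII letter; `_` is not allowed here.
        Some(c) if c.is_ascii_alphabetic() => {
            chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        }
        _ => false,
    }
}

/// fullIdent = ident { "." ident }
fn is_proto_full_ident(s: &str) -> bool {
    s.split('.').all(is_proto_ident)
}

fn main() {
    assert!(is_proto_full_ident("foo.bar_baz"));
    assert!(!is_proto_full_ident("_")); // no leading letter
    assert!(!is_proto_full_ident("-"));
    assert!(!is_proto_full_ident("1abc"));
    assert!(!is_proto_full_ident("foo.")); // empty trailing ident
}
```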

Let me know if you have additional information that should make me reconsider, though any change would likely also require coordination with, and changes to, the underlying Prost library. The most likely change would be to reject a `_` package name explicitly rather than work around it.

Yeah, unfortunately I think the spec linked above is slightly off. The Buf version of the spec includes `_` in the letter character class, which makes `package _;` valid. And protoc seems to agree:

// test-underscore.proto
syntax = "proto3";

package _;

message Test {}

$ protoc --version
libprotoc 25.0

$ protoc --descriptor_set_out=underscore.binpb test-underscore.proto # passes

versus an invalid package declaration, `package -;`:

// test-dash.proto
syntax = "proto3";

package -;

message Test {}

$ protoc --descriptor_set_out=dash.binpb test-dash.proto
test-dash.proto:3:9: Expected identifier.
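For contrast, the following sketch models the relaxed grammar. It assumes the Buf reading described above, in which `letter` also includes `_`. It reproduces the two protoc results shown: `_` is accepted and `-` is rejected. It approximates that reading and is not protoc's actual parser.

```rust
/// letter, per the Buf reading: ASCII letters plus "_".
fn is_letter(c: char) -> bool {
    c.is_ascii_alphabetic() || c == '_'
}

/// ident = letter { letter | decimalDigit }
fn is_ident(s: &str) -> bool {
    let mut chars = s.chars();
    matches!(chars.next(), Some(c) if is_letter(c))
        && chars.all(|c| is_letter(c) || c.is_ascii_digit())
}

/// fullIdent = ident { "." ident }
fn is_full_ident(s: &str) -> bool {
    s.split('.').all(is_ident)
}

fn main() {
    assert!(is_full_ident("_")); // accepted, like `package _;` with protoc 25.0
    assert!(is_full_ident("__")); // also valid, hence the `__` conflict
    assert!(is_full_ident("_.foo9"));
    assert!(!is_full_ident("-")); // rejected, like "Expected identifier."
    assert!(!is_full_ident("9a"));
}
```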

I think the upstream spec should be fixed, heh. Anyway, happy to leave this closed; it's terribly niche.