capnproto / go-capnp

Cap'n Proto library and code generator for Go

Home Page:https://capnproto.org

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Generator can create duplicate RegisterSchema functions

Hoops opened this issue Β· comments

After the change for #479, if you have multiple capnp files in a package, e.g.

$ cat book.capnp
using Go = import "/go.capnp";
@0x85d3acc39d94e0f8;
$Go.package("books");
$Go.import("foo/books");

struct Book {
    title @0 :Text;
    # Title of the book.

    pageCount @1 :Int32;
    # Number of pages in the book.
}

and

$ cat book2.capnp
using Go = import "/go.capnp";
@0x86115605f6b5e399;
$Go.package("books");
$Go.import("foo/books");

struct Book2 {
    title @0 :Text;
    # Title of the book.

    pageCount @1 :Int32;
    # Number of pages in the book.
}

then

$ capnp compile -I$HOME/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.24/std -ogo:. *.capnp 
$ go mod init books; go mod tidy
go: creating new go.mod: module books
go: to add module requirements and sums:
	go mod tidy
go: finding module for package capnproto.org/go/capnp/v3/schemas
go: finding module for package capnproto.org/go/capnp/v3/encoding/text
go: finding module for package capnproto.org/go/capnp/v3
go: found capnproto.org/go/capnp/v3 in capnproto.org/go/capnp/v3 v3.0.0-alpha.25
go: found capnproto.org/go/capnp/v3/encoding/text in capnproto.org/go/capnp/v3 v3.0.0-alpha.25
go: found capnproto.org/go/capnp/v3/schemas in capnproto.org/go/capnp/v3 v3.0.0-alpha.25
$ go build ./...
# books
./book2.capnp.go:113:6: RegisterSchema redeclared in this block
	./book.capnp.go:113:6: other declaration of RegisterSchema

There will be duplicate functions in the same package. Maybe the register function needs to be generated in a separate file per-package with one RegisterSchema for all?

Thanks for the report. This one's on me -- I never put more than one schema in a package so I always forget this use case. We should add some tests that do this. The solution you propose sounds reasonable.

commented

Is there a recommended workaround for the interim? Downgrade to alpha.24?

@hspaay I think the workaround would be to merge your files into one. Note that you can still import types from separate packages; I do this quite a bit, and it has prevented me from ever requiring mutli-file packages.

commented

@lthibault thanks for the very quick reply :)

merge your files into one

Not sure if I follow. I've got 13 capnp files, one per micro-service and they go into the 'api' package. Are you recommending to put all of these into a single file? I love how compact capnp files are but putting all in a single file might be a bit too much.

commented

image

I tend to put each .capnp file in its own package (which is basically why this hasn't come up); I agree dumping everything into the same file could get unwieldly, and furthermore, you'd have to make the interface IDs explicit to avoid changing them, since those are a function of the parent namespace's ID (the root of which is the file ID) and the name of the type/interface.

See e.g. https://github.com/zenhack/go.sandstorm/tree/master/capnp

commented

@zenhack thanks for that.
The primary reason I like the single folder is that to use an api it is a simple import "hubapi".
But yeah, I understand the problem now and possible workarounds. Maybe I'll come around and switch to use the package folders. For a large project that makes a lot of sense.

For now I think I'll wait a bit using alpha.24 and hope for a fix. Its all still alpha (including my stuff) so no rush.

Again, I much appreciate the awesome support you guys are giving.

@hspaay I use the same approach as @zenhack. Correspondingly, here is my typical application structure. The cluster.capnp package/file imports from all the others (the Host interface is basically the top-level capability in my system).

Hope this helps.

Again, I much appreciate the awesome support you guys are giving.

It's truly our pleasure! πŸ˜ƒ

Just chiming in to say that this is affecting us at Cloudflare too. We're pinning to alpha.24 in the meantime.

@alexforster Any chance you might be able to submit a PR? @zenhack recently passed away and we're spread very thin right now. Realistically, I don't think I'll be getting to this one for a while.

If you are able to contribute, I am more than happy to lend my support.

P.S.: thank you for your feedback, btw!

@lthibault wrt "zenhack recently passed away".
OMG! That is terrible news. My condolences to those who knew him. He was such a friendly and helpful person. 😒

@hspaay OMG, I'm so sorry to drop the news on you so suddenly. For some reason I thought you already knew! Yes, Ian was truly an exceptional guy, and it was all very sudden. Go-capnp will be fine, but Ian's presence is truly missed :(

Well I've been away for a bit so my own fault for not keeping up to date.
Sigh I'm going to have a stiff drink later and a good toast to Ian.
Hope you are all okay. He will indeed be truly missed.
All the best Ian, and thank you so much for the help. You won't be forgotten.

Sigh I'm going to have a stiff drink later and a good toast to Ian.

You'll be in good company. I've poured a few out for Ian.
All good over here, and wishing you the same.

I'm very sorry to hear that. My condolences. I'll put some thought into this...

Looking back at #479, it seems the broader goal of the change was to facilitate dead-code elimination (edit: and to get rid of magic static globals), so going back to the init() style registration isn't an option. There needs to continue to be an explicit "register schema" action for users to take.

With that in mind, I think there are really only three choices:

Option 1: automatically disambiguate the RegisterSchema functions

Instead of generating one RegisterSchema function for each output file like this:

func RegisterSchema(reg *schemas.Registry) {
	reg.Register(&schemas.Schema{
		String: schema_bdf87d7bb8304e81,
		Nodes: []uint64{
			0xac7096ff8cfc9dce,
			0xb9c6f99ebf805f2c,
			0xf264a779fef191ce,
		},
		Compressed: true,
	})
}

Generate a Register{Type}Schema for each type in the file, like this:

func RegisterNamespaceSchema(reg *schemas.Registry) {
	reg.Register(&schemas.Schema{
		String: schema_bdf87d7bb8304e81,
		Nodes: []uint64{0xac7096ff8cfc9dce},
		Compressed: true,
	})
}

func RegisterNameSchema(reg *schemas.Registry) {
	reg.Register(&schemas.Schema{
		String: schema_bdf87d7bb8304e81,
		Nodes: []uint64{0xb9c6f99ebf805f2c},
		Compressed: true,
	})
}

func RegisterAllowCancellationSchema(reg *schemas.Registry) {
	reg.Register(&schemas.Schema{
		String: schema_bdf87d7bb8304e81,
		Nodes: []uint64{0xf264a779fef191ce},
		Compressed: true,
	})
}

Upsides:

  • Makes this name collision problem much less likely

Downsides:

  • Breaks the API
  • Name collision still possible
  • Pushes a lot of mandatory boilerplate onto users
  • Easy to forget to register transitive dependencies (think: struct Visitor depends on enum CountryCode) – though I'm not sure if that would actually matter?

Option 2: allow users to manually disambiguate the RegisterSchema functions

We could add an optional file annotation similar to the existing $Go.package(...) and $Go.import(...) annotations that lets the user influence the name of the RegisterSchema function. For example...

$Go.package("mypkg");
$Go.import("mypkg");
$Go.schema("Visitor");

This would result in generating a function like...

func RegisterVisitorSchema(reg *schemas.Registry) {
    // ...
}

If this file annotation isn't specified, then the default name RegisterSchema would continue to be generated.

Upsides:

  • No API breakage unless the user explicitly opts in to it
  • Punts the problem of name collision to the user (and allows them to fix it)
  • Likely simple to implement

Downsides:

  • Less mandatory boilerplate than option 1, but still more boilerplate than users face today

Option 3: make a "global" RegisterSchema function

Generate one "global" RegisterSchema function that registers all schemas within a particular package.

Upsides:

  • Doesn't hurt dead-code elimination
  • No API breakage

Downsides:

  • Right now, the generator writes one output file for each input file. We'd either have to pick a random input file to put this global RegisterSchema function in, or we'd have to generate an extra file that just contains the global RegisterSchema function and its associated bits. Is that "bad"? What would we call that file?

I honestly think I like option 2 the most, and I also think it's the only one I'd have a realistic chance at having time to implement.

@alexforster Thank you for these detailed thoughts; they're super helpful when diving back into this issue. πŸ˜ƒ

I agree Option 2 sounds like the better one. I'd like to get @davidhubbard's thoughts on this as well, since he has recently worked with the code-generator on a loosely-related issue, and will have the freshest recollection of how everything works! David, what do you think?

BTW, Alex, feel free to join us in the matrix channel. Most of the dev work goes on in there, and we'd love to include you in the discussion.

Thinking out loud, #479 took a step toward schema "belonging" to the package. This issue then asks the question, how do capnp files map to the Go concept of packages. I'd like to think any Golang-isms aren't implicit constraints on how capnp works.

That's why my first preference is Option 3 or, in my own words, the go code-gen is tasked with making the right decisions for Golang-isms.

Option 2 does have an appeal but wouldn't users then have to deal with the problem, instead of avoiding it automatically by keeping track of the presence of a RegisterSchema() block?

I just sent #544 with Option 3 to noodle around with. @alexforster can you please check if the RegisterSchema() block works correctly with this change?

Nice! From an ergonomics perspective, option 3 is the clear winner. I'll run this over our schemas tomorrow and see how it looks.

I added a command line flag capnpc-go --forceschemasalways but if it's invoked from capnpc per the usual, that flag isn't available. If #544 causes your build to break badly, at least there's a quick patch you can apply while we investigate.

There may be a fairly remote problem if capnp is invoked for a single file at a time. It does not consider any capnp source files that could somehow show up later in the same package.

I think a more complete solution might involve searching the output dir for any existing .go file and scanning it for any package declaration.

A cleaner (but also probably more controversial) solution could be to just generate a single output file per Go module. I think mapping input files to output files is kind of an abstraction leak. If the output conceptually is "multiple Go modules" then trying to maintain the input file structure doesn't make sense.

We're moving toward invalidating the cache of generated go code and naming the output files, that's encouraging if hard.

I'd like to give this some careful thought. Hopefully an immediate improvement will buy some time to land on a satisfactory solution.

@alexforster I just pushed a fix and hopefully it hasn't caused confusion -

- flag.BoolVar(&opts.forceSchemasAlways, "forceschemasalways", true, 
+ flag.BoolVar(&opts.forceSchemasAlways, "forceschemasalways", false, 

And I verified it with: capnpc -o go persistent-simple.capnp persistent-samepkg.capnp; grep RegisterSchemas *.go

So, it did only generate a single RegisterSchema method, but afaict that method only registers the schemas in the particular file that it chose, not all of the schemas in the module... maybe?

Here's what I see:

func RegisterSchema(reg *schemas.Registry) {
	reg.Register(&schemas.Schema{
		String: schema_b1185f68a81b6c14,
		Nodes: []uint64{
			0x919a7cb1581f723b,
			0x9cddd7d7699e2c8c,
			0xa7bd6b63e6bab3a5,
			0xbda68e8c88b1f5da,
			0xd7dd17c51626b5d5,
			0xf4e61deb1bb7531b,
		},
		Compressed: true,
	})
}

It only has six "nodes" (whatever those are), whereas before each individual RegsiterSchema would register nodes such that the total number was probably in the hundreds.

I don't actually know what these nodes are, so maybe this is fine.

@alexforster you are correct. Let me rework the pull request to include the schemas of all the files in the module.

@lthibault can you please re-review, since this is a rewrite of #544 ?

@alexforster this collects all the schemas first before writing any files. Would you be able to recheck this against your workflow?

Sure, I'll run it tomorrow. I really appreciate your work on this!

I created #545

There are missing nodes in RegisterSchemas. For instance, https://github.com/capnproto/go-capnp/blob/main/flowcontrol/internal/test-tool/writer.capnp.go is generated with only the top-level ID in the list of Nodes:

$ cd capnpc-go
capnpc-go $ go build
capnpc-go $ (  cd ../flowcontrol/internal/test-tool && capnp compile --no-standard-import -I../../../std -o- writer.capnp ) | capnpc-go
capnpc-go $ mv writer.capnp.go ../flowcontrol/internal/test-tool/writer.capnp.go
capnpc-go $ git diff ../flowcontrol/internal/test-tool/writer.capnp.go
diff --git a/flowcontrol/internal/test-tool/writer.capnp.go b/flowcontrol/internal/test-tool/writer.capnp.go
index 3f5d8e1..2089ab9 100644
--- a/flowcontrol/internal/test-tool/writer.capnp.go
+++ b/flowcontrol/internal/test-tool/writer.capnp.go
@@ -314,28 +314,20 @@ func (f Writer_write_Results_Future) Struct() (Writer_write_Results, error) {
        return Writer_write_Results(p.Struct()), err
 }
 
-const schema_aca73f831c7ebfdd = "x\xda\x12\xa8u`1\xe4\xcdgb`\x0a\x94ae" +
-       "\xfb_~\xe4\xb1K\xfc\xd9\x1d\x0d\x0c\x82\"\x8c\x0c\x0c" +
-       "\xac\x8c\xec\x0c\x0c\xc6\xb2\x8c\\\x8c\x0c\x8c\xc2\xaa\x8c\xf6" +
-       "\x0c\x8c\xff\x7f<WI\xe8\xb9gy\x13\xa2\x80\x05$" +
-       "\xef\xca(\xc4\xc8\xc0\xf2?[\xb8\x7f\xf9\x96\x08\xbd\x1f" +
-       "\x0c\x82\xbc\xcc\xff\xef\xee\xaf\x93i\xb6_\xbe\x86\x81\x81" +
-       "QX\x97q\x91\xb0)\xc8$aCFw\xe1HF" +
-       "v\x06\x9d\xff\xe5E\x99%\xa9Ez\xc9\xcc\x89\x05y" +
-       "\x05V\xe1\x10\x1eXP% \xb1(1\xb7\x98\x81!" +
-       "\x90\x85\x99\x85\x81\x81\x85\x91\x81A\x90W\x8b\x81!\x90" +
-       "\x83\x991P\x84\x89\x91?%\xb1$\x91\x91\x97\x81\x89" +
-       "\x91\x97\x81\x11\x9f9A\xa9\xc5\xa59%\x8c\xc5p5" +
-       "\x8c05\xec%\xa9E\x01\x8c\x8c\x81,\xcc\xac\x0c\x0c" +
-       "p/3\xc2\xbc&(h\xc4\xc0$\xc8\xca.\x0f\xd6" +
-       "\xe8\xc0\x18\xc0\xc8\x08\x08\x00\x00\xff\xff\x1d\x88R\x0a"
+const schema_aca73f831c7ebfdd = "x\xda\x12Hv`1\xe4\xcdgb`\x0a\x94ae" +
+       "\xfb\x9f-\xdc\xbf|K\x84\xde\x0f\x06A^\xe6\xffw" +
+       "\xf7\xd7\xc94\xdb/_\xc3\xc0\xc0(,\xcb\xb8HX" +
+       "\x95\x91\x9d\x81AX\x91\xd1]\xd8\x93\x91\x9d\xc1\xe9\x7f" +
+       "yQfIj\x91^2cbA^\x81UxQ" +
+       "&{IjQ\x00#c \x0b3+\x03\xc3\xff\xf2" +
+       "#\x8f]\xe2\xcf\xeeh`\xfc\xf1\\%\xa1\xe7\x9e\xe5" +
+       "MAA#\x06&AVvy\xb0F\x07\xc6\x00F" +
+       "F@\x00\x00\x00\xff\xff\x1b\xae%\xa6"
 
 func RegisterSchema(reg *schemas.Registry) {
        reg.Register(&schemas.Schema{
                String: schema_aca73f831c7ebfdd,
                Nodes: []uint64{
-                       0x80b8cd5f44e3c477,
-                       0xd939de8c6024e7f8,
                        0xf82e58b4a78f136b,
                },
                Compressed: true,

Sorry for falling off, lots of crises at work. Let me know if I can help now that things are calming down for me

@alexforster no worries! Want to try #545 on your workflow?

@lthibault I found and fixed a bug with this just now, hence the reason I reopened this issue.

Thanks for working on this!

Another Cloudflarian here πŸ‘‹

Unfortunately I don't know the fix in place covers all cases. Similarly to @alexforster we're looking to pull multiple capnp into a single module. However, these files are sourced from different locations, so when doing the generation we make heavy use of --src-prefix to keep things relative.
Calls look like

capnp compile --no-standard-import -I capnp-vendor --src-prefix capnp-vendor/mainmodule/layerA -ogo:capnp-vendor/generated/mainmodule ...
capnp compile --no-standard-import -I capnp-vendor --src-prefix capnp-vendor/mainmodule/layerB -ogo:capnp-vendor/generated/mainmodule ...
...

AFAICT there's not a way to combine these into a single generation command, which is what the current fix requires.

Looking at other projects to get a sense of patterns: prometheus metric registration stands out as something in a similar vein: we have a bunch of Collectors we need to iterate over and make available to a backend. Perhaps there's an option where we wrap a node's schema in a struct that implements a simple

type SchemaRegisterer interface {
	Register(reg *schemas.Registry)
}

And then look at option 3 as proposed in #490 (comment).

Of course, if there's a way to make it work with the current solution that's be amazing!

#490 (comment) - This comment makes sense.

I'm hesitating (more like -- stalling) to have the go-capnp codegen intelligently scan what has happened before it was invoked. Especially with the files being in different locations. Golang isn't really patterned that way: files in different locations all getting compiled into a single package is definitely not a golang-ism...

Again, this is not me saying No, Never. But basically, Not Right Now...?

But basically, Not Right Now...?

@davidhubbard, If @ndisidore were able to contribute a patch, would we be able to merge it in? Or are we blocked by the codegen work you mentioned?

@ndisidore Are you at all interested in contributing a patch? We could use the help. πŸ™‚

Agree, high quality patches welcome.

In terms of "Not Right Now," go-capnp can't be everything to everybody, and if it were me (it's not, see below) I'd really try to make the gathering of the files in different directories a build step separate from invoking the go-capnp codegen.

Thus "Not Right Now" to me is my way of trying to keep all of us pushing on the work that's broadly needed for go-capnp to keep growing. I don't mean to make it sound like I'd block a motivated contributor from adding and maintaining a feature that's super meaningful to them.