golang / go

The Go programming language

Home Page:https://go.dev

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

embed, cmd/go: add support for embedded files

rsc opened this issue · comments

In July, @bradfitz and I posted a draft design for embedded files. The doc links to a video, prototype code, and a Reddit discussion.

The feedback on that design has been overwhelmingly positive.

I propose to adopt the embedded files draft design for Go 1.16, with one addition, suggested in the discussion, to simplify the case of direct access to the bytes in a single embedded file.

As long as a file imports "embed" (import _ "embed" if necessary), it will be permitted to use //go:embed naming a single file (no glob patterns or directory matching allowed) to initialize a plain string or []byte variable:

//go:embed gopher.png
var gopherPNG []byte

The import is required to flag the file as containing //go:embed lines and needing processing. Goimports (and gopls etc) can be taught this rule and automatically add the import in any file with a //go:embed as needed.

The embedded files design depends on the file system interface draft design, which I've also proposed to adopt in #41190.

This issue is only about adopting the embedded files design, under the assumption that the file system interface design is also adopted. If this proposal is accepted before the file system interface design is, we'd simply wait for the file system interface design before starting to land changes.

Would it be an error to have a //go:embed directive without importing embed?

@rsc Maybe I have missed it in the draft, but I don't see the ability to embed a single file that you mention in your comment.
Also, would you be able to embed a single file as a const string as well?
Thanks for this great proposal.

@pierrec It's not in the draft doc (the "one addition" is the text in the comment above). Const strings can end up playing a role in deciding whether a program type checks, which would mean all type checkers would need to understand //go:embed'ed consts. In contrast, if we stick to vars, type checkers are none the wiser and can be left alone. Seems like we should probably stick to vars.

Is there a particular reason you wanted a const instead of a var? Using them should be about the same as far as efficiency. (References to const strings end up compiling to what amount to references to hidden vars anyway.)

Thanks for the explanation. I tend to embed static assets as const strings at the moment this is why I asked. I am fine with vars too!

Interesting, so I could do something like:

//go:embed version.txt
var Version string

And potentially even have a //go:generate comment to generate version.txt. That would cut out a large use case for makefiles/ldflags.

Is it an error if the file isn't found? If so, where technically is the error considered to occur? Link time?

Can we make sure that go:embed runs after go:generate so we can do things like easily generate versions, etc?

Can we make sure that go:embed runs after go:generate so we can do things like easily generate versions, etc?

From my understanding,go:generate will occur with go generate while go:embed would happen with go build.

@carlmjohnson Yes, it is always an error to say //go:embed foo where foo does not exist.
The error happens when compiling the source file containing that line.
(If you were to remove foo after compiling that source file, you still wouldn't get to a link step - the go command would notice that the package needs rebuilding because foo was deleted.)

commented

I think that this proposal is not complete without saying something about ETag.
https://old.reddit.com/r/golang/comments/hv96ny/qa_goembed_draft_design/fzi0pok/

@tv42, yes, we will make ETag work. I'm not sure what the shape of that is but we will.
(Also affirmed on #35950 (comment).)

commented

TwoThree things I've noticed from working with mjibson/esc:

  • As go:embed doesn't need to generate go files for embedding as read-only filesystem, it would take away the pain of changing timestamps on the go:generateed files that defied git porcelain tests on CI- very nice
  • One thing I didn't find in the proposal but would need is the ability to live-reload the embedded files during development cycles. Using mjibson/esc I can currently do that instructing it to use the local filesystem (though it wouldn't pick up new files) and change the behaviour using build tags. I'm wondering what that could fit into the proposal?
  • Update Another thing I remember is that esc required to be able to transparently strip (parts of) the base path in order to e.g. export an assets folder as web root.

Afterthought: I guess the second point could be remedied in conjunction with the io/fs proposal where I would either use the embedded or the live filesystem for inclusion? Implement the path stripping as io/fs middleware?

@andig You can already strip prefixes when serving a filesystem over HTTP. I agree that the live-reloading can be done by a third party library wrapping an io/fs.

commented

One more thing: if I understand correctly, embed will consider files locally to the package and forbids ... My current design has /assets and /server/ where the latter contains the server‘s code and today hosts the generated files. With this proposal the embed would need to move to the root folder as assets wouldn‘t be accessible from server. This imposes different accessibility constraints from normal imports. I was wondering if this is necessary for security reasons or if module-local embeds should be generally allowed.

One more thing: if I understand correctly, embed will consider files locally to the package and forbids ... My current design has /assets and /server/ where the latter contains the server‘s code and today hosts the generated files. With this proposal the embed would need to move to the root folder as assets wouldn‘t be accessible from server. This imposes different accessibility constraints from normal imports. I was wondering if this is necessary for security reasons or if module-local embeds should be generally allowed.

You can create a emed.go file in your assets directory and make the assets available as it's own package to the rest of your program.

Another explicit goal is to avoid a language change. To us, embedding static assets seems like a tooling issue, not a language issue.

Agreed. In my opinion, adding syntactical sugar in the language to support this tooling change is a language change. I'm sure this is obvious to others, but this is effectively comment-as-code.

I strongly feel that magic/sugar detracts from the simplicity and readability of the language; it is very easy to miss a magical comment that embeds a file. While a response to this could easily be "okay, then don't use it", this change means that a reviewer still has to be vigilant for others using this feature and has to remember that comments around variable declarations can break builds or fail at compile-time.

I believe this is going to add confusion, detract from language usability, and will result in opaque, large binaries without clear benefit (regarding the lattermost concern, this will even lead to an anti-pattern of re-building binaries due to plain file changes). If go mod allowed for a --withNonGoCodeAssets, I believe this would solve the needs of most developers that don't want to write more complex build pipelines (I assume end-user distribution is a smaller subset of the problem for users).

@tristanfisher, I understand your point about language vs tooling change. It's certainly near the line. The reason that I consider it more a tooling change is that the language spec is unaffected - whether a program is valid does not change, the type checking process does not change. All that changes is the initial value of that variable following the comment. In this way it is a bit like the linker's -X flag, which can set the initial value of a top-level var of type string. It's fine for us to disagree; I just wanted to make my definition clear and explain the distinction I'm making.

As for bloat, I guess we'll have to see, but I don't anticipate programs getting much larger than they already are. People already run tools that turn arbitrary files into Go code, check them into their repos, and make the compiler build them. The design removes some overhead from this process but does not enable anything new. Maybe people will abuse it now that it's easier to do, but on balance I don't expect that to be much of a problem. (And if some dependency embeds something so big that it bloats your binaries, you can always choose not to use that dependency.)

As for rebuilds due to plain file changes, the only files that can trigger rebuilds are the ones in your own top-level module, since dependencies are immutable. If you found rebuilds happening more often than you'd like, the only explanation is (1) you are embedding files and (2) you are modifying those files. You would be in complete control of doing something about either cause. (It would be another thing entirely if a dependency's choice of what to use was somehow forcing extra rebuilds or other expense on you. But that's not the case here.)

@rsc I agree that it's okay for us to disagree and I appreciate your response. My feeling is that if it's included by default in the standard tooling and comments can lead to an implicit initialization of a variable, then it's a language change. Outside of that debate, I guess my icky feeling is around more directives as "magic" comments that need to be memorized by (human) code readers. This could be taken to the absurd conclusion of adding new features via block comments that get handled at build time.

That said, if this gets added to the ecosystem, I will be thankful that importing embed is required -- that's better than nothing as a "hey, head's up" when auditing code. I think go mod allowing for non .go would solve a majority of use cases (I imagine most people are going to glob files for webservers) and would also live entirely in tooling.

I think your point regarding the linker is a good one. It also helps explain my feelings on this: if the very end user (e.g. not someone that simply imports a package) is making the decision, there's no way to be surprised by blobs of non-code coming along for the ride. My concerns are born out of reviewing/pairing-on others' work and "tech-leady" responsibilities, which is why I felt the need to respond.

I think "we'll have to see" sums it up well (I'm more cynical about bloat/misuse).

commented

I will read through the draft design tonight, so far it looks good from the TinyGo perspective.

I just wanted to clarify one thing:

On the other hand, projects like TinyGo and U-root target systems with more RAM than disk or flash. For those projects, compressing assets and using incremental decompression at runtime could provide significant savings.

I don't know about U-root, but for TinyGo the main targets are microcontrollers which normally have far more flash than RAM (usually a factor of 8 or 16). A quick look at the draft design seems to suggest the idea is to keep the files in read-only memory, which would work fine for these targets: the embedded files can be read directly from flash. It would most likely not be desirable for TinyGo targets to decompress files at runtime.

commented

The io/fs proposal on which this depends looks to be blocked on Readdir/FileInfo problems, under discussion in #41188 and previously #40352.

I've drafted an API to replace them in #41188 (comment)

commented

@andig

One thing I didn't find in the proposal but would need is the ability to live-reload the embedded files during development cycles.

embed.Files implements fs.FS, hence all you need to do is to use dev vs !dev build tag to switch a variable between embed.Files and the real FS.

commented

I filed #41265. It offers a new ReadDir() API for io/fs.

I have similar concerns as @tristanfisher . Go has been using magic comments as compiler directives for a long time (since the beginning?), but they are meant for corner cases and it is rare for them to appear in code. Given the popularity of embedding static content in Go binaries, //go:embed is likely to be more common. Maybe it's time to consider a different syntax for compiler directives?

Just a reminder that changing the Go syntax has a very high cost. Pretty much every Go tool out there would need to be updated and/or fixed to support the new syntax, for example.

I don't consider them magic comments. Lines starting with //go: are directives and could be defined as such in the spec. There's not a lot of semantic difference between //go:embed, @embed, [[embed]] or any other number of syntax variations, except the //go: prefix is already treated as non-code by Go tools. (my editor highlights those lines differently for example)

@mvdan If this proposal happens, Go syntax has changed. It's just changed in a way that doesn't break existing tooling. Maybe that seems pedantic.

@iand I'm not fussy about the specific syntax for compiler directives. I just think that it needs to be formalized at some point and the rules specified.

I think this proposal is a good idea. It solves a common problem. My concern is that the cost of adopting it should be made a little more explicit.

@jonbodner I share your concerns about magic comments. But to some extent the rules are specified by #37974.

@networkimprov, this is not the io/fs proposal. Please stop commenting about ReadDir here.

@jonbodner

I'm not fussy about the specific syntax for compiler directives. I just think that it needs to be formalized at some point and the rules specified.

I would just point out that we made the decision to use //go: to mark Go toolchain directives when
we added (the limited use) //go:nointerface annotation back in 2012.
We added //go:noescape for assembly authors in 2013.
We added //go:generate in 2014.
We are likely adding //go:build in 2020-2021 as well.
There are others; that's just the highlights.
You can think of //go: as meaning #pragma from C if it helps.

At this point, the convention is very well established.
We chose that syntax back in 2012 because
(1) it's obviously not a comment for a person;
(2) tools that don't know about the comments will ignore them because they're comments; and
(3) it generalizes to other tools (s/go/yourtool/).

And as Ian said, #37974 formalized the exact generalized comment syntax, for what that's worth.

Based on the discussion above, this seems like a likely accept.
(Again, assuming but separate from the FS proposal.)

No change in consensus, so accepted.

commented

I'm eager to get my hands on embed- can this already be tested on master or are there any plans to ship it as experiment during the 1.15 cycle?

@andig, Go 1.15 is out already. I still hope this will be in Go 1.16 and landing in the development branch this month.

commented

@rsc 1.16 available?

@septs, no we are still working on Go 1.16. The code freeze is Oct 31, with a target release date of Feb 1.

commented

fastest 2021Q1 or 2021Q2 release?

@septs please stop asking questions about Go releases in this thread. Over twenty people follow it and get notified. See https://golang.org/wiki/Questions and https://github.com/golang/go/wiki/Go-Release-Cycle.

Change https://golang.org/cl/243941 mentions this issue: go/build: recognize and report //go:embed lines

Change https://golang.org/cl/243940 mentions this issue: go/build: refactor per-file info & reader

Change https://golang.org/cl/243942 mentions this issue: embed: implement Files

Change https://golang.org/cl/243944 mentions this issue: cmd/compile: add //go:embed support

Change https://golang.org/cl/243945 mentions this issue: cmd/go: add //go:embed support

One detail that came up in the implementation review is that "Files" as a singular noun is pretty awkward ("A Files holds ...").

The choice of embed.Files for the name predated both the io/fs proposal and also the support for string and []byte.
Given both of those developments, one seemingly sensible way to resolve the "A Files holds" problem is to call it an FS instead of a Files.

Then the three ways to embed & print data are:

import "embed"

//go:embed hello.txt
var s string
print(s)

//go:embed hello.txt
var b []byte
print(string(b))

//go:embed hello.txt
var f embed.FS
data, _ := f.ReadFile("hello.txt")
print(string(data))

That seems clearer about what you get: a string, a []byte, or an FS.
That is, most of the functionality of the embed.F* comes from it being an fs.FS, and calling it FS makes that clearer than calling it Files.

I've made that change in my latest draft of the CL implementing package embed, but I wanted to circle back here and see if there are any objections to the name change.

(A more radical change would be to do var f fs.FS instead of var f embed.FS, but that would preclude ever having any method on f other than Open. For example, above, having ReadFile is convenient and would not be possible. And in general we've learned that using a concrete type for something that might want to add methods later is good future-proofing compared to using an interface type directly.)

I think the rename is a good change.

In regards to the more radical change:

  • If we'd use fs.FS, would we even need the embed package anymore? I guess the dynamic value still must have some type, that lives in some package? I find the idea of not having to add a package a plus.
  • I don't find "we can't add methods" super convincing, because IMO f.ReadFile(…) isn't significantly lessmore convenient than fs.ReadFile(f, …).
  • I agree that concrete types are better in general though, so that's a plus for keeping it embed.FS
  • Another question: Does embed.FS use pointer-receivers, or value-receivers? IMO having to pass around &f is awkward, using value receivers is slightly unexpected. We might also allow var f *embed.FS though. If the variable has an interface-type, this question goes away.

Overall, I still agree using the concrete embed.FS is better - if nothing else, then for documentation purposes.

Now that you've mentioned it, I don't think I got this clear: we can embed directories right?

Yes as an embed.FS which implements fs.FS.

@Merovius, embed.FS uses value receivers. embed.FS is a one-word struct containing a single pointer, so there's no actual overhead to doing so, but it means you can assign them around and use them without worrying about *s and &s everywhere.

@chabad360, yes, you can embed directories.

What about symlinks?

@burik666, see https://golang.org/s/draft-embed-design for details, but no, you cannot embed a symlink.

will it be possible to embed and use dynamic C libraries? if so how would we use the embed path in #cgo headers such as: #cgo LDFLAGS: -L./lib -lmylib -Wl,-rpath=./lib ?

@benitogf I assume the only real way to do this would be to write them to disk and use dlopen. I can't imagine how you could tell the dynamic loader how to find embedded files. Also, if you want to bundle in C code, static linkage would seem more appropriate anyway, no?

@benitogf Embedding lets you put a file from disk into a []byte in your program conveniently, nothing more.
If you have a way to use a dynamic C library that is already in your program in the form of a []byte, then embedding will help you get a disk file there. Otherwise, no.

static linkage would seem more appropriate anyway, no?

@Merovius agreed, but I have several use cases working with vendors who will only provide dynamic libs

If you have a way to use a dynamic C library that is already in your program in the form of a []byte
the only real way to do this would be to write them to disk and use dlopen

writing the embedded library from the []byte to filesystem and using dlopen seems ok, though having the embedded files optionally "dumped" to filesystem on build/run so the #cgo header can access them would be useful, not only for cgo imho

Trying this out now; one wart with the go:embed directive is that if I embed build/*, the filenames still have the prefix build/. If I want to then serve that directory via http.FS, there's no easy way to add the prefix that's required to access them if needed (without writing a wrapper, which then hits the problem of needing to list out every potential method that the FS may have...).

e.g.:

//go:embed build/*
var buildDir embed.FS

// Serve some SPA build dir as the app; oops, needs to be build/index.html
http.Handle("/", http.FileServer(http.FS(buildDir)))

// or

//go:embed static/*
var staticDir embed.FS

// Oops; needs to have a static prefix.
http.Handle("/static/*, http.StripPrefix("/static", http.FileServer(http.FS(staticDir))))

// Could be this, but only because the prefix happens to match:
http.Handle("/static/*, http.FileServer(http.FS(staticDir)))

I know the intent is that one could write go:embed foo/* bar/* baz.ext and get all of those files, but I think it's going to be very common to simply embed a directory and serve it as static assets via the http package. I expect this to be a gotcha as people switch from things like http.Dir("static") or pkger.Dir("/internal/web/static") where the prefix is already handled, to the new embed.FS.

I'm not really sure how to file this, as it's sort of an interplay with embed, io/fs, and net/http.

@zikaeroh Writing a http.Handler wrapper would also work there right? That's only ever one method and there's even http.HandlerFunc. Perhaps the standard library could even provide one to mirror http.StripPrefix (something like http.AddPrefix or http.ReplacePrefix).

Potentially, though it feels a bit weird to be modifying the HTTP request to work around the FS implementation (as opposed to a generalized "give me an FS that is a subdir of another FS", which is not simple with optional methods). It wouldn't be the most efficient thing, stripping then adding another prefix again (given http.Request copies), but I'll give it a try later. It's at least not different than the current scheme of things where you have to work with the request, I suppose.

I have a few other places I use static data not via the http package that I'll have to have to come up with a similar fix for.

If I have to have a look at how it is being implemented where can I look. A branch where it is being implemented?

commented

It was suggested before to embed the files in-place, i.e. do that in the build directory and then import it. That would strip the build prefix. Then use the handler to add the required prefix. I‘m unsure how to exclude the go file doing the embedding from embedding itself. See #41191 (comment)

It was suggested before to embed the files in-place, i.e. do that in the build directory and then import it. That would strip the build prefix. Then use the handler to add the required prefix. I‘m unsure how to exclude the go file doing the embedding from embedding itself. See #41191 (comment)

That's unfortunately no good for directories produced by other tooling, e.g. the output of say a webpack build or CRA (where they're often cleaned beforehand, and not checked in). I'd rather hack the filenames.

It was suggested before to embed the files in-place, i.e. do that in the build directory and then import it. That would strip the build prefix. Then use the handler to add the required prefix. I‘m unsure how to exclude the go file doing the embedding from embedding itself. See #41191 (comment)

That's unfortunately no good for directories produced by other tooling, e.g. the output of say a webpack build or CRA (where they're often cleaned beforehand, and not checked in). I'd rather hack the filenames.

If you use such a huge system of plugins as webpack it's easy to just install another webpack plugin to generate the embed.go along the assets themselves. If you are using something simpler with a makefile or shell script it's also easy to generate the .go file from there.

@zikaeroh

as opposed to a generalized "give me an FS that is a subdir of another FS", which is not simple with optional methods

It's supposed to be simple. Handling the wrapping problem was part of the design process. In particular, the wrapper is supposed to implement all the optional methods, by just calling the appropriate helper function in fs. If that doesn't work, that's concerning and it would be great to get some details.

Also, IMO, an implementation of such a wrapper (that strips a prefix) should ultimately be provided by io/fs (analogous to io.LimitWriter, etc). I would assume the only reason it hasn't happen yet is time.

@andig The problem with doing that is that the Go-file containing the embed-directive and the variable is then also visible from the FS (and would be served by HTTP or might get exposed in another manner).

commented

The problem with doing that is that the Go-file containing the embed-directive and the variable is then also visible from the FS (and would be served by HTTP or might get exposed in another manner).

One way to remedy that could be to add the ability to exclude specific files/folders from embedding (@rsc ?)

One way to remedy that could be to add the ability to exclude specific files/folders from embedding (@rsc ?)

The proposal was accepted over a month ago and is already implemented; I don't think large design changes like being able to exclude paths are reasonable at this point. If you have a problem with the implemented design that you can't work around, I would suggest filing a separate bug report with details, which can then be tracked before the final 1.16 release.

@Merovius

It's supposed to be simple. Handling the wrapping problem was part of the design process. In particular, the wrapper is supposed to implement all the optional methods, by just calling the appropriate helper function in fs. If that doesn't work, that's concerning and it would be great to get some details.

How would the prefix stripping work for Glob?

@icholy I assume something like

func (f *stripFS) Glob(pattern string) (matches []string, err error) {
    matches, err = fs.Glob(f.wrapped, path.Join(f.prefix, pattern))
    for i, m := range matches {
        matches[i] = strings.TrimPrefix(m, f.prefix+"/")
    }
    return matches, err
}

Maybe with some additional care.

Playing with this on gotip, I notice that it will include .DS_Store files. This is pretty much inescapable, I guess, but I do worry that including dot files will lead to accidental file inclusion. Maybe the docs should have a strong warning about this?

My shell doesn't include dot-files in *, so if I want to include them, I have to use * .*. This might be a (perhaps equally surprising) way to give a level of control.

I'm not sure what to think - IMO, dot-files shouldn't really be treated differently by the pattern, but OTOH the .DS_Store example seems like something that really should be addressed.

Why not just do git clean -dffx && go build? If the DS_Store files are in git, then they'll be included in the build. If they're not, they won't be. This will also work nicely with gitignore.

You should be building with a clean VCS checkout, anyway. If you add random temporary files, they might make it into the final build and you can't possibly know what files people will end up with. We might want to document this.

@mvdan I agree in principle, but in practice, I worry that a lot of people will get burned by dirty builds if they are not warned. I don't want to see a secret leak someone didn't realize their .env file was embedded by mistake. I have seen tons of examples of an equivalent mistake with PHP hosting, even though it could have been easily prevented by telling Apache to exclude dot files.


Re: embeding http.FileServers

If you use such a huge system of plugins as webpack it's easy to just install another webpack plugin to generate the embed.go along the assets themselves.

That's true, but it's very fiddly. The point of embed.FS is to reduce the need for crazy Makefile workarounds. I think the simple solution is to have fs.WithPrefix(string) fs.FS that locks an FS into a subdirectory. I thought there was some discussion of this in the proposal, but I can't find it now. Maybe it was just mooted on Reddit or something.

One way to remedy that could be to add the ability to exclude specific files/folders from embedding (@rsc ?)

It could be something like

//go:embed static
//go:embed-exclude .*
var staticFiles embed.FS

The embed-exclude directive could just do a glob filter on the accepted files and remove any matches…

If we want to avoid adding more to this proposal it could also be a lint rule that checks embedded filesystems for potentially unexpected dotfiles and warns you so that you can fix your build to remove them.

commented

Or exclude dot files by default unless they are specifically mentioned by adding .* or similar.

That still wouldn‘t take care of exposing an assets.go file. As for the proposal already having implemented, please note that the question about the asset generation was raised during the discussion phase. It‘s probably not dangerous to have an (otherwise empty, except for the embed directive) assets.go exposed but it would be cleaner to not have it. As usual there are all sorts of workarounds that can be applied.

It‘s probably not dangerous to have an (otherwise empty, except for the embed directive) assets.go exposed but it would be cleaner to not have it.

I agree that this is very unlikely to be a problem, but I would hate to see any closed source code accidentally leaked due to misconfiguration if we can make it easy to configure things correctly.

I don't want to see a secret leak someone didn't realize their .env file was embedded by mistake.

If someone uses //go:embed static/* and there's a static/.env or static/.super-secret, wouldn't you say that the user really meant to include those files? Otherwise, why would they be in the static directory?

I get that this is up to what the user expects and what * means in most contexts, but I personally think that https://golang.org/pkg/path/filepath/#Glob semantics is our only good option. It's the simplest and what most Go users will be used to in the context of developing Go.

I think warning against the dangers of embedding * is a good idea in any case, though, because in a significant amount of cases one could reduce the chance of error by using more specific globs like *.png.

Also, I still encourage you to file a separate issue that can be tracked against the 1.16 release, written from the point of view of a bug report. This proposal is accepted and implemented, so I imagine it will be closed very soon. For example, you could phrase the bug report like: embedded file support easily leads to including unintended files (and give some examples).

If someone uses //go:embed static/* and there's a static/.env or static/.super-secret, wouldn't you say that the user really meant to include those files? Otherwise, why would they be in the static directory?

There is a huge amount of corner cases, for example you opened an edit session with vim, but didn't close it, and it created .*.swp file with some contents that you would like nobody to see.

Moving discussion to #42321.

This would be highly appreciated by the Prisma team for our Database Go client, since we use a query engine at runtime which is written in rust and needs to be in the built go binary somehow.

The way we're currently doing it is packing the binary into a .go file at go:generate time, but the file size is much higher than when it's a binary .gz file.

Native embeds would make this much better so we could directly embed a .gz file into the final go binary.

commented

If someone uses //go:embed static/* and there's a static/.env or static/.super-secret, wouldn't you say that the user really meant to include those files?

No, I would not.

$ mkdir z
$ touch z/.secret z/intended
$ ls z/*
z/intended
$ ls z
intended

See my later comment in #42328 (comment).

I love the idea to make static files / templates embeddable which definitely can boost go dev productivity a lot.

But should we innovate another tag (e.g. @ or whatever) other than to reuse this //, which is supposed to be comments?

As of now, the // has been overused I guess, think of these:

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:webhook:verbs=create;update,path=/validate-batch-tutorial-kubebuilder-io-v1-cronjob,mutating=false,failurePolicy=fail,groups=batch.tutorial.kubebuilder.io,resources=cronjobs,versions=v1,name=vcronjob.kb.io
// +optional
...

But should we innovate another tag (e.g. @ or whatever) other than to reuse this //, which is supposed to be comments?

That's a separate discussion and while it would be nice for directives not to be comments, a lot of go1 compat code already relies on that, so it's most likely not going to change.

But should we innovate another tag (e.g. @ or whatever) other than to reuse this //, which is supposed to be comments?

I tried finding it and failed, but I remember there being a decision to standardize on //go:<word> for these kinds of directives.
It doesn't seem like a good idea to change syntax around, given the decision to expressly converge on them.
(Also, of course, they are comments specifically so that the compiler ignores them - these directives are specific to the go tool, so they shouldn't leak into the language proper)

I saw a mention on the io/fs issue that embed.FS supports ETag: #41190 (comment)

I tried running a test for it but I'm not seeing an ETag response header set. Maybe I'm misunderstanding the usage. Should I expect to see one here? https://play.golang.org/p/Wq5xU5blLUe

I don't think it does. http.ServeContent (used by http.FileServer) inspects the ETag header, but it does not set it, AIUI.

In a comment above Russ has said that ETag will be made to work. The difficulty is how to have embed.FS communicate to http.FileServer the necessary information to set ETag or other caching headers. There probably should be a separate tracking issue for this.

Personally, I'd argue that embed.FS should use the time of the last commit of the relevant module as a ModTime. This would roughly correspond to debug.BuildInfo, so it wouldn't impact reproducibility. Though I'm unsure how that is set for commits that don't correspond to a tagged release and it would still leave the question for what to set it to for builds from dirty worktrees.

But, I trust @rsc has a good solution in mind :)

commented

I'm not sure I understand the comment about "commit time"; if module source is a zip file, there's no "commit". I don't see any times mentioned in debug.BuildInfo or debug.Module.

More importantly, I'd argue that any timestamp-based mechanism is strictly inferior to a proper (hash of content based) etag.

@tv42 Every module version is either a) a semantic version derived from a tag (which points at a commit) or b) a pseudo-version containing a commit-hash. I think? At least in git. I may be misunderstanding something.

More importantly, I'd argue that any timestamp-based mechanism is strictly inferior to a proper (hash of content based) etag.

I'm not so sure. It either needs some side-channel to communicate the hash, or the server needs to compute the hash of the file on request (which seems pretty expensive). After all, net/http doesn't know, a priori, if the fs.FS contents may change. The end-result of a hash-based ETag might justify the cost of adding such a side-channel (like an optional interface), but it doesn't seem obviously strictly better.

Also, I would argue that supporting at least also a time-based approach would mean you can work with more clients. I don't have any data to support that, though (i.e. I don't know if and how many clients are out there that might support If-Modified-Since but not ETag and vice-versa).

But really, I don't care much which approach is chosen. I just wanted to mention the option of using the time a module version was tagged.

commented

@Merovius Go modules are specified in terms of zip files. The fact that Git is a common source used for constructing that doesn't change the spec. The zips have zero timestamps:

$ unzip -v ~/go/pkg/mod/cache/download/golang.org/x/crypto/@v/v0.0.0-20200510223506-06a226fb4e37.zip|head
Archive:  /home/tv/go/pkg/mod/cache/download/golang.org/x/crypto/@v/v0.0.0-20200510223506-06a226fb4e37.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
     345  Defl:N      233  33% 1980-00-00 00:00 237856c8  golang.org/x/crypto@v0.0.0-20200510223506-06a226fb4e37/.gitattributes

There does seem to be a timestamp in the accompanying *.info file, but I have no idea if that's reliable or available to the tooling.

$ cat ~/go/pkg/mod/cache/download/golang.org/x/crypto/@v/v0.0.0-20200510223506-06a226fb4e37.info; echo
{"Version":"v0.0.0-20200510223506-06a226fb4e37","Time":"2020-05-10T22:35:06Z"}

Even then, what timestamp should go:embed use in the main module (the one you run go build in)?

Personally, in the context of static files that are immutably embedded in the binary, hashing just seems superior to timestamps in every respect (except legacy support from before ETag existed, pre-HTTP/1.1, the 90s), and has no possible source of confusion in distributed systems and concurrent builds either. I seriously doubt clients that don't understand ETag exist anymore, and they sure won't make the HTTP/2 transition.

A side-channel to communicate the hash sounds like the right thing to me; an optional FS interface to return a hash of the file. (And maybe another to request a specific hash algorithm, if they happen to have it?. Some serving uses cases really specifically need sha256/sha1/md5, not just "some hash"). An FS that doesn't have a cheap enough answer can choose to not implement it.

(Though modern hashes are gigabytes/second/core even when cryptographically secure, tens of gigabytes/second for less secure ones, and easy to cache based on stat call. Just support ETag everywhere, people.)

I've been thinking about this proposal today. I'm happy to see this functionality included in the Go toolchain, and I appreciate all the thought and effort that has gone into the proposal to date. Thanks for proposing this and driving this forward.

I have two questions, and then one suggestion (appreciating this is past the proposal discussion period, and already into release freeze):

Allowed types

I notice that the current code expects the variables to be declared as precisely embed.FS, string, or []byte. In particular, these are not allowed: []uint8, FS after dot-import of "embed", or any other identical types constructed using type aliases. (Edit: I forgot how dot imports work within gc, and misread the code for detecting embed.FS.)

Is this intended or a bug?

Semantics of []byte-typed variables.

I don't see any mention of what the expected identity semantics of []byte-typed variables are. In particular, for function-scoped variables. This isn't important for string- and embed.FS-typed variables, because they reference immutable data that can be safely deduplicated. But it's important to know the intended semantics for []byte-typed variables.

With the current implementation, the test program below prints false true (when foo.txt is non-empty). Is that intended/guaranteed?

package main

//go:embed foo.txt
var a []byte

//go:embed foo.txt
var b []byte

func f() *byte {
    //go:embed foo.txt
    var x []byte
    return &x[0]
}

func main() {
    println(&a[0] == &b[0], f() == f())
}

I think the more Go-like semantics for //go:embed variables would be those of composite literals: that each execution yields a new copy.

If there's no consensus on the proper semantics for this, we can always punt on it and make function-scoped, []byte-typed embeds an error for Go 1.16: users can still either declare a package-level variable if they want the current semantics (one byte slice per source declaration), or use a string-typed variable and convert to []byte if they want composite-literal semantics. We could then revisit later which behavior users would benefit from more.

Avoiding more //go: directives

I recommend against adding //go: directives for end-users that impact program semantics, and I don't find the arguments given for favoring //go:embed over normal Go syntax compelling. I respectfully encourage reconsidering this decision before the Go 1.16 release. (Again, I appreciate how late this request is.)

I'll start by pointing out I have a working proof-of-concept CL at CL 276835, which changes the embed API from:

//go:embed foo.txt bar.txt
var x embed.FS

to

var x = embed.Files("foo.txt", "bar.txt")

Similarly, functions embed.Bytes and embed.String are available to embed a single file and get it as a []byte or string value.

Responses

Similarly, an embed.Files variable can be a global or a local variable, depending on what is more convenient in context.

Having embed.Files, etc. allows also using them in expression contexts, which may be more convenient still.

It is an error to use //go:embed in a source file that does not import "embed" (the only way to violate this rule involves type alias trickery).

With embed.Files, etc., this is an error by virtue of the functions being exported by package embed.

Goimports (and gopls etc) can be taught this rule and automatically add the import in any file with a //go:embed as needed.

No special goimports or gopls logic is needed to be aware that importing "embed" is the proper way to fix uses of embed.Files, etc.

This approach fixes the type-checking problem—it is not a full language change—but it still has significant implementation complexity.

Notably, CL 276835 is a net removal of code. In particular, the compiler code (which will have to be reimplemented in gccgo and other compilers) is much simpler.

I also expect it will be easier to teach go/types about the nuanced semantics of embed.Files, etc. (i.e., that they only accept string literal arguments), than to teach it about //go:embed.

The go command would need to parse the entire Go source file to understand which files need to be made available for embedding. Today it only parses up to the import block, never full Go expressions

With CL 276835, the behavior is the same as at tip: the go command parses the entire Go source file for files that import package embed, and only the imports for files that do not.

Admittedly, for files that import embed, CL 276835 does a full parse and walk, whereas tip does a more efficient string scan for //go:embed comments. I think a more optimized, one-pass algorithm for finding embed.Files calls is doable though if desired.

It would also be unclear to users what constraints are placed on the arguments to these special calls: they look like ordinary Go calls but they can only take string literals, not strings computed by Go code, and probably not even named constants (or else the go command would need a full Go expression evaluator).

This doesn't seem substantially different from the //go:embed directive to me: users don't have any expectation what arguments they're allowed to use there until they use it the first time. Moreover, either way, users will get compiler error messages if they use it incorrectly, but IDEs and other tools will automatically provide better godocs and cross-references for embed.Files, etc.

@mdempsky FS after a dot-import of embed is the same type. So, that particular case seems a straight-forward "yes, that's fine" (tried it and it already works). Likewise, byte is an alias for uint8, so []uint8 is also the same type as []byte, though somewhat surprisingly, this doesn't work right now. However, I'd argue that the semantics as implemented are fine, for now - we can always allow more types and/or aliases and/or even "the same underlying type" later, if the need arises.

I think the more Go-like semantics for //go:embed variables would be those of composite literals: that each execution yields a new copy.

I tend to agree, that was my expectation as well. At first I thought this might've been an artifact of escape analysis and the compiler realizing you don't change the data, but no:

func f() {
	//go:embed foo.txt
	var x []byte
	fmt.Printf("%q\n", x)
	x[0] = 'x'
}

func main() {
	f()
	f()
}

However, "every var declaration creates a new variable" also has its problems, because it means code like this

func ServeIndex(w http.ResponseWriter, r *http.Request) {
    //go:embed "index.html"
    var x []byte
    http.ServeContent(w, r, "index.html", time.Now(), bytes.NewReader(x))
}

would needlessly allocate and copy. Perhaps that's fine and gets outweighed by the benefits of clearer semantics. But perhaps it's also a sign to forbid local []byte embeds as you mention.

I also expect it will be easier to teach go/types about the nuanced semantics of embed.Files, etc. (i.e., that they only accept string literal arguments), than to teach it about //go:embed.

Only string constants at least, can even be done in the Go type-system.

But, does go/types even need to know about embedded files, with //go:embed? The type of those variables are pretty clear, after all (with the exception of local []byte embeds, as discussed).

commented
//go:embed "foo"
var x []byte

Is this really meant to be mutable? I can't think of a single use case where I would want to embed a static asset, but mutate it at runtime, so I feel like that'll just enable bugs. I would have been happier shoving it in .rodata and panicking if something tries to mutate it.

(Prompted by x[0] = 'x' above.)

@Merovius wrote:

@mdempsky FS after a dot-import of embed is the same type. So, that particular case seems a straight-forward "yes, that's fine" (tried it and it already works).

Thanks. I forgot how dot imports work within the compiler, so I misread the code. I should have tried first before including that example.

However, I'd argue that the semantics as implemented are fine, for now - we can always allow more types and/or aliases and/or even "the same underlying type" later, if the need arises.

[]byte and []uint8 are identical types under the Go spec, as are countless other types constructed using type aliases. If it's not acceptable for a compiler to treat embed.Files("foo" + ".txt") differently from embed.Files("foo.txt"), then the same argument should extend to not allowing it to treat aliased types differently.

But, does go/types even need to know about embedded files, with //go:embed?

No, it doesn't need to, just like it doesn't need to know about embed.Files's special semantics either. But for either syntax, there could still be value in go/types being able to warn about misuse.

--

@tv42 wrote:

Is this really meant to be mutable?

Evidently. The Go compiler specifically arranges for string and embed.FS embeds to be deduplicated and put in rodata, while []byte embeds are not:

fsym, size, err := fileStringSym(v.Pos, embedCfg.Files[file], kind == embedString, nil)

However, it's true that the proposal does not seem to have addressed this point.

Thanks for the thoughtful comments. I've filed two issues to follow up:

#43216 - remove support for embedding directives on local variables
#43217 - define String and Bytes type aliases that must be used with //go:embed

I didn't file an issue for the //go:embed syntax itself. I wanted to say why here, so that it doesn't look to everyone like I'm just dismissing that.

Back when I blogged about the proposal process I spent a long time looking at ways to (and not to) handle decisions in large groups. One source I found that made a lot of sense to me was John Ousterhout's “Open Decision-Making” post. The whole post is worth reading, but I'll just quote here the section on Reconsideration:

The last rule for open decision-making is to make sure you don't reconsider a decision unless there is significant new information. This rule has two parts. First, you must be prepared to correct decisions that turn out to be wrong in a significant way. This is particularly true in startups: many decisions have to made without complete information and inevitably some of them will turn out to be wrong. A wrong decision that is corrected quickly causes little or no damage, but a wrong decision that is not corrected can be catastrophic.

On the other hand, you should not reconsider a decision unless significant new information has come to light since the original decision was made. If no new information is available, then reconsidering a decision will probably produce the same outcome as the original decision, wasting everyone's time. It's not unusual for employees to come to me weeks after a decision was made: "John, I voted against the decision to XYZ, and the more I think about it the more convinced I am that it was wrong; I really think we need to reconsider this." My answer is "What new information do you have?" If the answer is "none", we don't reconsider. In situations like this it helps to have a record of the arguments presented during the discussion, so you can verify that new information really is new. If you make it too easy to reopen decisions you end up vacillating back and forth where no decision is ever final and employees hesitate to implement decisions because they don't believe they are permanent.

If you want to make sure you don't reconsider very many decisions, be sure to gather input widely during the decision-making process. If you don't get enough input, you increase the likelihood that significant new input will arise after the decision, which means you'll have to reconsider. If you do a good job of collecting input it's much less likely that you will have to revisit your decisions.

This really resonated with me: the Go proposal process (like large open source projects generally) is a system with a far higher offered load than work capacity, so it's important that we (disagree if necessary and) commit and move on to the next decision.

string and []byte were added fairly late in the process of considering this issue, in response to initial feedback, and we clearly didn't think through all the implications of those. So I filed those two new issues, #43216 and #43217.

On the other hand, the //go:embed syntax was a core part of the original discussions and was extensively discussed, both pros and cons. I don't believe there's “significant new information” that should cause us to reconsider that syntax entirely, so in the interest of moving forward and keeping Ousterhout's advice on Reconsideration in mind, I'm leaving that to the side.

Thanks again for pointing out the string and []byte issues!

Evidently. The Go compiler specifically arranges for string and embed.FS embeds to be deduplicated and put in rodata, while []byte embeds are not:

@rsc did you think about this last point by any chance? With the current implementation, it might become "best practice" to use string instead of []byte for the sake of producing better binaries. Are we OK with that?

I don't understand why that would be a "best practice". To me, that's akin to saying it's "best practice" to make sentinel errors constants, ergo we shouldn't allow package-scoped variables of type error - in that I disagree both with that being a good practice and with the additional restriction as a solution.

I could see an argument to only use string in local vars. But in package-scoped variables, the semantics are clear and well-defined and I wouldn't recommend against using a []byte embed anymore than I would recommend against using any other []byte variable.

@mvdan, if folks bias toward using string instead of []byte as a best practice, I would consider that a good thing. string is the appropriate Go type for “an immutable string of bytes”, whereas []byte is the appropriate type for “a mutable string of bytes” or “a string of bytes of indeterminate mutability”.

For the cases where you have a value of type string (immutable) and want to use it as type []byte (indeterminate), you can already use unsafe to do that correctly. (See, for example, my unsafeslice.OfString). Perhaps we should add a supported standard library for that operation, but that seems like a separate proposal.

So it seems fine to always use the type string if you really intend for the value to be read-only.

@Merovius @bcmills you raise good points, and to be clear I don't object. I just want to make sure the proposal designers think about this distinction before the final release.

I really don't think deduplication is going to come up much in practice. (In what setting are multiple packages going to embed the same exact files?) People should use the form that they need and not worry about "string means my binaries are smaller", because in general I don't think that will be true.

A few people asked about ETags. We ran out of time for that, but I've filed a proposal at #43223 and hopefully that will lead to a good idea that can go into Go 1.17. Sorry for not being able to get it in this round.