golang / go

The Go programming language

Home page: https://go.dev

cmd/go: build cache does not check #included headers for changes

FlorianUekermann opened this issue.

go 1.10 linux/amd64

The go build and go test commands don't rebuild packages when files included by cgo code change. I guess a solution would be to run the preprocessor first, or to disable caching altogether for packages that use cgo.

Can you show us a small standalone example? I'm not clear on what you mean by "included cgo files". Do you mean files explicitly included using #include?

Do you mean files explicitly included using #include?

Yes.

One possibility would be to use -MD when running the C compiler, and add the listed files to the hash.

use -MD when running the C compiler, and add the listed files to the hash.

Just to make sure I'm following: did you mean adding the contents of all listed files, or just the list?

I had in mind running with -E and hashing the preprocessed output, but I guess your idea may be more efficient (is it?).

I meant the contents of the listed files, as in cache.FileHash in cmd/go/internal/cache/hash.go. The idea is to be able to use the cache to detect whether we can skip running the compiler. If we use -E then we have to run the compiler anyhow to see whether the cache is up to date.

This also affects me more generally: with 1.10 there is no longer any way to model a build dependency on a statically linked library. If the library changes, the cached files are still reused and I silently end up with an older version of the library unless I run go clean -cache -i <package> before the build. With Go versions before 1.10, I had CMake touch my cgo wrappers to force them to be rebuilt.

I think this is basically working as expected.
If you change the underlying C code, or you change the compiler,
then you have to rebuild with -a. I'll leave it open in case there is
a simple fix, but I don't think there is.

There is a way to make it work with the help of gcc -MD & friends:

(hello.h)

#define WORLDNUM        3

(hello.c)

#include <stdio.h>
#include "hello.h"

int main() {
        printf("Hello world (%d)!\n", WORLDNUM);
}
$ gcc -c -MMD -MT cdeps hello.c
$ cat hello.d 
cdeps: hello.c hello.h
$ gcc -c -MD -MT cdeps hello.c 
$ cat hello.d 
cdeps: hello.c /usr/include/stdc-predef.h /usr/include/stdio.h \
 /usr/include/x86_64-linux-gnu/bits/libc-header-start.h \
 /usr/include/features.h /usr/include/x86_64-linux-gnu/sys/cdefs.h \
 /usr/include/x86_64-linux-gnu/bits/wordsize.h \
 /usr/include/x86_64-linux-gnu/bits/long-double.h \
 /usr/include/x86_64-linux-gnu/gnu/stubs.h \
 /usr/include/x86_64-linux-gnu/gnu/stubs-64.h \
 /usr/lib/gcc/x86_64-linux-gnu/7/include/stddef.h \
 /usr/include/x86_64-linux-gnu/bits/types.h \
 /usr/include/x86_64-linux-gnu/bits/typesizes.h \
 /usr/include/x86_64-linux-gnu/bits/types/__FILE.h \
 /usr/include/x86_64-linux-gnu/bits/types/FILE.h \
 /usr/include/x86_64-linux-gnu/bits/libio.h \
 /usr/include/x86_64-linux-gnu/bits/_G_config.h \
 /usr/include/x86_64-linux-gnu/bits/types/__mbstate_t.h \
 /usr/lib/gcc/x86_64-linux-gnu/7/include/stdarg.h \
 /usr/include/x86_64-linux-gnu/bits/stdio_lim.h \
 /usr/include/x86_64-linux-gnu/bits/sys_errlist.h hello.h

This way if there is something like

// #include "mycode.cinc"
import "C"

then cgo could see the dependency on mycode.cinc and on any other files that mycode.cinc includes.
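A sketch of how a build tool might read a dependency file like the hello.d output above back into a list of files (which could then be hashed). It assumes the simple single-rule format shown: no -MP phony targets, no escaped spaces in paths, and no Windows drive letters. parseDepFile is a hypothetical name, not part of cgo.

```go
package main

import (
	"fmt"
	"strings"
)

// parseDepFile (hypothetical) extracts the prerequisites from a make-style
// dependency file containing a single rule, e.g. "cdeps: hello.c hello.h".
func parseDepFile(contents string) ([]string, error) {
	// gcc wraps long rules with backslash-newline; join them into one line.
	joined := strings.ReplaceAll(contents, "\\\n", " ")
	colon := strings.Index(joined, ":")
	if colon < 0 {
		return nil, fmt.Errorf("no target found in dependency file")
	}
	// Everything after the first colon is a whitespace-separated file list.
	return strings.Fields(joined[colon+1:]), nil
}

func main() {
	deps, err := parseDepFile("cdeps: hello.c /usr/include/stdio.h \\\n /usr/include/features.h hello.h\n")
	if err != nil {
		panic(err)
	}
	for _, d := range deps {
		fmt.Println(d)
	}
}
```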

Sorry guys, I misclicked. Didn't mean to close. What @ianlancetaylor and @navytux suggest seems like a good fix to me.

I think this is basically working as expected.
If you change the underlying C code, or you change the compiler,
then you have to rebuild with -a.

While true, there's a hidden security subtlety here. Let's suppose there's a crypto library called go-crypto, which internally wraps the c-crypto project (made-up names). The c-crypto developers find a fatal flaw, fix it and notify go-crypto, who update their vendored C code and issue a new release too.

I, as a user of the go-crypto library, see this and run go get -u to fetch the new code, sleeping easy in the belief that I'm protected. Except Go doesn't actually recompile anything, because only the C code changed, so my binary is still vulnerable even though I built it from the new code.

The same issue can happen arbitrarily high up a dependency chain, where anyone who forgets to rebuild with -a could be left vulnerable.


Btw, I'm not saying I know how to fix this or whether it's even fixable. I just wanted to add a bit of weight behind this issue.

Just a quick note, because nobody has yet mentioned solutions to the more general issue @rsc pointed out:

If you change the underlying C code, or you change the compiler, then you have to rebuild with -a.

This is going to cause confusing issues in practice. I doubt that everyone knows which packages in their dependency tree use C. Similarly, many people won't know whether their C compiler was updated recently.

As @karalabe points out, this is a potential security risk. But it is also a general usability problem, as it may well break builds or even the resulting binaries.

These problems seem pretty similar to the issues ccache and zapcc face. I don't know where this is documented for zapcc, but ccache has a few pointers here: https://ccache.samba.org/manual/latest.html#_common_hashed_information

In general, I don't see much harm in hashing a little more of the environment (the files listed by $CC -MD, relevant environment variables, the output of $CC -v, or the compiler binary itself). I'm starting to doubt that this will ever be perfect, but a couple of safeguards could save a lot of people a lot of time and confusion.

I think so too: a couple of safeguards could save a lot of people a lot of time and confusion.
If you change the underlying C code, or you change the compiler, then you have to rebuild with -a. This is going to cause confusing issues in practice.

Many people who update a package's code won't even realize that the Go 1.10 build cache is the reason their bug still isn't fixed.

As a naive user, I got bitten by this, but at least it was really obvious: there was a bug in the glfw package's C code (caught by a newer compiler, which issued a warning), so I changed the code and ran go build... same error. It was not at all obvious why it was reporting an error that couldn't possibly refer to any file currently on disk. Apparently -a would have helped, but that's extremely non-obvious, and there's no reason I should have to rebuild other, unrelated packages just to hint "actually, this package has something that changed".

My actual quick workaround: adding a blank line to the .go file that includes the affected .c file.

We ran into this yesterday in our CI environment. I'm happy to look into fixing this for 1.14; the -MD solution seems like a good first step, but as @FlorianUekermann points out above, we may want to consider mixing in a bit more information.

I wonder whether it is too late to add a note to the cgo documentation for 1.13?

@dhobsd for 1.13, documentation is fine. Code changes not so much :)

Ah, actually I see that there is already material included about GOCACHE and cgo interaction in the go tool docs; I'll hold off until the 1.14 cycle is open to poke at fixing this.

So it seems like GOCACHE is simply not safe at all when linking to C.
Then again, even the standard library links to C.
What happens if you update libc?

Ah, actually I see that there is already material included about GOCACHE and cgo interaction in the go tool docs

Since I had trouble finding what you meant, here it is, copied from https://golang.org/cmd/go/#hdr-Build_and_test_caching for future readers of this ticket:

However, the build cache does not detect changes to C libraries imported with cgo. If you have made changes to the C libraries on your system, you will need to clean the cache explicitly or else use the -a build flag (see 'go help build') to force rebuilding of packages that depend on the updated C libraries.

In other words, if you use cgo, you MUST use go build -a and go test -a; otherwise you'll never know what ended up in your binary, or which C code you were actually testing.

Actually, go test -a does not seem to be enough to get the cache up to date. Subsequent go test runs without -a still use some older cached version of the C code.

The "!!! enter" message comes from C. No code was changed between the two test runs. The first result is up to date; the second comes from an older version of the C code:

$ go test -run Test_get_process_stats -a
!!! enter333
[...]

$ go test -run Test_get_process_stats
!!! enterYYY
[...]

This seems to fix it, so it is probably better to use this instead of (or in addition to?) the -a flag:

go clean -cache -testcache .

It might be nice to have a mode that invalidates cached cgo packages but leaves pure Go packages alone.

After six years, this issue is still open :(

I got it working with the -a go build flag, but the build runs extremely slowly :(
It would be helpful to have an option that invalidates the cgo cache whenever any of the related files changes.