microsoft / go

The Microsoft build of the Go toolset

MinGW errors with specific Go versions

kiashok opened this issue · comments

MinGW has proven to have issues with certain Go versions in the past. We have seen this in particular with containerd.
golang/go#61058

Example of errors we are currently seeing on containerd with Go 1.21.3:
https://github.com/containerd/containerd/actions/runs/6748357428/job/18508691738?pr=9288

[screenshot of the CI error output]

Some DLL is possibly failing to load.

The request/proposal in this GitHub issue is to add testing of Go versions with MinGW on Windows Server 2019, which does not exist today.
This would catch possible issues early and improve the experience of users on Windows.

For context, the windows2022 image we use is defined here:

# Pick images. https://helix.dot.net/#1esPools
demands:
  ${{ if parameters.public }}:
    windows: ImageOverride -equals 1es-windows-2022-open
  ${{ else }}:
    windows: ImageOverride -equals 1es-windows-2022

Those are internally based on the GitHub Actions environments, so we should be able to get the same setup. We updated from 2019 to 2022 a while ago to fix some issues, including (it seems) this same one.

So, hopefully by adding a 2019 builder we run into the issue again and can get to a working pipeline we can maintain for other teams to look at. (Ideally, we fix things on the Go, Windows, MinGW, or GCC sides, but I'm not sure how much of that is feasible vs. finding MinGW/GCC versions that work with specific versions of Go.)
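A 2019 entry would presumably mirror the demands above. As a sketch only: the 2019 image names and demand key here are an assumption patterned on the 2022 ones, not confirmed names in the pool.

```yaml
demands:
  ${{ if parameters.public }}:
    windows2019: ImageOverride -equals 1es-windows-2019-open
  ${{ else }}:
    windows2019: ImageOverride -equals 1es-windows-2019
```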

With 2019 an important platform for teams inside Microsoft, it's probably worth officially making it work and keeping a builder online.

I'm trying out a Windows Server 2019 VM with:

  • Official go1.21.3
  • choco mingw v12.2.0.3042023
  • Git bash for grep
  • GOTEST='gotestsum --'
  • CGO_ENABLED=1
  • Running mingw32-make.exe test root-test
  • containerd 45d7f2324d

I do get some failures (seemingly caused by permissions; I didn't think to run this as root/admin), but no repros of exit status 0xc0000139.

Running a dist test also doesn't entirely pass, but it doesn't show this particular error, only failures related to what each test is doing that I'd probably also expect on Windows 2022 in similar conditions.

I also kicked off https://dev.azure.com/dnceng-public/public/_build/results?buildId=466413&view=results to try out windows2019 in our infra, which might be closer to what you're running because it's also based on the GitHub environment.

If that doesn't turn anything up, I think we need to dig deeper to find the cause of this particular issue. @kiashok are you able to get direct access to a VM in this original state, or have you been able to repro locally in some other way? Otherwise, I suppose we need to drill down into making CI detailed enough to spot what's different.

I didn't code the initial attempt to run on windows2019 quite right, but I fixed it up and rebased on 1.21 for stability, and it repros the issue you're seeing in the race detector tests: https://dev.azure.com/dnceng-public/public/_build/results?buildId=476700&view=logs&j=0c84acf0-de1f-5568-6f11-9c6882ddbb1b&t=be974a80-10e5-5028-05e6-4c2e5742ae5b&l=458

##### Testing race detector
exit status 0xc0000139
FAIL	runtime/race	0.022s
FAIL
2023/11/21 07:48:59 Failed: exit status 1

I'm still not able to repro locally. @gdams is planning to go on a quest to get us a proper repro VM. 😄

In the meantime, I'll add more logs and do reruns to try to pin down differences between my local vm and GitHub actions. It's good that at least this isn't unique to your CI.

Thank you for looking into this!

The win2019 agent appears to get MinGW directly from SourceForge (https://github.com/actions/runner-images/blob/69db5c6c63cee6e97ea5b7d9e0b7318a28a2094c/images/windows/scripts/build/Install-Mingw64.ps1#L6-L33), so it appears earlier in PATH than the Chocolatey install. I'm able to repro the issue locally with the latest build on SourceForge, gcc.exe (x86_64-posix-seh-rev2, Built by MinGW-W64 project) 12.2.0. If I build an exe and then run it, I get this dialog with more detail than the stdout:

---------------------------
example-broken.exe - Entry Point Not Found
---------------------------
The procedure entry point WakeByAddressSingle could not be located
in the dynamic link library C:\Users\dagood\Desktop\git\example\example-broken.exe. 
---------------------------
OK   
---------------------------

I found a MinGW discussion about this API: https://sourceforge.net/p/mingw-w64/mailman/mingw-w64-public/thread/20200605082143.20-1-robux4%40ycbcr.xyz/#msg37029994. It seems it was moved at some point, and MinGW applied a change in 2020 to adjust.

The latest precompiled build on SourceForge was updated in 2018. https://sourceforge.net/projects/mingw-w64/files/Toolchains%20targetting%20Win64/Personal%20Builds/mingw-builds/

Two weeks ago, the SourceForge builds were removed from https://www.mingw-w64.org/downloads/ because they're outdated. mingw-w64/mingw-w64.github.io@9887fc6 ❗

I think we should try to find a repro that doesn't involve the Go toolchain (just building a C program) and then file an issue at https://github.com/actions/runner-images/issues to have them remove the toolset.


The Chocolatey MinGW works for me, locally. (Windows 2019 VM.) I think that to get around this, you just need to remove the outdated build (by either deleting it or manipulating PATH) and use what you get from Chocolatey.

You could do some extra diagnosis by adding gcc --version and Get-Command gcc.exe to your build to make sure this is what's going on. Printing the full PATH variable is always a good idea, as well.
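From Git Bash, a quick check along those lines might look like this (a sketch; `Get-Command gcc.exe` is the PowerShell equivalent of `command -v gcc`):

```shell
# Confirm which gcc wins in PATH and what version it is.
command -v gcc || echo "gcc not on PATH"
gcc --version 2>/dev/null | head -n 1 || true
# Print PATH one entry per line for easy inspection.
printf '%s\n' "$PATH" | tr ':' '\n'
```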

Here's what we use in our builds to get rid of some unwanted tools provided by the base image, simply removing the binaries:

function RemovePathBinary($name) {
    $src = (Get-Command $name -ErrorAction SilentlyContinue).Source
    if ($src) {
        Write-Host "Removing $src"
        Remove-Item $src
    } else {
        Write-Host "Command not found: $name"
    }
}
Write-Host "Removing pkg-config to skip cmd/go TestScript/list_pkgconfig_error on Windows."
RemovePathBinary 'pkg-config'

(Modifying PATH would be more gentle, but in my opinion it's more brittle and not worth it.)

I tried different Go versions:

  • go1.20 and go1.20.11 do not repro.
  • go1.21.0 through go1.21.4 all repro.

A basic C repro-attempt program compiles and runs fine with gcc .\main.c -o hello-world -lsynchronization:

#include <stdio.h>
#include <windows.h>

int main() {
    printf("Hello, world!\n");
    // Call the API whose import the outdated MinGW build mishandles.
    // The address argument is arbitrary; no thread is waiting on it.
    WakeByAddressSingle(&main);
    return 0;
}

It would seem Go 1.21 did in fact introduce some regression when also using this old version of MinGW. Comparing the resulting binaries to try to narrow down a difference seems like the next step if the goal is to make this work in Go 1.21 (or Go 1.22).
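A minimal sketch of that comparison using binutils (here run twice on the system shell just to demonstrate the pipeline; in the real investigation the inputs would be the Go 1.20 and Go 1.21 builds, and on PE files the lines of interest say "DLL Name:" rather than NEEDED):

```shell
# List the dynamic-library imports of a binary.
list_imports() {
    objdump -p "$1" | grep -E 'DLL Name|NEEDED'
}

# Placeholder inputs: the same binary twice, so the diff comes up empty.
list_imports /bin/sh > imports-a.txt
list_imports /bin/sh > imports-b.txt
diff imports-a.txt imports-b.txt && echo "no import differences"
```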

I question whether that's worthwhile vs. dropping this five-year-old MinGW build.

We can use this issue to track that question/effort, and I've opened #1085 to more generally start testing win2019+MinGW in our CI to try to spot this kind of thing ahead of time in the future.

I ran some experiments in our public CI, and had a tough time using Chocolatey. I think you might have noticed the same thing, because you're using --version 12.2.0.3042023 in your PR rather than the latest version 13.2.0. Here's what I saw, with the caveat that I'm not an expert with Chocolatey so I might have misunderstood something I saw:

  • 12.2.0.3042023 appears to generate shim binaries for each MinGW tool, making them accessible in the machine's PATH.
    • However, the win2019 image's SourceForge install is earlier in PATH, so the Chocolatey install has no effect.
  • 13.2.0 attempts to adjust the machine's PATH to point at the MinGW installation.
    • This doesn't seem to work because the CI process is already running: whatever Chocolatey does to the machine's PATH doesn't affect the current step or future steps. It seems designed for a developer workstation.

So, I prototyped a Go tool that can download and install various MinGW versions and automatically add them to PATH in the expected GitHub Actions and AzDO ways:

That PR has a GitHub Actions run that shows the fresh MinGW installation working. I'm not totally sure yet if we can commit to maintaining that MinGW tool for use in general CI, so I don't know if it makes sense to use it directly (yet).
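For reference, the "expected ways" are the standard mechanisms each CI system provides for a step to modify PATH for later steps (MINGW_DIR below is a placeholder for wherever MinGW was extracted):

```shell
# Placeholder for the MinGW bin directory.
MINGW_DIR='C:/mingw64/bin'

# GitHub Actions: appending a directory to the file named by $GITHUB_PATH
# prepends it to PATH for all subsequent steps (no-op outside Actions).
if [ -n "${GITHUB_PATH:-}" ]; then
    echo "$MINGW_DIR" >> "$GITHUB_PATH"
fi

# Azure DevOps: a logging command printed to stdout does the same.
echo "##vso[task.prependpath]$MINGW_DIR"
```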

Next steps for this issue are:

  • I'll post a comment on containerd/containerd#9288 with what I see as the current ways to work around this issue.
  • File an issue with the VM image maintainers to suggest updating this outdated MinGW build. (Then no change would be necessary for your CI.)
  • Finish microsoft/go-infra#95
  • Find why this repros in Go 1.21 but not 1.20.
    • I see this as a low priority: I expect it will have to do with golang/go@cc82867 and not be something that can be fixed.

containerd/containerd#9288 is merged! 🎉 Closing as complete.

Windows 2019 is EOL as of today (strange coincidence), so we no longer plan to add CI for it. However, I filed #1105 to test various MinGW versions going forward.

I never got to the bottom of why exactly it happens in Go 1.21 but not 1.20, but it doesn't seem necessary.