godotengine / build-containers

Godot engine build containers

Dockerfile.Windows doesn't build

TokisanGames opened this issue · comments

I started a new build of containers with ./build.sh 3.2 mono-6.6.0.166. What version of mono is currently being used for the official 3.2 binaries?

It fails when building the windows container. Initially, it hangs for a very long time after Cloning into '/root/mono/external/corefx'... or on corert. I've run it several times over multiple days and it hangs in one of those two spots each time. I left it overnight and it finally moved past it.

Cloning into '/root/mono/external/roslyn-binaries'...
Cloning into '/root/mono/external/rx'...
Cloning into '/root/mono/external/xunit-binaries'...
Cloning into '/root/mono/external/corefx'...
Submodule path 'external/Newtonsoft.Json': checked out '471c3e0803a9f40a0acc8aeceb31de6ff93a52c4'
Submodule path 'external/api-doc-tools': checked out '5da8127af9e68c9d58a90aa9de21f57491d81261'
Submodule path 'external/api-snapshot': checked out '7753b257899d0d0e51f02b60847a5b632f11bbc3'
Submodule path 'external/aspnetwebstack': checked out 'e77b12e6cc5ed260a98447f609e887337e44e299'

Then it ultimately fails with:

/usr/lib/gcc/i686-w64-mingw32/9.2.1/../../../../i686-w64-mingw32/bin/as: error while loading shared libraries: libc.so.6: cannot stat shared object: Error 103
/usr/libexec/gcc/i686-w64-mingw32/9.2.1/cc1: error while loading shared libraries: libc.so.6: cannot stat shared object: Error 103
i686-w64-mingw32-gcc: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: Error 107
mv: error while loading shared libraries: libc.so.6: cannot stat shared object: Error 103
i686-w64-mingw32-gcc: error while loading shared libraries: libc.so.6: cannot stat shared object: Error 103
i686-w64-mingw32-gcc: error while loading shared libraries: libc.so.6: cannot stat shared object: Error 107
cat: error while loading shared libraries: libc.so.6: cannot stat shared object: Error 107
i686-w64-mingw32-gcc: error while loading shared libraries: libc.so.6: cannot stat shared object: Error 107
mv: error while loading shared libraries: libc.so.6: cannot stat shared object: Error 107
container exited on bus error
Error: error building at STEP "RUN if [ -z "${mono_version}" ]; then echo -e "\n\nargument mono-version is mandatory!\n\n"; exit 1; fi &&     dnf -y install --setopt=install_weak_deps=False       mingw32-gcc mingw32-gcc-c++ mingw32-winpthreads-static mingw64-gcc mingw64-gcc-c++ mingw64-winpthreads-static yasm wine &&     dnf clean all &&     git clone https://github.com/mono/mono --branch ${mono_version} --single-branch &&     cd /root/mono &&     if [ ! -z "${mono_commit}" ]; then git checkout ${mono_commit}; fi &&     git submodule update --init &&     git apply -3 /root/files/patches/mono-unity-Clear-TLS-instead-of-aborting.patch &&     git apply -3 /root/files/patches/wine-mono.patch &&     export WINE_BITS=64 &&     bash /root/files/mono-build-win32.sh --prefix=/root/dependencies/mono-64 --host=x86_64-w64-mingw32 &&     git clean -fdx &&     cp /root/dependencies/mono-64/bin/libMonoPosixHelper.dll /root/dependencies/mono-64/bin/MonoPosixHelper.dll &&     rm -f /root/dependencies/mono-64/bin/mono /root/dependencies/mono-64/bin/mono-sgen &&     ln -s /usr/bin/mono /root/dependencies/mono-64/bin/mono &&     ln -s /usr/bin/mono-sgen /root/dependencies/mono-64/bin/mono-sgen &&     cp -rvp /etc/mono /root/dependencies/mono-64/etc &&     export WINE_BITS=32 &&     bash /root/files/mono-build-win32.sh --prefix=/root/dependencies/mono-32 --host=i686-w64-mingw32 &&     cd /root &&     cp /root/dependencies/mono-32/bin/libMonoPosixHelper.dll /root/dependencies/mono-32/bin/MonoPosixHelper.dll &&     rm -f /root/dependencies/mono-32/bin/mono /root/dependencies/mono-32/bin/mono-sgen &&     ln -s /usr/bin/mono /root/dependencies/mono-32/bin/mono &&     ln -s /usr/bin/mono-sgen /root/dependencies/mono-32/bin/mono-sgen &&     cp -rvp /etc/mono /root/dependencies/mono-32/etc &&     rm -rf /root/mono &&     dnf -y remove wine": error while running runtime: exit status 1

windows.log

The git submodule update --init && \ line could have --progress added, so that the build visibly shows activity instead of appearing to hang.

Edit: corefx and corert are both slow. Hmm, now it's hung on the update.

Cloning into '/root/mono/external/corert'...
remote: Enumerating objects: 84587, done.        
Receiving objects:  39% (33615/84587), 18.04 MiB | 71.00 KiB/s

It is not actually transferring at 71 KiB/s! It has stayed on this screen for over two hours now.


Also, adding --depth 1 to all git clones and submodule updates might be a good idea, as we don't need the full history.
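To make the two suggestions above concrete, here is a minimal sketch of a clone step that uses both --progress and --depth 1. The function name clone_mono_shallow and its arguments are hypothetical and not part of build.sh or the Dockerfiles. One caveat: a shallow submodule update can fail if the recorded submodule commit cannot be fetched directly from the remote, so treat this as something to test rather than a known-good change.

```shell
#!/usr/bin/env bash
# Sketch only: clone_mono_shallow is a hypothetical helper, not part of the repo.
clone_mono_shallow() {
    local version="$1" dest="$2"
    # --progress forces progress output even when stderr is not a terminal,
    # which is typically the case inside a container build.
    git clone --progress --depth 1 --branch "${version}" --single-branch \
        https://github.com/mono/mono "${dest}" || return 1
    # --depth 1 limits each submodule clone to a truncated history as well.
    git -C "${dest}" submodule update --init --progress --depth 1
}

# Example call (not executed here, it needs network access):
#   clone_mono_shallow mono-6.6.0.166 /root/mono
```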

Why download mono 7 times? In android, javascript, mono, osx, ubuntu 32/64, and windows? It wastes hours when building, and especially when testing the docker scripts. Why not just leave it in the godot-mono docker and inherit it into the others, or download it to the /files/ mount?


This is the line that causes the build failure. I'm testing more, but every time it fails the running container exits, which erases all of mono and the 64-bit build again.

    bash /root/files/mono-build-win32.sh --prefix=/root/dependencies/mono-32 --host=i686-w64-mingw32 && \

I'm making a PR to move downloading mono and submodules to build.sh, but only if it hasn't already been downloaded. Then the dockers copy it in if needed. All git clone and submodules now use --depth 1.
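As an illustration of the "only if it hasn't already been downloaded" idea, here is a minimal sketch. The helper name fetch_once, the cache directory, and the .complete marker file are all assumptions made for this example and are not taken from the PR. The marker is written only after the fetch command succeeds, so an interrupted download is retried from scratch instead of being reused.

```shell
#!/usr/bin/env bash
# Sketch only: fetch_once and the .complete marker are hypothetical.
fetch_once() {
    local cache_dir="$1"
    shift
    [ -n "${cache_dir}" ] || return 1
    if [ -f "${cache_dir}/.complete" ]; then
        echo "reusing ${cache_dir}"
        return 0
    fi
    # Discard any partial download left behind by an interrupted run.
    rm -rf "${cache_dir}"
    # Run the fetch command given as the remaining arguments.
    "$@" || return 1
    touch "${cache_dir}/.complete"
}

# Example call (not executed here, it needs network access):
#   fetch_once files/mono git clone --depth 1 --branch mono-6.6.0.166 \
#       --single-branch https://github.com/mono/mono files/mono
```

Each Dockerfile could then copy the cached directory in rather than cloning again, which is the part that saves the repeated downloads.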


In Dockerfile.Windows this fails for mono-6.6.0.166:

$ git apply -3 /root/files/patches/wine-mono.patch
error: mcs/build/platforms/win32.make: does not match index
error: mcs/build/profiles/build.make: does not match index
error: mcs/build/rules.make: does not match index

The files, line numbers, and patterns seem to match perfectly in all three files, so I don't know what its problem is. However, this works fine:

patch -p1 < ../files/patches/wine-mono.patch 
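One relevant detail is that git apply -3 implies --index, so git also requires the affected working tree files to match the index, which a plain patch -p1 does not check. A minimal sketch of trying git apply first and falling back to patch could look like the following. The function name apply_patch is hypothetical, and the sketch only covers the case seen above, where git refuses before touching any file. If git apply -3 stops midway and leaves conflict markers, the fallback will not help.

```shell
#!/usr/bin/env bash
# Sketch only: apply_patch is a hypothetical helper, not part of the Dockerfiles.
apply_patch() {
    local patch_file="$1"
    # Preferred path: 3-way apply, as used in Dockerfile.Windows.
    if git apply -3 "${patch_file}"; then
        return 0
    fi
    echo "git apply -3 failed, falling back to patch -p1" >&2
    # patch ignores the git index and only looks at the files on disk.
    patch -p1 < "${patch_file}"
}

# Example call (run from the top of the mono checkout):
#   apply_patch /root/files/patches/wine-mono.patch
```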

Re: the crashing line in Dockerfile.Windows:
bash /root/files/mono-build-win32.sh --prefix=/root/dependencies/mono-32 --host=i686-w64-mingw32
This section of the shell script doesn't run:

pushd mcs/jay
make CC=gcc
---
[root@bf1ddafd35b7 jay]# make CC=gcc
../build/rules.make:98: ../build/platforms/.make: No such file or directory
make: *** No rule to make target '../build/platforms/.make'.  Stop.

Is Jay needed? Is it a problem with mono-6.6.0.166?


Well it seems to now build both versions of mono without adding any new libraries, but cleaning up dnf hangs on this scriptlet. :(

Running scriptlet: avahi-0.7-20.fc31.x86_64                           209/264

Fixed windows. Massively reduced redundant downloads. Now osx won't build.

Given SDK does not contain libc++ headers (-stdlib=libc++ test may fail)
You may want to re-package your SDK using 'tools/gen_sdk_package.sh' on OS X

testing o64-clang++ -stdlib=libc++ -std=c++11 ... failed (ignored)

testing o64-clang ... osxcross: error: cannot find libc++ headers
osxcross: error: while detecting target

exiting with abnormal exit code (1)
run 'OCDEBUG=1 ./build.sh' to enable debug messages

My MacOS SDK was built with the old script that did not include the headers before the xcode packer was fixed. Xcode and osx built properly.

This is puzzling, as I can't reproduce those issues. I just did a build with ./build.sh 3.2 mono-6.6.0.166 and the Windows container built perfectly fine:

localhost/godot-windows                               3.2-mono-6.6.0.166      821ab810a074   35 seconds ago   3.93 GB

The only issue which I did reproduce in the past is the Mono clones somehow stalling, forcing me to restart the build for another chance at having them complete properly.

Hmm, did you also build base and mono?
Did Jay build for you? (it errored before, but the script doesn't stop)
Did git apply apply the patches?

Hmm, did you also build base and mono?

Yes, I used ./build.sh 3.2 mono-6.6.0.166 with no custom edits, so it starts with the base and mono/mono-glue.

Did Jay build for you? (it errored before, but the script doesn't stop)

No error in my windows.log.

Did git apply apply the patches?

Works for me, but indeed if you clone with --depth 1, I can imagine that git apply would fail, since it has no git history to base its operation on.

I didn't change the scripts until after the failures. :/
I'm rebuilding with mono-6.6.0.161 and we'll see what happens this time. Maybe I will have more information and experience to track down the problem this time.

Using podman, I rebuilt all dockers up through windows with mono .161:

  1. Jay failed to build with the exact same issue above. mono-build-win32.sh does not have set -e so it doesn't stop and you might not notice.
  2. 64-bit build completes but /root/dependencies/mono-64/lib/mono/ is empty save llvm.
  3. 32-bit build breaks with the libc.so.6 errors, then the running docker exits entirely, as before. Copying in a completely new mono folder, rather than just running git clean -fdx, resolved this. Maybe it has to do with the submodules. /root/dependencies/mono-32/lib/mono/ is empty.
  4. dnf remove wine hangs on avahi.
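The effect of the missing set -e in point 1 can be shown with a small self-contained demo. The three commands below are stand-ins, not the real contents of mono-build-win32.sh: the false in the middle plays the role of the failing make CC=gcc in mcs/jay.

```shell
#!/usr/bin/env bash
# Stand-in for a build script whose middle step fails.
steps='echo "configure"; false; echo "install"'

# Without set -e the script keeps going, and its exit status is that of the
# last command, so the failure in the middle goes unnoticed.
lenient_status=0
bash -c "${steps}" > /dev/null || lenient_status=$?

# With set -e the script stops at the first failing command.
strict_status=0
bash -c "set -e; ${steps}" > /dev/null || strict_status=$?

echo "without set -e: exit status ${lenient_status}"
echo "with set -e:    exit status ${strict_status}"
```

The first run reports exit status 0 even though a step failed, while the second reports 1. That matches why the Jay failure is easy to miss in a long build log.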

windows.log

Can you attach your windows.log?

Here's my windows.log (from a build of mono-6.6.0.166, but mono-6.6.0.161 worked fine too in my last build):
windows.log

I'm using podman-1.6.2-2.fc31.x86_64 on Fedora 31.

We don't have the same steps (STEP 3: ARG mono_commit) so it seems you're building from your PR / with local changes and not current master.

I've made a new branch and have already been making the changes to the mono download. Without them, it took 12 hours to download mono due to timeouts. The results are exactly the same as when I originally used master.

This is what I was looking for. Your autoconf results say .NET 4.x: yes. Mine say no. Why?!

Your autoconf results:

        mcs source:    mcs
	C# Compiler:   roslyn
	CompilerServer:yes

   Engine:
	Host:	       x86_64-w64-mingw32
	Target:	       x86_64-w64-mingw32
	GC:	       sgen (concurrent by default)
	Suspend:       Hybrid
	TLS:           pthread
	SIGALTSTACK:   no
	Engine:        Building and using the JIT
	BigArrays:     no
	DTrace:        no
	LLVM Back End: no (dynamically loaded: no, built in-tree: no, assertions: no, msvc only: no)
	Spectre:       no mitigation
	Mono.Native:   no

   Libraries:
	.NET 4.x:        yes
	Xamarin.Android: no
	Xamarin.iOS:     no
	Xamarin.WatchOS: no
	Xamarin.TVOS:    no
	Xamarin.Mac:     no
	Windows AOT:     no
	Orbis:           no
	Unreal:          no
	WebAssembly:     no
	Test profiles:   AOT Full (no), AOT Hybrid (no), AOT Full Interp (no), Windows Full AOT Interp (no)
	JNI support:     no
	libgdiplus:      assumed to be installed
	zlib:            bundled zlib
	BTLS:            no
	jemalloc:        no (always use: no)
	crash reporting: no (private crashes: yes)
	.NET Core:       no

My autoconf results:

        mcs source:    mcs
        C# Compiler:   roslyn
        CompilerServer:yes

   Engine:
        Host:          x86_64-w64-mingw32
        Target:        x86_64-w64-mingw32
        GC:            sgen (concurrent by default)
        Suspend:       Hybrid
        TLS:           pthread
        SIGALTSTACK:   no
        Engine:        Building and using the JIT
        BigArrays:     no
        DTrace:        no
        LLVM Back End: no (dynamically loaded: no, built in-tree: no, assertions: no, msvc only: no)
        Spectre:       no mitigation
        Mono.Native:   no

   Libraries:
        .NET 4.x:        no
        Xamarin.Android: no
        Xamarin.iOS:     no
        Xamarin.WatchOS: no
        Xamarin.TVOS:    no
        Xamarin.Mac:     no
        Windows AOT:     no
        Orbis:           no
        Unreal:          no
        WebAssembly:     no
        Test profiles:   AOT Full (no), AOT Hybrid (no), AOT Full Interp (no), Windows Full AOT Interp (no)
        JNI support:     no
        libgdiplus:      assumed to be installed
        zlib:            bundled zlib
        BTLS:            no
        jemalloc:        no (always use: no)
        crash reporting: no (private crashes: yes)
        .NET Core:       no

Your build has:

checking whether we are cross compiling... yes

while mine has no. No idea why.

Another diff a few lines below:

tr: warning: an unescaped backslash at end of string is not portable

not in my log.

For my original windows.log in the first post, that container and all previous ones were built with master, mono-6.6.0.166, and podman 1.6.2.

Your autoconf finds .NET 4.x
Your jay compiles fine.
Your wine does not give you a ton of errors.

Running a diff on your log and my original log, the first real differences are:

  • The autoconf cross compiling thing.
  • +tr: warning: an unescaped backslash at end of string is not portable (likely due to a failed git apply)
  • .NET 4.x: yes/no
  • Jay failing (platform is null)
  • lots of wine errors:
0012:err:ole:StdMarshalImpl_MarshalInterface Failed to create ifstub, hres=0x80004002
0012:err:ole:CoMarshalInterface Failed to marshal the interface {6d5140c1-7436-11ce-8034-00aa006009fa}, 80004002
0012:err:ole:get_local_server_stream Failed: 80004002
0014:err:winediag:nodrv_CreateWindow Application tried to create a window, but no driver could be loaded.
0014:err:winediag:nodrv_CreateWindow Make sure that your X server is running and that $DISPLAY is set correctly.
0014:err:ole:apartment_createwindowifneeded CreateWindow failed with error 183
0014:err:ole:apartment_createwindowifneeded CreateWindow failed with error 0
0014:err:ole:apartment_createwindowifneeded CreateWindow failed with error 14007
0014:err:ole:StdMarshalImpl_MarshalInterface Failed to create ifstub, hres=0x800736b7
0014:err:ole:CoMarshalInterface Failed to marshal the interface {6d5140c1-7436-11ce-8034-00aa006009fa}, 800736b7
0014:err:ole:get_local_server_stream Failed: 800736b7
000b:err:winediag:nodrv_CreateWindow Application tried to create a window, but no driver could be loaded.
000b:err:winediag:nodrv_CreateWindow Make sure that your X server is running and that $DISPLAY is set correctly.
0010:err:winediag:nodrv_CreateWindow Application tried to create a window, but no driver could be loaded.
0010:err:winediag:nodrv_CreateWindow Make sure that your X server is running and that $DISPLAY is set correctly.
0010:fixme:ntdll:NtLockFile I/O completion on lock not implemented yet
0010:fixme:ntdll:NtQuerySystemInformation info_class SYSTEM_PERFORMANCE_INFORMATION
0010:err:mscoree:LoadLibraryShim error reading registry key for installroot
...

What Fedora 31 image do you have? Mine is:

docker.io/library/fedora                              31                      f0858ad3febd   4 months ago    201 MB

Maybe yours is older?

$ podman image inspect docker.io/library/fedora:31 
[
    {
        "Id": "f0858ad3febdf45bb2e5501cb459affffacef081f79eaa436085c3b6d9bd46ca",
        "Digest": "sha256:8fa60b88e2a7eac8460b9c0104b877f1aa0cea7fbc03c701b7e545dacccfb433",
        "RepoTags": [
            "docker.io/library/fedora:31"
        ],
        "RepoDigests": [
            "docker.io/library/fedora@sha256:8fa60b88e2a7eac8460b9c0104b877f1aa0cea7fbc03c701b7e545dacccfb433"
        ],
        "Parent": "",
        "Comment": "",
        "Created": "2019-10-29T03:23:37.695123423Z",
...

I erased all dockers recently to start again.
docker.io/library/fedora 31 536f3995adeb 3 weeks ago 200 MB

It seems that exploring autoconf, to figure out where it looks for .NET and how it tests for cross compiling, might give a clue. I'll take a break then dive into the autoconf tests and command line parameters.

Though it shouldn't matter, my host system is Ubuntu 19.10.

Some comments from @hpvb on IRC:

10:43 <TMM> He gets errors reading libc at the start
10:43 <TMM> Smells like overlayfs failing
10:44 <TMM> On his host
10:44 <Akien> Ah good so we can blame Ubuntu :P
10:44 <TMM> There's no way fedora:31 is that broken
10:45 <Akien> overlayfs failing could also explain why his mono git clone took half a day...
10:45 <TMM> I'll try to reproduce with an Ubuntu vm
10:45 <TMM> Yeah, he's probably using rootless
10:45 <TMM> Which is real slow
10:45 <TMM> So overlayfs-fuse is probably borken on Ubuntu

Side note: If that's the problem, beyond documenting it, we should add some tests to check whether overlayfs is working, and abort if it isn't. That's a lot of wasted hours debugging containers for a likely host issue :/
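As a rough idea of what such a pre-flight test could look like, here is a minimal sketch that checks whether a directory survives a basic write and read round trip. The function name storage_sanity_check is hypothetical. This is not a real overlayfs diagnostic, and it is an assumption that a failure like the libc.so.6 errors above would show up in a probe this simple.

```shell
#!/usr/bin/env bash
# Sketch only: storage_sanity_check is a hypothetical helper.
storage_sanity_check() {
    local dir="$1" probe expected actual
    # Create a probe file inside the directory under test.
    probe="$(mktemp "${dir}/probe.XXXXXX")" || return 1
    expected="probe-$$"
    # Write a known value, then read it back through the same filesystem.
    if ! printf '%s' "${expected}" > "${probe}"; then
        rm -f "${probe}"
        return 1
    fi
    actual="$(cat "${probe}")"
    rm -f "${probe}"
    [ "${actual}" = "${expected}" ]
}

# Example use at the start of a build step:
#   storage_sanity_check /root || { echo "storage looks broken, aborting" >&2; exit 1; }
```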

You can try to run the build script as root, which should make overlayfs work.

I've been using "fuse-overlayfs".

I made a new user and, with an empty storage folder, built base, mono, and windows with .161. This used my system default of "overlay" (which should be different from fuse-overlayfs). When mono/linux builds, I can see it detecting .NET 4.x and later building mscorlib.dll, but when it gets to windows it doesn't detect it. No change here. It stops at building Jay (because I added set -e).


Next, I switched to root and again in a new, empty container folder built base, mono, windows using the default "overlay".

This time it went through the mono docker, installed the last 4 RPMs, then reported:

time="2020-03-19T02:14:39+08:00" level=error msg="Can't add file /var/lib/containers/storage/overlay/39329427931ad3da38d724e0e6335bb84ad663d5cbd2727f21ab2db483075215/diff/tmp/monomake to tar: archive/tar: sockets not supported"

Libpod says "The error occurs because archive/tar fails when a file has ModeSocket." containers/podman#4775 (comment)
Anyway the docker completed and the last rpm was installed successfully.

When it builds windows, it also doesn't detect .NET 4.x and Jay fails.


Finally, I switched to using "vfs" as root. I erased my container folders and built base, mono, and windows. VFS seems faster. But, alas, when it got to windows it had the exact same result: no .NET 4.x and a failure on Jay.
checking whether we are cross compiling... yes

So maybe we can rule out the filesystem.

BTW, git clone taking so long is because of crappy internet and connection drops. Once the connection drops, git takes forever to time out and restart. Git clone plus 25 submodules downloading one at a time gives a lot of opportunity for long delays. In my new patch I've told git to download up to 6 submodules at once and it is much faster.
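For reference, parallel submodule downloads are available through the --jobs option of git submodule update. The sketch below pairs it with a small retry wrapper for flaky connections. The wrapper, the function names, and the attempt and delay values are assumptions for illustration and are not taken from the patch.

```shell
#!/usr/bin/env bash
# Sketch only: retry and update_submodules are hypothetical helpers.

# retry ATTEMPTS DELAY_SECONDS COMMAND...: rerun COMMAND until it succeeds
# or ATTEMPTS runs have failed.
retry() {
    local attempts="$1" delay="$2" n=1
    shift 2
    until "$@"; do
        if [ "${n}" -ge "${attempts}" ]; then
            echo "giving up after ${n} attempts: $*" >&2
            return 1
        fi
        echo "attempt ${n} failed, retrying in ${delay}s: $*" >&2
        sleep "${delay}"
        n=$((n + 1))
    done
}

# Fetch up to 6 submodules in parallel, retrying the whole update on failure.
update_submodules() {
    retry 5 10 git submodule update --init --progress --jobs 6
}
```

Rerunning git submodule update --init after a failure is generally safe, because submodules that were already checked out at the recorded commit are left as they are.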

tr: warning: an unescaped backslash at end of string is not portable

not in my log.

Though I thought this was from a failed wine-patch, this appears whether the wine-patch is successfully applied or not. So that's still a mystery.

I did some reading on cross compiling. That is an autoconf test. They consider the build, target, and host parameters. Except ours are the same.

I've looked through configure.ac to figure out how it's testing for the .NET library, but it's so confusing, I don't get it.

What about uploading binary pre-setup dockers? The ones on prehensile-tales.com are from fedora-29.

FastNoiseSIMD doesn't build with mingw, and I need to use msvc to compile it for my unofficial binaries anyway, so I'm thinking of abandoning the windows docker altogether. I'd like to see if there's a way I could use the MSVC compiler in linux through wine or vfio. It's unfortunate that dockers are not as platform-independent as they're supposed to be.

I guess I'll close this since you can't reproduce it and I'm over it. If you come up with something else like autoconf parameters to locate the .NET library or something, I'm willing to try.

Otherwise I just got virtualbox to load my installed windows partition so now I can run MSVC and scons in linux without having to have two windows installs! So I'm good to go on the windows/mono compilation and the other dockers work just fine.