GaloisInc / pate

Patches Assured up to Trace Equivalence

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

GHC internal error reported in pate container

thebendavis opened this issue · comments

I'm seeing what looks like PATE triggering a GHC 9.6.2 bug in our current docker image built by CI, running on my M1 mac.

I pulled the current docker image built from CI on the current master and tried running it on the scada.exe example:

$ docker run --platform linux/amd64 --rm -it -v "$(pwd)":/z artifactory.galois.com:5025/pate/pate:refs-heads-master -o /z/scada.exe -p /z/scada.patched.exe -s parse_packet 

Up to date
.
Choose Entry Point
0: Function Entry "_start" (segment1+0x435)
1: Function Entry "parse_packet" (segment1+0x554)
?>1
.........
0: Function Entry "parse_packet" (segment1+0x554) (User Request).......
1: segment1+0x580 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)...assertion failed [node != nullptr]: There is a hole in the VmTracker tree at address 0xfffffffffffffffa
(VMAllocationTracker.cpp:137 vm_allocation_info_for_address)
 /home/src/pate.sh: line 26:    35 Trace/breakpoint trap   cabal v2-exec ghci -- -v0 -fobject-code -fno-warn-type-defaults -fno-warn-missing-home-modules -threaded -rtsopts "-with-rtsopts=-N -A16M -c" -i"${SCRIPT_DIR}/tools/pate-repl/" -ghci-script ${temp_ghci_cd} -ghci-script "${SCRIPT_DIR}/loadrepl.ghci" -ghci-script ${temp_ghci}

If I re-run the above, sometimes I get other errors, e.g.

ghc-9.6.2: internal error: allocGroup: free list corrupted
    (GHC version 9.6.2 for x86_64_unknown_linux)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug

In #413 I tried building the docker container in CI with GHC 9.6.5 - my hope was that some GHC bugfixes in the newer releases would address the issue I described above. Unfortunately, I still encounter issues with the GHC 9.6.5-built PATE:

$ docker run --platform linux/amd64 --rm -it -v "$(pwd)":/z artifactory.galois.com:5025/pate/pate:refs-pull-413-merge -o /z/scada.exe -p /z/scada.patched.exe -s parse_packet

Up to date
.
Choose Entry Point
0: Function Entry "_start" (segment1+0x435)
1: Function Entry "parse_packet" (segment1+0x554)
?>1
........
0: Function Entry "parse_packet" (segment1+0x554) (User Request)......
1: segment1+0x580 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)......
2: segment1+0x5ac [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)
3: Return "parse_packet" (segment1+0x554) (Widening Equivalence Domains)......
4: segment1+0x5cc [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)......
5: segment1+0x5e0 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)......
6: segment1+0x600 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)
7: Return "parse_packet" (segment1+0x554) (Widening Equivalence Domains)......
8: segment1+0x624 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)..ghc-9.6.5: internal error: evacuate: strange closure type -795295279
    (GHC version 9.6.5 for x86_64_unknown_linux)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug
/home/src/pate.sh: line 26:    35 Aborted                 cabal v2-exec ghci -- -v0 -fobject-code -fno-warn-type-defaults -fno-warn-missing-home-modules -threaded -rtsopts "-with-rtsopts=-N -A16M -c" -i"${SCRIPT_DIR}/tools/pate-repl/" -ghci-script ${temp_ghci_cd} -ghci-script "${SCRIPT_DIR}/loadrepl.ghci" -ghci-script ${temp_ghci}

If I try a few times I'll see slightly different errors, e.g.

ghc-9.6.5: internal error: scavenge: unimplemented/strange closure type 240533401 @ 0x428b4aa928
    (GHC version 9.6.5 for x86_64_unknown_linux)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug

but I've also seen it segfault with a message like:

$ docker run --platform linux/amd64 --rm -it -v "$(pwd)":/z artifactory.galois.com:5025/pate/pate:refs-pull-413-merge -o /z/scada.exe -p /z/scada.patched.exe -s parse_packet

Up to date
.
Choose Entry Point
0: Function Entry "_start" (segment1+0x435)
1: Function Entry "parse_packet" (segment1+0x554)
?>1
......../home/src/pate.sh: line 26:    35 Segmentation fault      cabal v2-exec ghci -- -v0 -fobject-code -fno-warn-type-defaults -fno-warn-missing-home-modules -threaded -rtsopts "-with-rtsopts=-N -A16M -c" -i"${SCRIPT_DIR}/tools/pate-repl/" -ghci-script ${temp_ghci_cd} -ghci-script "${SCRIPT_DIR}/loadrepl.ghci" -ghci-script ${temp_ghci}

I tried building and running pate locally on my host (not using the docker container) and ./pate.sh processed this target just fine for me. So I'm wondering if something weird is going on specifically with our docker container.

On a separate linux VM running on x86_64 hardware, I have

  1. built my own docker container and observed pate working on this target
  2. pulled the same master CI-built container and observed pate working on this target

So I am now suspicious the issue may have something to do with running an amd64 container on my M1 (arm64) mac system. On my M1 mac, I tried disabling the "Use Rosetta for x86_64/amd64 emulation on Apple Silicon" option and rerunning, but I observed that pate runs much slower (expected) and is eventually Killed before it can process much of the target.

Up to date
.
Choose Entry Point
0: Function Entry "_start" (segment1+0x435)
1: Function Entry "parse_packet" (segment1+0x554)
?>1
..........................................................................................................................................................................................................................................
0: Function Entry "parse_packet" (segment1+0x554) (User Request)..........................................................................................................................................................................................................................................
1: segment1+0x580 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains).................................................................................................................................................
/home/src/pate.sh: line 26:    59 Killed                  cabal v2-exec ghci -- -v0 -fobject-code -fno-warn-type-defaults -fno-warn-missing-home-modules -threaded -rtsopts "-with-rtsopts=-N -A16M -c" -i"${SCRIPT_DIR}/tools/pate-repl/" -ghci-script ${temp_ghci_cd} -ghci-script "${SCRIPT_DIR}/loadrepl.ghci" -ghci-script ${temp_ghci}

Another test: on my M1 mac, I tried disabling "Use Virtualization framework" and all related options ("VirtioFS" and "Use Rosetta") in Docker Desktop for Mac and tried re-running the CI-built master docker container from above.

It is eventually also killed, but makes it much further first:

$ docker run --platform linux/amd64 --rm -it -v "$(pwd)":/z artifactory.galois.com:5025/pate/pate:refs-heads-master -o /z/scada.exe -p /z/scada.patched.exe -s parse_packet

Up to date
..
Choose Entry Point
0: Function Entry "_start" (segment1+0x435)
1: Function Entry "parse_packet" (segment1+0x554)

?>
?>
?>wait

0: Function Entry "_start" (segment1+0x435)
1: Function Entry "parse_packet" (segment1+0x554)
?>1
..........................................................................................................................................................................................................................................
0: Function Entry "parse_packet" (segment1+0x554) (User Request)...........................................................................................................................................................................................................................................
1: segment1+0x580 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)........................................................................................................................................................................
2: segment1+0x5ac [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains).....................
3: Return "parse_packet" (segment1+0x554) (Widening Equivalence Domains)..........................................................................................................................................................
4: segment1+0x5cc [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)............................................................................................................................
5: segment1+0x5e0 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)..................................................................................................................................
6: segment1+0x600 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)................
7: Return "parse_packet" (segment1+0x554) (Widening Equivalence Domains)..............................................................................................................................
8: segment1+0x624 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)..........................................................................................
Handle observable difference:
0: Emit warning and continue 
1: Assert difference is infeasible (defer proof) 
2: Assert difference is infeasible (prove immediately) 
3: Assume difference is infeasible 
4: Avoid difference with equivalence condition 
?>4

0: Function Entry "parse_packet" (segment1+0x554) (User Request)
1: segment1+0x580 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)
2: segment1+0x5ac [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)
3: Return "parse_packet" (segment1+0x554) (Widening Equivalence Domains)
4: segment1+0x5cc [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)
5: segment1+0x5e0 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)
6: segment1+0x600 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)
7: Return "parse_packet" (segment1+0x554) (Widening Equivalence Domains)
8: segment1+0x624 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains)..
9: segment1+0x644 [ via: "parse_packet" (segment1+0x554) ] (Widening Equivalence Domains).......................
/home/src/pate.sh: line 26:    59 Killed                  cabal v2-exec ghci -- -v0 -fobject-code -fno-warn-type-defaults -fno-warn-missing-home-modules -threaded -rtsopts "-with-rtsopts=-N -A16M -c" -i"${SCRIPT_DIR}/tools/pate-repl/" -ghci-script ${temp_ghci_cd} -ghci-script "${SCRIPT_DIR}/loadrepl.ghci" -ghci-script ${temp_ghci}

Ok, at this point it seems this is a Docker Desktop + arm64 mac virtualization issue rather than anything wrong with our container(s).

There are other recent segfault issues posted in the docker/for-mac repo issue tracker. A response posted today suggests trying an internal build of Docker Desktop for mac, a pre-release of Docker Desktop 4.32.0. When I try this version, our container works again. So this seems to be an issue with the current stable release of Docker Desktop, which should be fixed in an upcoming release.