[BUG]: Go Control Flow Graph not correctly matched
stapelberg opened this issue · comments
Describe the bug
Hey folks! I recently saw Matt Godbolt’s talk about what’s new in compiler explorer and was delighted to learn about the Control Flow Graph.
Unfortunately, when looking at the Control Flow Graph of a Go function, it seems like each basic block is treated as its own function — I don’t know how this feature works in detail, but maybe these blocks are not correctly matched to the main function?
Steps to reproduce
- Visit godbolt.org
- Enter the following Go code:
package main
import "os"
func main() {
if os.Getpid() > 0 {
os.Chdir("/tmp")
} else {
os.Remove("/tmp/foo")
}
}
- Use x86-64 gc (tip) (default option)
- Add a Control Flow Graph
Expected behavior
I would have expected more than 1 box and arrows between them.
Reproduction link
https://godbolt.org/z/Ed8xYo7M1
Screenshots
Operating System
No response
Browser version
Google Chrome 124
Thanks for reporting this!
Compiler-explorer uses some heuristics to parse the assembly output, among others - to partition it into functions. Unfortunately these heuristics fail for the go compiler gc
, and what's worse - I myself cannot distinguish basic-blocks from functions in its output format :( So not sure this is solvable.
(Ideas are very welcome - eg is anyone aware of gc
flags that somehow control formatting of assembly output?)
gccgo
does produce parsable assembly output (block labels start with a dot), and cfg generation indeed succeeds: https://godbolt.org/z/M9YaY1oT9
Perhaps this is a valid workaround?
Looking at the full asm dump of the default go program https://godbolt.org/z/zx1EvK8zd :
.file 1 "<source>"
.loc 1 5 0
TEXT main.Square(SB), NOSPLIT|NOFRAME|ABIInternal, $0-8
FUNCDATA $0, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
FUNCDATA $1, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
FUNCDATA $5, main.Square.arginfo1(SB)
FUNCDATA $6, main.Square.argliveinfo(SB)
PCDATA $3, $1
.loc 1 6 0
IMULQ AX, AX
RET
.loc 1 9 0
TEXT main.main(SB), NOSPLIT|NOFRAME|ABIInternal, $0-0
FUNCDATA $0, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
FUNCDATA $1, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
RET
It seems there's little hope of parsing this dump programmatically - this one doesn't even contain labels for function name, let alone basic blocks.
Hey, sorry for the slow response and thanks for taking a look.
I was wondering how exactly godbolt runs the Go compiler to obtain the dump you showed without function names?
I tried using Go 1.22.1 by setting the GOTOOLCHAIN
environment variable, meaning using the standard Go-managed build of Go 1.22.1.
The UI shows it’s using the following compiler flags:
But when I compile the square code like that, I get:
GOTOOLCHAIN=go1.22.1 go build -o output.s -gcflags=-S default.go
# command-line-arguments
main.Square STEXT nosplit size=5 args=0x8 locals=0x0 funcid=0x0 align=0x0
0x0000 00000 (/tmp/ce/default.go:5) TEXT main.Square(SB), NOSPLIT|NOFRAME|ABIInternal, $0-8
0x0000 00000 (/tmp/ce/default.go:5) FUNCDATA $0, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
0x0000 00000 (/tmp/ce/default.go:5) FUNCDATA $1, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
0x0000 00000 (/tmp/ce/default.go:5) FUNCDATA $5, main.Square.arginfo1(SB)
0x0000 00000 (/tmp/ce/default.go:5) FUNCDATA $6, main.Square.argliveinfo(SB)
0x0000 00000 (/tmp/ce/default.go:5) PCDATA $3, $1
0x0000 00000 (/tmp/ce/default.go:6) IMULQ AX, AX
0x0004 00004 (/tmp/ce/default.go:6) RET
0x0000 48 0f af c0 c3 H....
main.main STEXT nosplit size=1 args=0x0 locals=0x0 funcid=0x0 align=0x0
0x0000 00000 (/tmp/ce/default.go:9) TEXT main.main(SB), NOSPLIT|NOFRAME|ABIInternal, $0-0
0x0000 00000 (/tmp/ce/default.go:9) FUNCDATA $0, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
0x0000 00000 (/tmp/ce/default.go:9) FUNCDATA $1, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
0x0000 00000 (/tmp/ce/default.go:9) RET
0x0000 c3 .
go:cuinfo.producer.main SDWARFCUINFO dupok size=0
0x0000 72 65 67 61 62 69 regabi
go:cuinfo.packagename.main SDWARFCUINFO dupok size=0
0x0000 6d 61 69 6e main
main..inittask SNOPTRDATA size=8
0x0000 00 00 00 00 00 00 00 00 ........
gclocals·g2BeySu+wFnoycgXfElmcg== SRODATA dupok size=8
0x0000 01 00 00 00 00 00 00 00 ........
main.Square.arginfo1 SRODATA static dupok size=3
0x0000 00 08 ff ...
main.Square.argliveinfo SRODATA static dupok size=2
0x0000 00 00 ..
So I’m wondering if the function names get lost somewhere else or if I’m reproducing it wrong.
Regarding the basic blocks, what would we need to change in Go to make that work? Just emit more labels in the assembly output?
Thanks
GO has a special parser implementation, it takes the functions and adds custom labels https://github.com/compiler-explorer/compiler-explorer/blob/main/lib/compilers/golang.ts#L152 - starting with _pc0
I think based on that information, we can make a custom implementation for CFG