compiler-explorer / compiler-explorer

Run compilers interactively from your web browser and interact with the assembly

Home Page:https://godbolt.org/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[BUG]: Go Control Flow Graph not correctly matched

stapelberg opened this issue · comments

Describe the bug

Hey folks! I recently saw Matt Godbolt’s talk about what’s new in compiler explorer and was delighted to learn about the Control Flow Graph.

Unfortunately, when looking at the Control Flow Graph of a Go function, it seems like each basic block is treated as its own function — I don’t know how this feature works in detail, but maybe these blocks are not correctly matched to the main function?

image

Steps to reproduce

  1. Visit godbolt.org
  2. Enter the following Go code:
package main

import "os"

func main() {
    if os.Getpid() > 0 {
        os.Chdir("/tmp")
    } else {
        os.Remove("/tmp/foo")
    }
}
  1. Use x86-64 gc (tip) (default option)
  2. Add a Control Flow Graph

Expected behavior

I would have expected more than 1 box and arrows between them.

Reproduction link

https://godbolt.org/z/Ed8xYo7M1

Screenshots

image

Operating System

No response

Browser version

Google Chrome 124

commented

Thanks for reporting this!
Compiler-explorer uses some heuristics to parse the assembly output, among others - to partition it into functions. Unfortunately these heuristics fail for the go compiler gc, and what's worse - I myself cannot distinguish basic-blocks from functions in its output format :( So not sure this is solvable.
(Ideas are very welcome - eg is anyone aware of gc flags that somehow control formatting of assembly output?)

gccgo does produce parsable assembly output (block labels start with a dot), and cfg generation indeed succeeds: https://godbolt.org/z/M9YaY1oT9
Perhaps this is a valid workaround?

commented

Looking at the full asm dump of the default go program https://godbolt.org/z/zx1EvK8zd :

        .file 1 "<source>"
        .loc 1 5 0
        TEXT    main.Square(SB), NOSPLIT|NOFRAME|ABIInternal, $0-8
        FUNCDATA        $0, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
        FUNCDATA        $1, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
        FUNCDATA        $5, main.Square.arginfo1(SB)
        FUNCDATA        $6, main.Square.argliveinfo(SB)
        PCDATA  $3, $1
        .loc 1 6 0
        IMULQ   AX, AX
        RET
        .loc 1 9 0
        TEXT    main.main(SB), NOSPLIT|NOFRAME|ABIInternal, $0-0
        FUNCDATA        $0, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
        FUNCDATA        $1, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
        RET

It seems there's little hope of parsing this dump programmatically - this one doesn't even contain labels for function name, let alone basic blocks.

Hey, sorry for the slow response and thanks for taking a look.

I was wondering how exactly godbolt runs the Go compiler to obtain the dump you showed without function names?

I tried using Go 1.22.1 by setting the GOTOOLCHAIN environment variable, meaning using the standard Go-managed build of Go 1.22.1.

The UI shows it’s using the following compiler flags:

2024-05-12-build

But when I compile the square code like that, I get:

GOTOOLCHAIN=go1.22.1 go build -o output.s -gcflags=-S default.go
# command-line-arguments                                              
main.Square STEXT nosplit size=5 args=0x8 locals=0x0 funcid=0x0 align=0x0
	0x0000 00000 (/tmp/ce/default.go:5)	TEXT	main.Square(SB), NOSPLIT|NOFRAME|ABIInternal, $0-8
	0x0000 00000 (/tmp/ce/default.go:5)	FUNCDATA	$0, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
	0x0000 00000 (/tmp/ce/default.go:5)	FUNCDATA	$1, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
	0x0000 00000 (/tmp/ce/default.go:5)	FUNCDATA	$5, main.Square.arginfo1(SB)
	0x0000 00000 (/tmp/ce/default.go:5)	FUNCDATA	$6, main.Square.argliveinfo(SB)
	0x0000 00000 (/tmp/ce/default.go:5)	PCDATA	$3, $1
	0x0000 00000 (/tmp/ce/default.go:6)	IMULQ	AX, AX
	0x0004 00004 (/tmp/ce/default.go:6)	RET
	0x0000 48 0f af c0 c3                                   H....
main.main STEXT nosplit size=1 args=0x0 locals=0x0 funcid=0x0 align=0x0
	0x0000 00000 (/tmp/ce/default.go:9)	TEXT	main.main(SB), NOSPLIT|NOFRAME|ABIInternal, $0-0
	0x0000 00000 (/tmp/ce/default.go:9)	FUNCDATA	$0, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
	0x0000 00000 (/tmp/ce/default.go:9)	FUNCDATA	$1, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
	0x0000 00000 (/tmp/ce/default.go:9)	RET
	0x0000 c3                                               .
go:cuinfo.producer.main SDWARFCUINFO dupok size=0
	0x0000 72 65 67 61 62 69                                regabi
go:cuinfo.packagename.main SDWARFCUINFO dupok size=0
	0x0000 6d 61 69 6e                                      main
main..inittask SNOPTRDATA size=8
	0x0000 00 00 00 00 00 00 00 00                          ........
gclocals·g2BeySu+wFnoycgXfElmcg== SRODATA dupok size=8
	0x0000 01 00 00 00 00 00 00 00                          ........
main.Square.arginfo1 SRODATA static dupok size=3
	0x0000 00 08 ff                                         ...
main.Square.argliveinfo SRODATA static dupok size=2
	0x0000 00 00                                            ..

So I’m wondering if the function names get lost somewhere else or if I’m reproducing it wrong.

Regarding the basic blocks, what would we need to change in Go to make that work? Just emit more labels in the assembly output?

Thanks

GO has a special parser implementation, it takes the functions and adds custom labels https://github.com/compiler-explorer/compiler-explorer/blob/main/lib/compilers/golang.ts#L152 - starting with _pc0

I think based on that information, we can make a custom implementation for CFG

commented

@partouf is right of course, I missed the custom parser.