Panic when profiling AVX2 implementation
egonelbre opened this issue
It seems there's a fairly repeatable panic with this code.
s:\deps\sha256-simd>go test -bench BenchmarkHash/AVX2/8Bytes -benchtime 1s -cpuprofile cpu.prof
goos: windows
goarch: amd64
pkg: github.com/minio/sha256-simd
BenchmarkHash/AVX2/8Bytes-32 fatal error: unexpected signal during runtime execution
[signal 0xc0000005 code=0x0 addr=0xc00031c008 pc=0xc6e92d]
runtime stack:
runtime.throw(0xd8d2bd, 0x2a)
c:/go/src/runtime/panic.go:1116 +0x79
runtime.sigpanic()
c:/go/src/runtime/signal_windows.go:240 +0x285
runtime.gentraceback(0xd3efed, 0xc00031bbc0, 0x0, 0xc000382d80, 0x0, 0x7128dfee90, 0x40, 0x0, 0x0, 0x6, ...)
c:/go/src/runtime/traceback.go:251 +0x136d
runtime.sigprof(0xd3efed, 0xc00031bbc0, 0x0, 0xc000382d80, 0xc000088400)
c:/go/src/runtime/proc.go:4041 +0x491
runtime.profilem(0xc000088400, 0x1cc)
c:/go/src/runtime/os_windows.go:1105 +0xe9
runtime.profileloop1(0x0, 0x0)
c:/go/src/runtime/os_windows.go:1152 +0x1a5
runtime.externalthreadhandler(0x7128dff7b8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
c:/go/src/runtime/sys_windows_amd64.s:267 +0x95
goroutine 1 [chan receive]:
testing.(*B).run1(0xc00030a240, 0xc00030a240)
c:/go/src/testing/benchmark.go:233 +0xa6
testing.(*B).Run(0xc00030a000, 0xd8569b, 0xd, 0xd8fc40, 0xce9d00)
c:/go/src/testing/benchmark.go:654 +0x36c
testing.runBenchmarks.func1(0xc00030a000)
c:/go/src/testing/benchmark.go:534 +0x7f
testing.(*B).runN(0xc00030a000, 0x1)
c:/go/src/testing/benchmark.go:191 +0xf2
testing.runBenchmarks(0xd89f45, 0x1c, 0xc0000046a0, 0xe77300, 0x8, 0x8, 0xe7bac0)
c:/go/src/testing/benchmark.go:540 +0x3b5
testing.(*M).Run(0xc000130080, 0x0)
c:/go/src/testing/testing.go:1363 +0x57d
main.main()
_testmain.go:83 +0x145
goroutine 6 [sleep]:
time.Sleep(0x5f5e100)
c:/go/src/runtime/time.go:188 +0xc9
runtime/pprof.profileWriter(0xdb3da0, 0xc000006030)
c:/go/src/runtime/pprof/pprof.go:799 +0x72
created by runtime/pprof.StartCPUProfile
c:/go/src/runtime/pprof/pprof.go:784 +0x127
goroutine 31 [chan receive]:
testing.(*B).doBench(0xc00030a480, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
c:/go/src/testing/benchmark.go:277 +0x7a
testing.(*benchContext).processBench(0xc0000046e0, 0xc00030a480)
c:/go/src/testing/benchmark.go:570 +0x233
testing.(*B).run(0xc00030a480)
c:/go/src/testing/benchmark.go:268 +0x6a
testing.(*B).Run(0xc00030a240, 0xc00000c3e0, 0xb, 0xc000078710, 0x0)
c:/go/src/testing/benchmark.go:655 +0x42c
github.com/minio/sha256-simd.BenchmarkHash(0xc00030a240)
s:/deps/sha256-simd/sha256_test.go:2330 +0x46d
testing.(*B).runN(0xc00030a240, 0x1)
c:/go/src/testing/benchmark.go:191 +0xf2
testing.(*B).run1.func1(0xc00030a240)
c:/go/src/testing/benchmark.go:231 +0x5e
created by testing.(*B).run1
c:/go/src/testing/benchmark.go:224 +0x85
goroutine 82 [running]:
goroutine running on other thread; stack unavailable
created by testing.(*B).doBench
c:/go/src/testing/benchmark.go:276 +0x5c
exit status 2
FAIL github.com/minio/sha256-simd 0.706s
This doesn't seem to happen with the other implementations.
Also, you may need to retry it a few times before it crashes.
Currently we are unable to reproduce the issue on Linux.
Note, this could also be a Go runtime issue, since there are some similar issues, such as:
This seems to also happen with Go tip.
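Since the crash only shows up while a CPU profile is being collected and needs a few retries, a small harness that repeats a profiled hashing loop may make it easier to trigger. This is only a sketch: `profileHashLoop` is a hypothetical helper, and `crypto/sha256` is used as a stand-in so the snippet builds on its own. To exercise the code from this issue, the call would need to be swapped for `Sum256` from github.com/minio/sha256-simd on a machine where the AVX2 path is selected.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"runtime/pprof"
	"time"
)

// profileHashLoop hashes an 8-byte input repeatedly for roughly the given
// number of milliseconds while a CPU profile is being collected, and returns
// the size of the collected profile in bytes.
//
// Assumption: crypto/sha256 is only a stand-in here. Replace sha256.Sum256
// with the Sum256 of github.com/minio/sha256-simd to hit the assembly path.
func profileHashLoop(millis int) (int, error) {
	var prof bytes.Buffer
	if err := pprof.StartCPUProfile(&prof); err != nil {
		return 0, err
	}

	input := make([]byte, 8)
	deadline := time.Now().Add(time.Duration(millis) * time.Millisecond)
	var sink byte
	for time.Now().Before(deadline) {
		sum := sha256.Sum256(input)
		sink ^= sum[0] // keep the result live so the loop is not optimized away
	}
	_ = sink

	// StopCPUProfile returns only after the profile has been fully written.
	pprof.StopCPUProfile()
	return prof.Len(), nil
}

func main() {
	// The crash is intermittent, so repeat the profiled run several times.
	for i := 0; i < 5; i++ {
		n, err := profileHashLoop(200)
		if err != nil {
			fmt.Println("profiling failed:", err)
			return
		}
		fmt.Printf("run %d: profile size %d bytes\n", i, n)
	}
}
```

With the stand-in hash this should simply finish; the point is that profiling signals arrive while the hashing code is on the stack, which is the condition described above.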
If I increase the blockAvx2 stack size to TEXT ·blockAvx2(SB),$2048-48 then it doesn't seem to fail any more.
I think this is because the TEXT ·blockAvx2(SB) code (at least) directly modifies and uses the SP and BP registers that the Go runtime inspects to generate a traceback. Since they aren't pointing at the top of the stack, the runtime can end up walking into some strange places when asked to generate a traceback, for example during a profiling signal.
For example, if you insert a BYTE $0xCC instruction somewhere in the inner loop, you can observe the runtime having difficulty generating a traceback:
Large traceback
SIGTRAP: trace trap
PC=0x51d945 m=0 sigcode=128
goroutine 18 [running]:
runtime: unexpected return pc for github.com/minio/sha256-simd.blockAvx2 called from 0x2073612073692043
stack: frame={sp:0xc0000957c0, fp:0xc000095c10} stack=[0xc000094000,0xc000096000)
000000c0000956c0: 000000c0000956e8 0000000000468a53 <reflect.resolveNameOff+51>
000000c0000956d0: 0000000000538c60 00000000000034b5
000000c0000956e0: 00000000005294b5 000000c000095720
000000c0000956f0: 000000000049e456 <reflect.(*rtype).String+54> 0000000000538c60
000000c000095700: 00000000000034b5 00000000005294b5
000000c000095710: 00000000005294b8 000000000000000a
000000c000095720: 000000c0000958f8 00000000004bb5f1 <fmt.(*pp).printValue+881>
000000c000095730: 000000c00010ac30 000000c000146fc0
000000c000095740: 0000000000000020 0000000000000020
000000c000095750: 0000000000000078 00000000005294b9
000000c000095760: 0000000000000009 00007f833d41a500
000000c000095770: 00007f833d60ffff 000000c0000957c0
000000c000095780: 0000000000417d5e <runtime.(*mcentral).grow+318> 00007f833d41a400
000000c000095790: 0020300000000000 00007f833d60ffff
000000c0000957a0: 00007f833d2b25c0 0000000000591b60
000000c0000957b0: 0000000000000001 00007f833d2b25c0
000000c0000957c0: <000000c000095838 000000c000095810
000000c0000957d0: 0000000000414b0f <runtime.heapBits.forwardOrBoundary+111> 00007f833d41a600
000000c0000957e0: 000000000000001f 00007f833d60ffff
000000c0000957f0: 0000000000000400 00007f833d41a700
000000c000095800: 000000c000095950 000000c0000959d8
000000c000095810: 000000c000095a18 000000c0000954d0
000000c000095820: 000000c000095970 0000000000513fef <github.com/minio/sha256-simd.blockGeneric+79>
000000c000095830: 0000000000538c60 0000000000000400
000000c000095840: 9288f7b95d69345c 87723109958b90cb
000000c000095850: 725d100cdc93e826 70d4a2f9c64895e2
000000c000095860: dfda16b1e41c0f8c 0000000000000040
000000c000095870: 0000000080000000 0000000000000000
000000c000095880: 0000000000000000 0000000000000000
000000c000095890: 0000000000000000 0000000000000000
000000c0000958a0: 0000000000000000 0000040000000000
000000c0000958b0: 0280000180000000 0000011000205000
000000c0000958c0: 00aa000022000800 c0002ac005089942
000000c0000958d0: 1028c80a62080004 9f004823001a4055
000000c0000958e0: 323b15b468ca269e 5b6835a31886f73d
000000c0000958f0: 3311a7d237fd1798 55edccc1e8977a87
000000c000095900: 1c1a75cd26785e65 70d975ed1898add6
000000c000095910: 000000c000095970 0000000000516888 <github.com/minio/sha256-simd.blockAvx2Go+168>
000000c000095920: 000000c000095950 0000000000000008
000000c000095930: 0000000000000008 000000c0000959d8
000000c000095940: 0000000000000040 0000000000000040
000000c000095950: bb67ae856a09e667 a54ff53a3c6ef372
000000c000095960: 9b05688c510e527f 5be0cd191f83d9ab
000000c000095970: 000000c0000959a0 0000000000513f3f <github.com/minio/sha256-simd.block+287>
000000c000095980: 000000c000095a60 000000c0000959d8
000000c000095990: 0000000000000040 0000000000000040
000000c0000959a0: 000000c000095a18 0000000000513d09 <github.com/minio/sha256-simd.(*digest).checkSum+201>
000000c0000959b0: 000000c000095a60 000000c0000959d8
000000c0000959c0: 0000000000000040 0000000000000040
000000c0000959d0: 0000000000000000 0000000000000080
000000c0000959e0: 0000000000000000 0000000000000000
000000c0000959f0: 0000000000000000 0000000000000000
000000c000095a00: 0000000000000000 0000000000000000
000000c000095a10: 0000000000000000 000000c000095ad0
000000c000095a20: 0000000000513806 <github.com/minio/sha256-simd.Sum256+230> 000000c000095a60
000000c000095a30: 0000000000000000 0000000000000000
000000c000095a40: 0000000000000000 0000000000000000
000000c000095a50: 0000000000000000 0000000000000000
000000c000095a60: bb67ae856a09e667 a54ff53a3c6ef372
000000c000095a70: 9b05688c510e527f 5be0cd191f83d9ab
000000c000095a80: 0000000000000000 0000000000000000
000000c000095a90: 0000000000000000 0000000000000000
000000c000095aa0: 0000000000000000 0000000000000000
000000c000095ab0: 0000000000000000 0000000000000000
000000c000095ac0: 0000000000000000 0000000000000000
000000c000095ad0: 000000c000095f70 0000000000517037 <github.com/minio/sha256-simd.TestGolden+1591>
000000c000095ae0: 000000c000095ba8 0000000000000000
000000c000095af0: 0000000000000020 0000000000000000
000000c000095b00: 0000000000000000 0000000000000000
000000c000095b10: 0000000000000000 0100000000000000
000000c000095b20: 0000000000000000 0000000000000000
000000c000095b30: 0000000000000000 0000000000000000
000000c000095b40: 0000000000000000 0000000000000000
000000c000095b50: 0000000000000000 0000000000000040
000000c000095b60: 0000000000000000 0000000000000000
000000c000095b70: 0000000000000000 000000000000003f
000000c000095b80: 0000000000000002 0000000000000000
000000c000095b90: 0000000000000000 0000000000000000
000000c000095ba0: 0000000000000000 0000000000000000
000000c000095bb0: 0000000000000000 0000000000000000
000000c000095bc0: 0000000000000000 0000000000000000
000000c000095bd0: 0000000000000000 0000000000000000
000000c000095be0: 0000000000000000 0000000000000000
000000c000095bf0: 0000000000000000 0000000000000000
000000c000095c00: 0000000000000000 !2073612073692043
000000c000095c10: >656c626174726f70 6e6f745320736120
000000c000095c20: 2121656764656865 2073612073692043
000000c000095c30: 656c626174726f70 6e6f745320736120
000000c000095c40: 2121656764656865 4bb7ff54e2f36ef1
000000c000095c50: 8cef8694d9973c7e d7fe6dbc804c9e54
000000c000095c60: cdfbf4767e5e8cfe 0000000000000000
000000c000095c70: 0000000000000000 0000000000000000
000000c000095c80: 0000000000000000 0000000000000000
000000c000095c90: 0000000000000000 0000000000000000
000000c000095ca0: 0000000000000000 0000000000000000
000000c000095cb0: 0000000000000000 0000000000000000
000000c000095cc0: 0000000000000000 0000000000000000
000000c000095cd0: 0000000000000000 0000000000000000
000000c000095ce0: 0000000000000000 0000000000000000
000000c000095cf0: 0000000000000000 0000000000000000
000000c000095d00: 0000000000000000 0000000000000000
github.com/minio/sha256-simd.blockAvx2(0x656c626174726f70, 0x6e6f745320736120, 0x2121656764656865, 0x2073612073692043, 0x656c626174726f70, 0x6e6f745320736120)
/home/jeff/tmp/sha256perf/sha256-simd/sha256blockAvx2_amd64.s:162 +0xc5 fp=0xc000095c10 sp=0xc0000957c0 pc=0x51d945
created by testing.(*T).Run
/nix/store/2d5n5y9w2h2dr6r3v5h0qdf6p4i3s7is-go-1.15.2/src/testing/testing.go:1178 +0x386
goroutine 1 [chan receive]:
testing.(*T).Run(0xc000120300, 0x5627f4, 0xa, 0x56c3b8, 0x47ffc6)
/nix/store/2d5n5y9w2h2dr6r3v5h0qdf6p4i3s7is-go-1.15.2/src/testing/testing.go:1179 +0x3ad
testing.runTests.func1(0xc000120180)
/nix/store/2d5n5y9w2h2dr6r3v5h0qdf6p4i3s7is-go-1.15.2/src/testing/testing.go:1449 +0x78
testing.tRunner(0xc000120180, 0xc000110de0)
/nix/store/2d5n5y9w2h2dr6r3v5h0qdf6p4i3s7is-go-1.15.2/src/testing/testing.go:1127 +0xef
testing.runTests(0xc000124060, 0x64e460, 0xe, 0xe, 0xbfd9dc57cbaacf9d, 0x8bb2cbe9e6, 0x652b20, 0x40d390)
/nix/store/2d5n5y9w2h2dr6r3v5h0qdf6p4i3s7is-go-1.15.2/src/testing/testing.go:1447 +0x2e8
testing.(*M).Run(0xc000142000, 0x0)
/nix/store/2d5n5y9w2h2dr6r3v5h0qdf6p4i3s7is-go-1.15.2/src/testing/testing.go:1357 +0x245
main.main()
_testmain.go:85 +0x138
rax 0x6a09e667
rbx 0xbb67ae85
rcx 0x3c6ef372
rdx 0xa54ff53a
rdi 0xc000095950
rsi 0xc000095a18
rbp 0x592060
rsp 0xc0000957c0
r8 0x510e527f
r9 0x9b05688c
r10 0x1f83d9ab
r11 0x5be0cd19
r12 0xc0000957c0
r13 0x76f4fbcd
r14 0x0
r15 0xc0000959d8
rip 0x51d945
rflags 0x246
cs 0x33
fs 0x0
gs 0x0
exit status 2
FAIL github.com/minio/sha256-simd 0.004s
The runtime has some protections to avoid problems caused by generating tracebacks during signals, but I don't think it can protect against this. I'm reaching the limits of my understanding, but I think the stack contents will be fairly random, so if you're unlucky enough, the runtime can walk into a spot that looks just valid enough to make it crash.
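The numbers in the dump above can be checked with a little arithmetic. The sketch below assumes the unwinder treats the frame as fixed-size: the declared 1088 bytes from the TEXT directive, plus 8 bytes for the frame-pointer slot the assembler adds on amd64 (an assumption, though it matches the dump), plus 8 bytes for the return address. `callerFP` is a hypothetical helper for illustration, not a runtime function.

```go
package main

import "fmt"

const (
	declaredFrame = 1088 // from TEXT ·blockAvx2(SB),$1088-48
	savedBPSlot   = 8    // assumption: frame-pointer slot added by the assembler on amd64
	retAddrSlot   = 8    // return address pushed by CALL
)

// callerFP models how an unwinder that assumes a fixed frame size derives
// the frame boundary (fp) from the SP it observes. The return PC is then
// read from the word just below fp.
func callerFP(sp uint64) uint64 {
	return sp + declaredFrame + savedBPSlot + retAddrSlot
}

func main() {
	const reportedSP = 0xc0000957c0 // "sp:" value in the traceback
	fp := callerFP(reportedSP)
	fmt.Printf("fp = %#x, return pc read from %#x\n", fp, fp-8)

	// Address of the stack slot that holds blockAvx2Go+168 in the dump,
	// which appears to be the real return address.
	const realRetSlot = 0xc000095918
	fmt.Printf("SP was off by %#x bytes\n", fp-8-realRetSlot)
}
```

The computed fp matches the fp=0xc000095c10 reported in the traceback, and the word read at 0xc000095c08 is the value marked with "!" in the dump, which is message data rather than a code address. The real return slot sits 0x2f0 bytes lower, which is consistent with SP having been moved inside the function.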
Just curious, by how much did you increase the stack size to not see the issue?
And are you only seeing this on Windows?
@fwessels changed:
from:
TEXT ·blockAvx2(SB),$1088-48
to:
TEXT ·blockAvx2(SB),$2048-48
Note, I'm not certain that it is the proper fix. It might just hide the problem better.
Yes, currently I'm only seeing it on Windows. We've had a weird crash on Linux as well, but we have been unable to reproduce it or confirm that it is caused by sha256-simd.
Keith Randall mentioned that the issue is caused by playing with SP, not BP, because the traceback code can't handle variable-sized frames. golang/go#43496 (comment)
@fwessels Do you have the original assembly? It doesn't seem to have been committed here. It would be nice to be able to fix this without rewriting it.
I will remove modifications of SP, and just make the stack allocation static, maybe remove the stack alignment. I don't think it will matter on any CPU with AVX2 anyway.
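If alignment did turn out to matter, one way to keep it without touching SP would be to over-allocate the fixed frame and align a pointer inside it. The Go sketch below only illustrates that pointer arithmetic; it is not the change made in this repository. In assembly the same computation would be done on a scratch register while SP stays where the declared frame size says it is. `alignedScratch` and `isAligned` are hypothetical names.

```go
package main

import (
	"fmt"
	"unsafe"
)

const (
	scratchSize = 1088 // mirrors the $1088 frame from the issue
	alignment   = 32   // natural alignment for 256-bit AVX2 loads and stores
)

// alignedScratch returns a 32-byte-aligned window of scratchSize bytes inside
// a fixed-size buffer. The buffer is over-allocated by one alignment unit so
// an aligned window always fits, which means the allocation itself never has
// to move or change size.
func alignedScratch(buf *[scratchSize + alignment]byte) []byte {
	addr := uintptr(unsafe.Pointer(&buf[0]))
	off := int((alignment - addr%alignment) % alignment)
	return buf[off : off+scratchSize]
}

// isAligned reports whether the first byte of b sits on an alignment boundary.
func isAligned(b []byte) bool {
	return uintptr(unsafe.Pointer(&b[0]))%alignment == 0
}

func main() {
	var buf [scratchSize + alignment]byte
	s := alignedScratch(&buf)
	fmt.Println(len(s), isAligned(s))
}
```

The cost is up to 32 wasted bytes per call, in exchange for a frame whose size the traceback code can rely on.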
#57 will remove this code, since it is slower.
@klauspost I would have to dig really hard to come up with the original assembly code it was based on, but this seems moot anyway given #57, so we will leave it at that.