minio / sha256-simd

Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions for x86 and ARM64 for ARM. On AVX512 it provides an up to 8x improvement (over 3 GB/s per core). SHA Extensions give a performance boost of close to 4x over native.



Panic when profiling AVX2 implementation

egonelbre opened this issue

There seems to be a fairly repeatable panic when running the AVX2 benchmark with CPU profiling enabled:

s:\deps\sha256-simd>go test -bench BenchmarkHash/AVX2/8Bytes -benchtime 1s -cpuprofile cpu.prof
goos: windows
goarch: amd64
pkg: github.com/minio/sha256-simd
BenchmarkHash/AVX2/8Bytes-32            fatal error: unexpected signal during runtime execution
[signal 0xc0000005 code=0x0 addr=0xc00031c008 pc=0xc6e92d]

runtime stack:
runtime.throw(0xd8d2bd, 0x2a)
        c:/go/src/runtime/panic.go:1116 +0x79
runtime.sigpanic()
        c:/go/src/runtime/signal_windows.go:240 +0x285
runtime.gentraceback(0xd3efed, 0xc00031bbc0, 0x0, 0xc000382d80, 0x0, 0x7128dfee90, 0x40, 0x0, 0x0, 0x6, ...)
        c:/go/src/runtime/traceback.go:251 +0x136d
runtime.sigprof(0xd3efed, 0xc00031bbc0, 0x0, 0xc000382d80, 0xc000088400)
        c:/go/src/runtime/proc.go:4041 +0x491
runtime.profilem(0xc000088400, 0x1cc)
        c:/go/src/runtime/os_windows.go:1105 +0xe9
runtime.profileloop1(0x0, 0x0)
        c:/go/src/runtime/os_windows.go:1152 +0x1a5
runtime.externalthreadhandler(0x7128dff7b8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        c:/go/src/runtime/sys_windows_amd64.s:267 +0x95

goroutine 1 [chan receive]:
testing.(*B).run1(0xc00030a240, 0xc00030a240)
        c:/go/src/testing/benchmark.go:233 +0xa6
testing.(*B).Run(0xc00030a000, 0xd8569b, 0xd, 0xd8fc40, 0xce9d00)
        c:/go/src/testing/benchmark.go:654 +0x36c
testing.runBenchmarks.func1(0xc00030a000)
        c:/go/src/testing/benchmark.go:534 +0x7f
testing.(*B).runN(0xc00030a000, 0x1)
        c:/go/src/testing/benchmark.go:191 +0xf2
testing.runBenchmarks(0xd89f45, 0x1c, 0xc0000046a0, 0xe77300, 0x8, 0x8, 0xe7bac0)
        c:/go/src/testing/benchmark.go:540 +0x3b5
testing.(*M).Run(0xc000130080, 0x0)
        c:/go/src/testing/testing.go:1363 +0x57d
main.main()
        _testmain.go:83 +0x145

goroutine 6 [sleep]:
time.Sleep(0x5f5e100)
        c:/go/src/runtime/time.go:188 +0xc9
runtime/pprof.profileWriter(0xdb3da0, 0xc000006030)
        c:/go/src/runtime/pprof/pprof.go:799 +0x72
created by runtime/pprof.StartCPUProfile
        c:/go/src/runtime/pprof/pprof.go:784 +0x127

goroutine 31 [chan receive]:
testing.(*B).doBench(0xc00030a480, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        c:/go/src/testing/benchmark.go:277 +0x7a
testing.(*benchContext).processBench(0xc0000046e0, 0xc00030a480)
        c:/go/src/testing/benchmark.go:570 +0x233
testing.(*B).run(0xc00030a480)
        c:/go/src/testing/benchmark.go:268 +0x6a
testing.(*B).Run(0xc00030a240, 0xc00000c3e0, 0xb, 0xc000078710, 0x0)
        c:/go/src/testing/benchmark.go:655 +0x42c
github.com/minio/sha256-simd.BenchmarkHash(0xc00030a240)
        s:/deps/sha256-simd/sha256_test.go:2330 +0x46d
testing.(*B).runN(0xc00030a240, 0x1)
        c:/go/src/testing/benchmark.go:191 +0xf2
testing.(*B).run1.func1(0xc00030a240)
        c:/go/src/testing/benchmark.go:231 +0x5e
created by testing.(*B).run1
        c:/go/src/testing/benchmark.go:224 +0x85

goroutine 82 [running]:
        goroutine running on other thread; stack unavailable
created by testing.(*B).doBench
        c:/go/src/testing/benchmark.go:276 +0x5c
exit status 2
FAIL    github.com/minio/sha256-simd    0.706s

This doesn't seem to happen with the other implementations.

Also, you may need to retry it a few times before it crashes.

Currently we are unable to reproduce the issue on Linux.

Note: this could also be a Go runtime issue, since there are some similar issues, such as:

This seems to also happen with Go tip.

If I increase the blockAvx2 stack size to TEXT ·blockAvx2(SB),$2048-48, then it doesn't seem to fail anymore.

I think this is because the TEXT ·blockAvx2(SB) code (at least) directly modifies and uses the SP and BP registers, which the Go runtime inspects to generate a traceback. Since they aren't pointing at the top of the stack, the runtime can end up walking into some strange places when asked to generate a traceback, for example during a profiling signal.

For example, if you insert a BYTE $0xCC instruction (an INT3 breakpoint, which raises SIGTRAP) somewhere in the inner loop, you can observe the runtime having difficulty generating a traceback:

Large traceback
SIGTRAP: trace trap
PC=0x51d945 m=0 sigcode=128

goroutine 18 [running]:
runtime: unexpected return pc for github.com/minio/sha256-simd.blockAvx2 called from 0x2073612073692043
stack: frame={sp:0xc0000957c0, fp:0xc000095c10} stack=[0xc000094000,0xc000096000)
000000c0000956c0:  000000c0000956e8  0000000000468a53 <reflect.resolveNameOff+51> 
000000c0000956d0:  0000000000538c60  00000000000034b5 
000000c0000956e0:  00000000005294b5  000000c000095720 
000000c0000956f0:  000000000049e456 <reflect.(*rtype).String+54>  0000000000538c60 
000000c000095700:  00000000000034b5  00000000005294b5 
000000c000095710:  00000000005294b8  000000000000000a 
000000c000095720:  000000c0000958f8  00000000004bb5f1 <fmt.(*pp).printValue+881> 
000000c000095730:  000000c00010ac30  000000c000146fc0 
000000c000095740:  0000000000000020  0000000000000020 
000000c000095750:  0000000000000078  00000000005294b9 
000000c000095760:  0000000000000009  00007f833d41a500 
000000c000095770:  00007f833d60ffff  000000c0000957c0 
000000c000095780:  0000000000417d5e <runtime.(*mcentral).grow+318>  00007f833d41a400 
000000c000095790:  0020300000000000  00007f833d60ffff 
000000c0000957a0:  00007f833d2b25c0  0000000000591b60 
000000c0000957b0:  0000000000000001  00007f833d2b25c0 
000000c0000957c0: <000000c000095838  000000c000095810 
000000c0000957d0:  0000000000414b0f <runtime.heapBits.forwardOrBoundary+111>  00007f833d41a600 
000000c0000957e0:  000000000000001f  00007f833d60ffff 
000000c0000957f0:  0000000000000400  00007f833d41a700 
000000c000095800:  000000c000095950  000000c0000959d8 
000000c000095810:  000000c000095a18  000000c0000954d0 
000000c000095820:  000000c000095970  0000000000513fef <github.com/minio/sha256-simd.blockGeneric+79> 
000000c000095830:  0000000000538c60  0000000000000400 
000000c000095840:  9288f7b95d69345c  87723109958b90cb 
000000c000095850:  725d100cdc93e826  70d4a2f9c64895e2 
000000c000095860:  dfda16b1e41c0f8c  0000000000000040 
000000c000095870:  0000000080000000  0000000000000000 
000000c000095880:  0000000000000000  0000000000000000 
000000c000095890:  0000000000000000  0000000000000000 
000000c0000958a0:  0000000000000000  0000040000000000 
000000c0000958b0:  0280000180000000  0000011000205000 
000000c0000958c0:  00aa000022000800  c0002ac005089942 
000000c0000958d0:  1028c80a62080004  9f004823001a4055 
000000c0000958e0:  323b15b468ca269e  5b6835a31886f73d 
000000c0000958f0:  3311a7d237fd1798  55edccc1e8977a87 
000000c000095900:  1c1a75cd26785e65  70d975ed1898add6 
000000c000095910:  000000c000095970  0000000000516888 <github.com/minio/sha256-simd.blockAvx2Go+168> 
000000c000095920:  000000c000095950  0000000000000008 
000000c000095930:  0000000000000008  000000c0000959d8 
000000c000095940:  0000000000000040  0000000000000040 
000000c000095950:  bb67ae856a09e667  a54ff53a3c6ef372 
000000c000095960:  9b05688c510e527f  5be0cd191f83d9ab 
000000c000095970:  000000c0000959a0  0000000000513f3f <github.com/minio/sha256-simd.block+287> 
000000c000095980:  000000c000095a60  000000c0000959d8 
000000c000095990:  0000000000000040  0000000000000040 
000000c0000959a0:  000000c000095a18  0000000000513d09 <github.com/minio/sha256-simd.(*digest).checkSum+201> 
000000c0000959b0:  000000c000095a60  000000c0000959d8 
000000c0000959c0:  0000000000000040  0000000000000040 
000000c0000959d0:  0000000000000000  0000000000000080 
000000c0000959e0:  0000000000000000  0000000000000000 
000000c0000959f0:  0000000000000000  0000000000000000 
000000c000095a00:  0000000000000000  0000000000000000 
000000c000095a10:  0000000000000000  000000c000095ad0 
000000c000095a20:  0000000000513806 <github.com/minio/sha256-simd.Sum256+230>  000000c000095a60 
000000c000095a30:  0000000000000000  0000000000000000 
000000c000095a40:  0000000000000000  0000000000000000 
000000c000095a50:  0000000000000000  0000000000000000 
000000c000095a60:  bb67ae856a09e667  a54ff53a3c6ef372 
000000c000095a70:  9b05688c510e527f  5be0cd191f83d9ab 
000000c000095a80:  0000000000000000  0000000000000000 
000000c000095a90:  0000000000000000  0000000000000000 
000000c000095aa0:  0000000000000000  0000000000000000 
000000c000095ab0:  0000000000000000  0000000000000000 
000000c000095ac0:  0000000000000000  0000000000000000 
000000c000095ad0:  000000c000095f70  0000000000517037 <github.com/minio/sha256-simd.TestGolden+1591> 
000000c000095ae0:  000000c000095ba8  0000000000000000 
000000c000095af0:  0000000000000020  0000000000000000 
000000c000095b00:  0000000000000000  0000000000000000 
000000c000095b10:  0000000000000000  0100000000000000 
000000c000095b20:  0000000000000000  0000000000000000 
000000c000095b30:  0000000000000000  0000000000000000 
000000c000095b40:  0000000000000000  0000000000000000 
000000c000095b50:  0000000000000000  0000000000000040 
000000c000095b60:  0000000000000000  0000000000000000 
000000c000095b70:  0000000000000000  000000000000003f 
000000c000095b80:  0000000000000002  0000000000000000 
000000c000095b90:  0000000000000000  0000000000000000 
000000c000095ba0:  0000000000000000  0000000000000000 
000000c000095bb0:  0000000000000000  0000000000000000 
000000c000095bc0:  0000000000000000  0000000000000000 
000000c000095bd0:  0000000000000000  0000000000000000 
000000c000095be0:  0000000000000000  0000000000000000 
000000c000095bf0:  0000000000000000  0000000000000000 
000000c000095c00:  0000000000000000 !2073612073692043 
000000c000095c10: >656c626174726f70  6e6f745320736120 
000000c000095c20:  2121656764656865  2073612073692043 
000000c000095c30:  656c626174726f70  6e6f745320736120 
000000c000095c40:  2121656764656865  4bb7ff54e2f36ef1 
000000c000095c50:  8cef8694d9973c7e  d7fe6dbc804c9e54 
000000c000095c60:  cdfbf4767e5e8cfe  0000000000000000 
000000c000095c70:  0000000000000000  0000000000000000 
000000c000095c80:  0000000000000000  0000000000000000 
000000c000095c90:  0000000000000000  0000000000000000 
000000c000095ca0:  0000000000000000  0000000000000000 
000000c000095cb0:  0000000000000000  0000000000000000 
000000c000095cc0:  0000000000000000  0000000000000000 
000000c000095cd0:  0000000000000000  0000000000000000 
000000c000095ce0:  0000000000000000  0000000000000000 
000000c000095cf0:  0000000000000000  0000000000000000 
000000c000095d00:  0000000000000000  0000000000000000 
github.com/minio/sha256-simd.blockAvx2(0x656c626174726f70, 0x6e6f745320736120, 0x2121656764656865, 0x2073612073692043, 0x656c626174726f70, 0x6e6f745320736120)
	/home/jeff/tmp/sha256perf/sha256-simd/sha256blockAvx2_amd64.s:162 +0xc5 fp=0xc000095c10 sp=0xc0000957c0 pc=0x51d945
created by testing.(*T).Run
	/nix/store/2d5n5y9w2h2dr6r3v5h0qdf6p4i3s7is-go-1.15.2/src/testing/testing.go:1178 +0x386

goroutine 1 [chan receive]:
testing.(*T).Run(0xc000120300, 0x5627f4, 0xa, 0x56c3b8, 0x47ffc6)
	/nix/store/2d5n5y9w2h2dr6r3v5h0qdf6p4i3s7is-go-1.15.2/src/testing/testing.go:1179 +0x3ad
testing.runTests.func1(0xc000120180)
	/nix/store/2d5n5y9w2h2dr6r3v5h0qdf6p4i3s7is-go-1.15.2/src/testing/testing.go:1449 +0x78
testing.tRunner(0xc000120180, 0xc000110de0)
	/nix/store/2d5n5y9w2h2dr6r3v5h0qdf6p4i3s7is-go-1.15.2/src/testing/testing.go:1127 +0xef
testing.runTests(0xc000124060, 0x64e460, 0xe, 0xe, 0xbfd9dc57cbaacf9d, 0x8bb2cbe9e6, 0x652b20, 0x40d390)
	/nix/store/2d5n5y9w2h2dr6r3v5h0qdf6p4i3s7is-go-1.15.2/src/testing/testing.go:1447 +0x2e8
testing.(*M).Run(0xc000142000, 0x0)
	/nix/store/2d5n5y9w2h2dr6r3v5h0qdf6p4i3s7is-go-1.15.2/src/testing/testing.go:1357 +0x245
main.main()
	_testmain.go:85 +0x138

rax    0x6a09e667
rbx    0xbb67ae85
rcx    0x3c6ef372
rdx    0xa54ff53a
rdi    0xc000095950
rsi    0xc000095a18
rbp    0x592060
rsp    0xc0000957c0
r8     0x510e527f
r9     0x9b05688c
r10    0x1f83d9ab
r11    0x5be0cd19
r12    0xc0000957c0
r13    0x76f4fbcd
r14    0x0
r15    0xc0000959d8
rip    0x51d945
rflags 0x246
cs     0x33
fs     0x0
gs     0x0
exit status 2
FAIL	github.com/minio/sha256-simd	0.004s
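The bogus caller address in this trace, 0x2073612073692043, is not a code address at all. Interpreting the stack word flagged with `!` and the three words after it as little-endian bytes (the amd64 byte order) yields readable text, which appears to be the message being hashed by TestGolden. That is consistent with the explanation above: the runtime looked for a return address in a slot that actually held message data. A small helper (hypothetical, written only for this note) to decode the words:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeWords interprets each 64-bit word as 8 bytes stored in
// little-endian order (as on amd64) and returns them as a string.
func decodeWords(words []uint64) string {
	buf := make([]byte, 8*len(words))
	for i, w := range words {
		binary.LittleEndian.PutUint64(buf[8*i:], w)
	}
	return string(buf)
}

func main() {
	// The word at 0xc000095c08 (marked '!' in the dump, reported as the
	// "return pc") followed by the next three words on the stack.
	words := []uint64{
		0x2073612073692043,
		0x656c626174726f70,
		0x6e6f745320736120,
		0x2121656764656865,
	}
	fmt.Printf("%q\n", decodeWords(words))
}
```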

The runtime has some protections against problems caused by generating tracebacks during signals, but I don't think they can protect against this. I'm reaching the limits of my understanding, but I think the stack contents will be fairly random, so if you're unlucky, the runtime can be fooled into walking to a spot that looks just plausible enough to make it crash.

If I increase blockAvx2 stack size to TEXT ·blockAvx2(SB),$2048-48 then it doesn't seem to fail any more.

Just curious, by how much did you increase the stack size to not see the issue?

And are you only seeing this on Windows?

@fwessels changed:

from:
TEXT ·blockAvx2(SB),$1088-48

to:
TEXT ·blockAvx2(SB),$2048-48

Note: I'm not certain it is the proper fix; it might just hide the problem better.

Yes, currently only on Windows. We've had a weird crash on Linux as well, but we were unable to reproduce it or confirm that it was caused by sha256-simd.

Keith Randall mentioned that the issue is caused by modifying SP, not BP, because the traceback code can't handle variable-sized frames: golang/go#43496 (comment)

@fwessels Do you have the original assembly? It doesn't seem to have been committed here. It would be nice to be able to fix this without rewriting it.

I will remove the modifications of SP and just make the stack allocation static, possibly dropping the stack alignment too. I don't think the alignment will matter on any CPU with AVX2 anyway.
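Keeping the allocation static does not necessarily mean giving up alignment. One possible approach, sketched here as an assumption rather than what the eventual change did, is to over-allocate the fixed frame by `align-1` bytes and round a copy of SP, held in another register, up to the boundary. SP itself is never touched, so the statically known frame size stays correct for the unwinder. The address arithmetic in Go (function names are hypothetical):

```go
package main

import "fmt"

// alignUp rounds addr up to the next multiple of align (a power of two).
func alignUp(addr, align uint64) uint64 {
	return (addr + align - 1) &^ (align - 1)
}

// scratchFits reports whether a size-byte area, aligned to align, still
// fits inside a frame of frameSize bytes starting at sp. It does whenever
// the frame has at least align-1 bytes of slack for the padding.
func scratchFits(sp, frameSize, size, align uint64) bool {
	base := alignUp(sp, align)
	return base+size <= sp+frameSize
}

func main() {
	const align = 32     // width of a YMM register in bytes
	const scratch = 1024 // hypothetical scratch-area size
	for _, sp := range []uint64{0xc000095800, 0xc0000957c8} {
		fmt.Printf("sp=%#x aligned=%#x fits=%v\n",
			sp, alignUp(sp, align), scratchFits(sp, scratch+align-1, scratch, align))
	}
}
```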

#57 will remove this code, since it is slower.

@klauspost I would have to dig really hard to find the original assembly code it was based on, but this seems moot anyway given #57, so we will leave it at that.