HPCToolkit / hpctoolkit

HPCToolkit performance tools: measurement and analysis components

hpcstruct fails for quicksilver GPU binary if using nvdisasm 11.5 or 11.6

jmellorcrummey opened this issue · comments

“hpcstruct --gpucfg yes” fails for the quicksilver databases when using nvdisasm from CUDA 11.5 or 11.6. The output of nvdisasm changed in CUDA 11.5+. The only thing that seems to have changed is the form of the labels. I computed the diff with a filemerge tool on my Mac; below is a prefix of the differences.

[image: prefix of the filemerge diff of the nvdisasm output]

I looked at the code that parses the graphs in banal/gpu/GraphReader.cpp and didn’t see any pattern matching on label names, so it isn’t clear to me why the label format should make a difference. Yet running hpcstruct with a new nvdisasm (even for binaries built with CUDA 11.2) causes hpcstruct to SEGV. The symptom in hpcstruct is that when linking fallthrough edges, a target has a NULL basic block, which suggests that labels and blocks are not being matched properly. Since the label naming is the only thing that changed, it would seem to be the cause. Here is a stack trace to where things go wrong ingesting the new CFGs.
Note that the last gdb command, after the call stack, shows target->block == 0x0.
(gdb) where
#0 0x0000000000468199 in GPUParse::CudaCFGParser::link_fallthrough_edges (this=0x7ffdd14e0910, graph=...,
blocks=std::vector of length 6, capacity 8 = {...}, block_id_map=std::unordered_map with 6 elements = {...})
at ../../../../src/lib/banal/gpu/CudaCFGParser.cpp:364
#1 0x000000000046783d in GPUParse::CudaCFGParser::parse (this=0x7ffdd14e0910, graph=..., functions=std::vector of length 0, capacity 0)
at ../../../../src/lib/banal/gpu/CudaCFGParser.cpp:272
#2 0x000000000044a549 in parseDotCFG (search_path=".",
elf_filename="/home/johnmc/hpctoolkit-tutorial-examples/examples/gpu/quicksilver/hpctoolkit-qs-gpu-cuda.m/gpubins/984367e52bbd259fad8bcfa34ee8841c.gpubin", dot_filename="2236425.dot", cubin="2236425", cuda_arch=80, the_symtab=0x1b893e0,
functions=std::vector of length 0, capacity 0) at ../../../../src/lib/banal/gpu/ReadCudaCFG.cpp:161
#3 0x000000000044bcd7 in readCudaCFG (search_path=".", elfFile=0x1b7ebd0, the_symtab=0x1b893e0, cfg_wanted=true, code_src=0x7ffdd14e0f38,
code_obj=0x7ffdd14e0f30) at ../../../../src/lib/banal/gpu/ReadCudaCFG.cpp:354
#4 0x00000000004163f3 in BAnal::Struct::makeStructure (
filename="/home/johnmc/hpctoolkit-tutorial-examples/examples/gpu/quicksilver/hpctoolkit-qs-gpu-cuda.m/gpubins/984367e52bbd259fad8bcfa34ee8841c.gpubin", outFile=0x1b41280, gapsFile=0x0, gaps_filenm="", search_path=".", structOpts=...) at ../../../../src/lib/banal/Struct.cpp:661
#5 0x0000000000412d68 in singleApplicationBinary (args=..., opts=...) at ../../../../src/tool/hpcstruct/main.cpp:390
#6 0x0000000000413203 in realmain (argc=10, argv=0x7ffdd14e1a08) at ../../../../src/tool/hpcstruct/main.cpp:463
#7 0x0000000000412562 in main (argc=10, argv=0x7ffdd14e1a08) at ../../../../src/tool/hpcstruct/main.cpp:243
(gdb) up
#1 0x000000000046783d in GPUParse::CudaCFGParser::parse (this=0x7ffdd14e0910, graph=..., functions=std::vector of length 0, capacity 0)
at ../../../../src/lib/banal/gpu/CudaCFGParser.cpp:272
272 link_fallthrough_edges(graph, blocks, block_id_map);
(gdb) down
#0 0x0000000000468199 in GPUParse::CudaCFGParser::link_fallthrough_edges (this=0x7ffdd14e0910, graph=...,
blocks=std::vector of length 6, capacity 8 = {...}, block_id_map=std::unordered_map with 6 elements = {...})
at ../../../../src/lib/banal/gpu/CudaCFGParser.cpp:364
364 auto target_id = target->block->id;
(gdb) p *target
$3 = {inst = 0x1df0ee0, block = 0x0, type = Dyninst::ParseAPI::COND_TAKEN}
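
To make the suspected failure mode concrete, here is a minimal sketch of how an unmatched label could leave a target with a null block, and how a guard would avoid the dereference seen at CudaCFGParser.cpp:364. Everything in it is an assumption for illustration: the `Block` and `Target` structs, the functions `resolve_targets` and `linked_target_ids`, and the label strings are hypothetical and are not HPCToolkit's actual types, logic, or nvdisasm's real label formats. In particular, the issue does not establish that the parser matches by label text; the sketch only shows what would happen if it did.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the parser's data structures.
struct Block {
  int id;
  std::string label;
};

struct Target {
  std::string label;  // label text the edge refers to (assumed)
  Block *block;       // stays null if no block carries that label
};

// Resolve each target's block through a label-keyed lookup.
// Returns the number of targets that could not be matched.
std::size_t resolve_targets(std::vector<Block> &blocks,
                            std::vector<Target> &targets) {
  std::unordered_map<std::string, Block *> by_label;
  for (auto &b : blocks) {
    by_label[b.label] = &b;
  }
  std::size_t unresolved = 0;
  for (auto &t : targets) {
    auto it = by_label.find(t.label);
    if (it != by_label.end()) {
      t.block = it->second;
    } else {
      ++unresolved;  // e.g. the label format differs from what was recorded
    }
  }
  return unresolved;
}

// Collect the ids of linked target blocks, skipping unresolved targets
// instead of dereferencing a null pointer.
std::vector<int> linked_target_ids(const std::vector<Target> &targets) {
  std::vector<int> ids;
  for (const auto &t : targets) {
    if (t.block == nullptr) {
      continue;  // guard: an unchecked t.block->id would crash here
    }
    ids.push_back(t.block->id);
  }
  return ids;
}
```

A guard like this would only hide the symptom. If targets really are left unresolved, the edges to those blocks are silently dropped from the CFG, so counting and reporting unresolved targets (as `resolve_targets` does) is the more useful diagnostic while the matching itself is being fixed.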

Closed with PR. Tested by JMC.