Ptx assembly aborted due to errors

Question

carlosgalvezp opened this issue 2 years ago

Hi!

We are bumping Clang to commit 1ae33bf, and building our CUDA code now fails with this error trace:

ptxas /tmp/patch-4eaef1/patch-sm_61.s, line 3885; fatal   : Parsing error near '.': syntax error
ptxas fatal   : Ptx assembly aborted due to errors
clang: error: ptxas command failed with exit code 255 (use -v to see invocation)
clang version 16.0.0 (https://github.com/llvm/llvm-project.git 1ae33bf42680b156fe0f5cd6163bf24ef45d8cd3)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: external/llvm/usr/bin

Is this a known problem?

Carlos Galvez · Answer 1 · Thu Oct 20 2022 20:10:48 GMT+0800 (China Standard Time)

It has to do with the __func__ predefined identifier, and it only happens when compiling with debug info (-g). Minimal repro with CUDA 11.7 and Clang trunk:

#include <cstdio>

__global__ void foo_kernel()
{
    printf("%s", __func__);
}

void foo()
{
    foo_kernel<<<10, 1>>>();
}

clang --cuda-path=/path/to/cuda-11.7 -c  -g --cuda-gpu-arch=sm_75 -o foo.cu.o foo.cu

Carlos Galvez · Answer 2 · Thu Oct 20 2022 22:27:37 GMT+0800 (China Standard Time)

Bisecting brings me here: 7aa1fa0
FYI @hctim @dwblaikie @rnk @adrian-prantl

Artem Belevich · Answer 3 · Fri Oct 21 2022 01:33:31 GMT+0800 (China Standard Time)

https://godbolt.org/z/8bMYcf1z7

The debug info directive that ptxas does not like is on line 655:

.b64 __func__._Z10foo_kernelv

It should've been __func___$__Z10foo_kernelv. Apparently NVPTX's name normalizer didn't get applied to the symbol name in debug info.
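
A minimal sketch of the renaming the name normalizer is expected to perform here, assuming (as the expected name above suggests) that every . is replaced with _$_. The function name cleanUpNameSketch is hypothetical; this only illustrates the mapping and is not LLVM's implementation of nvptx-assign-valid-global-names.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical sketch, not LLVM's code: make a symbol name acceptable to
// ptxas by replacing every '.' with "_$_", the mapping shown in this thread
// (__func__._Z10foo_kernelv -> __func___$__Z10foo_kernelv).
std::string cleanUpNameSketch(std::string_view name) {
  std::string result;
  result.reserve(name.size());
  for (char c : name) {
    if (c == '.')
      result += "_$_";  // '.' is rejected by ptxas in identifiers
    else
      result += c;
  }
  assert(result.find('.') == std::string::npos);  // postcondition: no dots left
  return result;
}
```

Applied to __func__._Z10foo_kernelv it returns __func___$__Z10foo_kernelv. The problem in this thread is that the variable definition received the new name while the .b64 reference in the debug info kept the old one.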

A workaround would be to disable GPU-side debug info with -Xarch_device -g0.

Carlos Galvez · Answer 4 · Fri Oct 21 2022 14:23:24 GMT+0800 (China Standard Time)

Thanks for the quick help! Will try the workaround :)

Hao Luo · Answer 5 · Wed Jan 03 2024 17:02:15 GMT+0800 (China Standard Time)

Has this issue been fixed? I am encountering it with Clang and LLVM 17.0.6.

Jake Tuero · Answer 6 · Wed Mar 06 2024 11:26:17 GMT+0800 (China Standard Time)

I'm encountering a similar issue with clang version 18.0.0git (https://github.com/llvm/llvm-project.git b7376c319630a6b8395f3df5a46ba73e8fe29ea9): debug builds fail when __PRETTY_FUNCTION__ is used.

Raul Tambre · Answer 7 · Fri Mar 15 2024 20:30:58 GMT+0800 (China Standard Time)

Minimal repro:

echo '__attribute__((device)) void foo(){__PRETTY_FUNCTION__;}' | clang -cc1 -triple nvptx64-nvidia-cuda -S -fcuda-is-device -debug-info-kind=constructor -fno-dwarf-directory-asm -Wno-everything -x cuda | ptxas -

Removing -debug-info-kind=constructor works around this.

Artem Belevich · Answer 8 · Sat Mar 16 2024 01:31:33 GMT+0800 (China Standard Time)

It looks like another case of LLVM generating symbol names containing a dot that slip past our attempts to normalize such names:

 .global .align 1 .b8 __PRETTY_FUNCTION___$__Z3foov[11] = {118, 111, 105, 100, 32, 102, 111, 111, 40, 41};
...

.b64 __PRETTY_FUNCTION__._Z3foov

The variable itself does have the . mangled, but the reference from debug info does not.

Switching to line-only debug info would work around the issue, too.
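
A quick way to check generated PTX for this class of problem is to look for .b64 directives whose operand still contains a dot. The sketch below is a hypothetical helper (findDottedB64Refs is not part of any existing tool) and assumes the referenced symbol is the first token after .b64, as in the snippets above.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: return the operands of ".b64 <name>" directives that
// still contain a '.', i.e. symbol references that escaped renaming.
std::vector<std::string> findDottedB64Refs(const std::string& ptx) {
  std::vector<std::string> bad;
  std::istringstream lines(ptx);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream tokens(line);
    std::string tok;
    while (tokens >> tok) {
      if (tok != ".b64") continue;
      std::string operand;
      // The token right after .b64 is taken as the referenced symbol.
      if (tokens >> operand && operand.find('.') != std::string::npos)
        bad.push_back(operand);
    }
  }
  return bad;
}

// Example (uses <cassert>):
//   assert(findDottedB64Refs(".b64 __PRETTY_FUNCTION__._Z3foov\n").size() == 1);
```

The .global definition in the snippet uses .b8 and is skipped; only the dotted debug-info reference is reported.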

Raul Tambre · Answer 9 · Tue Mar 19 2024 23:32:25 GMT+0800 (China Standard Time)

Looked into this quite a bit. It seems the name gets embedded in a debug DIE during the annotation-remarks pass in getOrCreateGlobalVariableDIE()→addLocationAttribute()→addOpAddress(). Somehow there end up being 2 MCSymbols related to __PRETTY_FUNCTION__._Z3foov and nvptx-assign-valid-global-names renames the general one, but not the one that was embedded into the DIE...
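
The mechanism can be illustrated with a toy analogy. The types below are hypothetical stand-ins, not LLVM's MCSymbol or DIE classes: a debug entry that captured the name before the rename keeps the dotted spelling, while an entry that refers to the shared symbol object sees the new one.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy stand-ins, not LLVM classes.
struct Symbol {
  std::string name;
};

// Debug entry that captured the name as a string when it was created.
struct DebugEntryByCopy {
  std::string nameAtCreation;
};

// Debug entry that refers to the symbol object itself.
struct DebugEntryByRef {
  std::shared_ptr<const Symbol> sym;
  const std::string& name() const { return sym->name; }
};

// Stand-in for the renaming pass: it only updates the symbol it is given.
void renameSymbol(Symbol& s, const std::string& newName) { s.name = newName; }

// Example (uses <cassert>):
//   auto sym = std::make_shared<Symbol>(Symbol{"__PRETTY_FUNCTION__._Z3foov"});
//   DebugEntryByCopy byCopy{sym->name};
//   DebugEntryByRef byRef{sym};
//   renameSymbol(*sym, "__PRETTY_FUNCTION___$__Z3foov");
//   // byCopy still holds the dotted name; byRef.name() returns the new one.
```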

Having already spent too much time looking into this, and not understanding enough about the guts of the LLVM debug information infrastructure, I took the easy way out:

Generate pre-defined lvalue names without dots

`.` should be converted to `_$_` by the nvptx-assign-valid-global-names pass as `ptxas` doesn't support dots.
But during the ASMPrinter initialization the global variable name gets embedded in a debug DIE.
There somehow end up being two different `MCSymbol`s for the global variable with only the main one being renamed.

Bug: https://github.com/llvm/llvm-project/issues/58491
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -3277,7 +3277,12 @@ LValue CodeGenFunction::EmitPredefinedLV
     FnName = FnName.substr(1);
   StringRef NameItems[] = {
       PredefinedExpr::getIdentKindName(E->getIdentKind()), FnName};
-  std::string GVName = llvm::join(NameItems, NameItems + 2, ".");
+  std::string GVName;
+  if (CGM.getLangOpts().CUDA && CGM.getLangOpts().CUDAIsDevice) {
+    GVName = llvm::join(NameItems, NameItems + 2, "_$_");
+  } else {
+    GVName = llvm::join(NameItems, NameItems + 2, ".");
+  }
   if (auto *BD = dyn_cast_or_null<BlockDecl>(CurCodeDecl)) {
     std::string Name = std::string(SL->getString());
     if (!Name.empty()) {
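
Stripped of Clang's types, the patch only changes the separator used to join the predefined identifier's kind name with the function name. A standalone sketch, where makePredefinedName is a hypothetical helper and the isCudaDevice flag stands in for CGM.getLangOpts().CUDA && CGM.getLangOpts().CUDAIsDevice:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for the GVName computation in the patch: join the
// kind name (e.g. "__PRETTY_FUNCTION__") and the function name with "_$_"
// on the CUDA device side, and with "." otherwise (the original behavior).
std::string makePredefinedName(std::string_view kindName,
                               std::string_view fnName,
                               bool isCudaDevice) {
  const std::string_view sep = isCudaDevice ? "_$_" : ".";
  std::string name;
  name.reserve(kindName.size() + sep.size() + fnName.size());
  name.append(kindName).append(sep).append(fnName);
  return name;
}

// Example (uses <cassert>):
//   assert(makePredefinedName("__PRETTY_FUNCTION__", "_Z3foov", true) ==
//          "__PRETTY_FUNCTION___$__Z3foov");
```

On the device side this yields __PRETTY_FUNCTION___$__Z3foov directly, the same spelling the backend already gave the variable definition, so no dotted name is left for the debug info to reference.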

Artem Belevich · Answer 10 · Wed Mar 20 2024 01:22:41 GMT+0800 (China Standard Time)

I think we've dealt with a similar issue in the dwarf debug info before. Let me see if I can find it.

Artem Belevich · Answer 11 · Thu Mar 21 2024 04:43:54 GMT+0800 (China Standard Time)

I think I had 2e7e097 in mind, but it may not be helpful here as it was dealing with the concept of private prefixes. Here the character that causes the problem is a . used as a separator.

I believe we did discuss invalid symbol issues in the past, but I do not think it ever went anywhere.
E.g. the discussion on https://reviews.llvm.org/D40573 still seems to be somewhat relevant.
Especially this bit:

> This is silly. This bug has been open for so long that nvidia could've just fixed their toolchain by now to accept dots in symbol names.

Back to figuring out how to fix this instance.

> But during the ASMPrinter initialization the global variable name gets embedded in a debug DIE.
> There somehow end up being two different MCSymbols for the global variable with only the main one being renamed.

Oh, well. Looks like we may need to do it the hard way and teach nvptx-assign-valid-global-names how to deal with the symbols in debug info. It would still be dealing with the consequences, but at least the mess would be contained in one place.

@alexey-bataev Would you happen to have any idea on what would be the best way to get DWARF's symbol references mangled the same way we mangle other symbols in NVPTX?

Alexey Bataev · Answer 12 · Thu Mar 21 2024 04:50:10 GMT+0800 (China Standard Time)

I always thought that we need to handle it in the frontend, but that's only my opinion; feel free to discard it.

Artem Belevich · Answer 13 · Thu Mar 21 2024 05:20:59 GMT+0800 (China Standard Time)

Avoiding such symbols in the front-end would avoid some of the issues (granted, including this one), but a symbol with a dot may also materialize within LLVM itself (e.g. when it clones a symbol). Granted, that may not happen often in practice. It's also possible that such cloned symbols would not be affected by this issue (e.g. if, unlike in this case, debug info pointed to the same MCSymbol as the cleaned-up name).

Here are the options I see:

- Get NVIDIA to change ptxas and allow a more sensible set of characters in identifiers. The problem is that it's not going to help us for a long time, as we need to deal with ptxas versions that are already out there.
- Change LLVM to use something other than . when it needs to create identifiers. This has consequences for ABI, e.g. host/device symbols will get mangled differently. That would be a problem.
  Because of the above, this name cleanup may need to be applied selectively on multiple targets (NVPTX + supported host architectures, currently x86 and ARM). E.g. we'll want to apply it to all symbols on the GPU, and to all symbols that need to have the same name across the host/GPU boundary, such as kernels and other GPU-side symbols we may need to refer to from the host.
- Clean up the names in the front-end. This is a very narrow workaround for a subset of these 'illegal character' issues. It should be enough to deal with this case, but I do not like it because it's not the front-end's job to know about the quirks of something many abstraction levels below it. The front-end should be constrained by the contract between it and LLVM: if a symbol is valid for LLVM, how it gets lowered into target assembly is LLVM's responsibility.
- Teach nvptx-assign-valid-global-names how to fix symbol names in associated debug info. I think that may be the least bad trade-off we have at the moment. The caveat is that I have no idea how much effort it would take.
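
The last option boils down to recording every rename the pass performs and replaying it on the symbol references held by debug info. A hypothetical sketch on plain strings, assuming the . to _$_ mapping from this thread; ToyModule, cleanUp and renameAllSketch are made-up names, and a real implementation would have to reach the symbol references inside the DWARF emitter, which is where the unknown effort lies.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy module: global symbol names plus the symbol names that
// debug info refers to (e.g. the operands of .b64 directives).
struct ToyModule {
  std::vector<std::string> globals;
  std::vector<std::string> debugRefs;
};

// Same '.' -> "_$_" mapping as in the thread.
std::string cleanUp(const std::string& name) {
  std::string out;
  for (char c : name) {
    if (c == '.')
      out += "_$_";
    else
      out += c;
  }
  return out;
}

// Rename globals while remembering old -> new, then rewrite debug references
// through the same map so definitions and references stay in sync.
void renameAllSketch(ToyModule& m) {
  std::map<std::string, std::string> renamed;
  for (std::string& g : m.globals) {
    std::string fixed = cleanUp(g);
    if (fixed != g) {
      renamed[g] = fixed;
      g = fixed;
    }
  }
  for (std::string& ref : m.debugRefs) {
    auto it = renamed.find(ref);
    if (it != renamed.end()) ref = it->second;
  }
}

// Example (uses <cassert>):
//   ToyModule m{{"__PRETTY_FUNCTION__._Z3foov"}, {"__PRETTY_FUNCTION__._Z3foov"}};
//   renameAllSketch(m);
//   assert(m.debugRefs[0] == m.globals[0]);
```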

@dwblaikie If we rename a global symbol, how hard would it be to find and update references to it from debug info? I suspect we already do that somewhere in LLVM. Can you point me in the right direction?

David Blaikie · Answer 14 · Thu Mar 21 2024 09:16:20 GMT+0800 (China Standard Time)

Not sure there are existing instances of this (as you say, ABI constraints would mostly make it impossible to change symbol names effectively).

But if you want to try it: the DISubprogram attached to the function should be updated if it has the mangled name (maybe it doesn't; maybe it just depends on the actual symbol name of the llvm::Function, in which case you wouldn't have to do anything for debug info).

Raul Tambre · Answer 15 · Thu Mar 21 2024 20:23:30 GMT+0800 (China Standard Time)

The DISubprogram name referred to the correct MCSymbol* and was correct AFAIK. The problematic name instead seemed to be attached to the !17 debug annotation on the ret.

source_filename = "-"
target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

@"__PRETTY_FUNCTION___$__Z3foov" = private unnamed_addr constant [11 x i8] c"void foo()\00", align 1, !dbg !0

; Function Attrs: convergent mustprogress noinline nounwind optnone
define dso_local void @_Z3foov() #0 !dbg !14 {
entry:
  ret void, !dbg !17
}

attributes #0 = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+ptx32" }

!llvm.dbg.cu = !{!8}
!llvm.linker.options = !{}
!llvm.module.flags = !{!10, !11, !12}
!llvm.ident = !{!13}

!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
!1 = distinct !DIGlobalVariable(scope: null, file: !2, line: 1, type: !3, isLocal: true, isDefinition: true)
!2 = !DIFile(filename: "<stdin>", directory: "/home/raul.tambre")
!3 = !DICompositeType(tag: DW_TAG_array_type, baseType: !4, size: 88, elements: !6)
!4 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !5)
!5 = !DIBasicType(name: "char", size: 8, encoding: DW_ATE_signed_char)
!6 = !{!7}
!7 = !DISubrange(count: 11)
!8 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !2, producer: "Clebian clang version 19.0.0", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, globals: !9, splitDebugInlining: false, nameTableKind: None)
!9 = !{!0}
!10 = !{i32 2, !"Debug Info Version", i32 3}
!11 = !{i32 1, !"wchar_size", i32 4}
!12 = !{i32 4, !"nvvm-reflect-ftz", i32 0}
!13 = !{!"Clebian clang version 19.0.0"}
!14 = distinct !DISubprogram(name: "foo", linkageName: "_Z3foov", scope: !2, file: !2, line: 1, type: !15, scopeLine: 1, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !8)
!15 = !DISubroutineType(types: !16)
!16 = !{null}
!17 = !DILocation(line: 1, column: 56, scope: !14)

I managed to write something that reaches that instruction, but couldn't figure out how to reach the DIE that already had the wrong name embedded. It seems to be an abstraction layer away and inaccessible from such a pass. Running nvptx-assign-valid-global-names as one of the first passes, before the DIE is created, seemed like it might work.

David Blaikie · Answer 16 · Fri Mar 22 2024 00:45:19 GMT+0800 (China Standard Time)

Sorry, I'm not following that last comment - the DISubprogram is the same one from the Function and from the DILocation.

I take it this renaming isn't done at the IR level, OK - so it's not about updating the DISubprogram itself to match a change to the Function, but later than that.

Sure enough then - DwarfUnit::applySubprogramDefinitionAttributes calls addLinkageName - I guess it'd need some awkward mapping in DwarfDebug of DISubprogram back to llvm::Function... I don't feel good about that, maybe there's some other way to handle it, but you could at least prototype that.

Raul Tambre · Answer 17 · Fri Mar 22 2024 06:19:50 GMT+0800 (China Standard Time)

> Sorry, I'm not following that last comment - the DISubprogram is the same one from the Function and from the DILocation.

I guess I was getting at the fact that you can't get the DILocation from the DISubprogram; instead you have to iterate over the instructions to find the return instruction with the appropriate debug annotation. At least it seemed so to me, but chewing through the API and abstractions was difficult when I did try. 🙂

David Blaikie · Answer 18 · Tue Mar 26 2024 01:00:14 GMT+0800 (China Standard Time)

Ah, yes, DILocations aren't accessible top-down from the DISubprogram, only bottom-up from the DISubprogram's Function's instructions.
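
Put differently, the links run from an instruction to its DILocation and from the DILocation to its scope, never from the DISubprogram down. A toy model of the IR dump above (!14 as the subprogram, !17 as the location on the ret); the Toy* types are hypothetical and only echo LLVM's names, and real scopes may also be nested lexical blocks, which this sketch ignores. Finding the locations for a subprogram therefore means scanning the function's instructions:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy mirrors of the metadata in the dump above; not LLVM's classes.
struct ToySubprogram { std::string linkageName; };                          // like !14
struct ToyLocation { unsigned line, column; const ToySubprogram* scope; };  // like !17
struct ToyInstruction { std::string opcode; const ToyLocation* loc; };      // loc may be null
struct ToyFunction {
  const ToySubprogram* subprogram;  // Function -> subprogram link exists
  std::vector<ToyInstruction> body;
};

// There is no subprogram -> location link, so collect locations bottom-up
// by scanning the function's instructions and checking each scope.
std::vector<const ToyLocation*> locationsIn(const ToyFunction& f) {
  std::vector<const ToyLocation*> out;
  for (const ToyInstruction& inst : f.body)
    if (inst.loc && inst.loc->scope == f.subprogram)
      out.push_back(inst.loc);
  return out;
}

// Example (uses <cassert>):
//   ToySubprogram sp{"_Z3foov"};
//   ToyLocation loc{1, 56, &sp};
//   ToyFunction f{&sp, {ToyInstruction{"ret", &loc}}};
//   assert(locationsIn(f).size() == 1);
```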