cdl-saarland / rv

RV: A Unified Region Vectorizer for LLVM


[VA] [VEC] [LIN] function pointers

Commaster opened this issue:

I tried converting one of our samples (https://zivgitlab.uni-muenster.de/HPC2SE-Project/pacxx-samples/tree/master/pointer_function) to your test suite structure.

verify___.cpp:

#include <stdlib.h>
#include <stdio.h>
#include <iostream>

#include <cassert>
#include <random>

#include "launcherTools.h"

extern "C" void foo(int * threadId, int * b, int n);
extern "C" void foo_SIMD(int * threadId, int * b, int n);

int main(int argc, char ** argv) {
	const unsigned numInputs = 8;

	// b receives the vectorized results, expected_b the scalar reference
	int threadId[numInputs], expected_b[numInputs], b[numInputs];

	for (unsigned i = 0; i < numInputs; ++i) {
		threadId[i] = i;
		expected_b[i] = 0;
		b[i] = 0;
	}

	// arrays decay to int*; n is the number of elements to process
	foo(threadId, expected_b, numInputs);
	foo_SIMD(threadId, b, numInputs);

	size_t hash = hashArray(expected_b, numInputs, 0);
	hash = hashArray(b, numInputs, hash);

	std::cerr << hash << "\n";
	return 0;
}

test___.cpp:

// Shapes: ?_?_?, LaunchCode: functionpointers

int mult(int x)
{
	return 7*x+1;
}

int dual(int x)
{
	return 2*x+1;
}

int beep(int x)
{
	return -x;
}

int sum(int x)
{
	return x+5;
}

extern "C" void
foo(int * threadId, int * b, int n)
{
	for (int i = 0; i < n; i++) {
		int (*funptr)(int);
		switch (threadId[i]%8)
		{
			case 0: funptr = &mult;
				break;
			case 1: funptr = &dual;
				break;
			case 2: funptr = &beep;
				break;
			case 3: funptr = &sum;
				break;
			case 4: funptr = &dual;
				break;
			case 5: funptr = &mult;
				break;
			case 6: funptr = &sum;
				break;
			case 7: funptr = &beep;
				break;
		}
		b[i] = funptr(threadId[i]);
	}
}

Both test_rv (I tried several shape values)

-- End of Recurrence Analysis --
Extracting a scalar value from a vector:
Original Value:   %0 = load i32, i32* %arrayidx, align 4, !tbaa !2
Vector Value:   %cont_load = load <8 x i32>, <8 x i32>* %vec_cast, align 4
[the three lines above appear 8 times in a row]
Extracting a scalar value from a vector:
Original Value:   %call = tail call i32 %switch.load.R.b(i32 %0)
Vector Value:   %scalarized21 = insertelement <8 x i32> %scalarized18, i32 %call20, i64 7
loopHead: 0: shape unired: 
loopHead: 0: shape varyingred: 
rvTool: %%%/llvm/include/llvm/IR/DataLayout.h:531: uint64_t llvm::DataLayout::getTypeSizeInBits(llvm::Type*) const: Assertion `Ty->isSized() && "Cannot getTypeInfo() on a type that is unsized!"' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff5939428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff5939428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff593b02a in __GI_abort () at abort.c:89
#2  0x00007ffff5931bd7 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x7ffff79d54d0 "Ty->isSized() && \"Cannot getTypeInfo() on a type that is unsized!\"", 
    file=file@entry=0x7ffff79d5490 "%%%/llvm/include/llvm/IR/DataLayout.h", line=line@entry=531, 
    function=function@entry=0x7ffff79d6ee0 <llvm::DataLayout::getTypeSizeInBits(llvm::Type*) const::__PRETTY_FUNCTION__> "uint64_t llvm::DataLayout::getTypeSizeInBits(llvm::Type*) const") at assert.c:92
#3  0x00007ffff5931c82 in __GI___assert_fail (assertion=0x7ffff79d54d0 "Ty->isSized() && \"Cannot getTypeInfo() on a type that is unsized!\"", 
    file=0x7ffff79d5490 "%%%/llvm/include/llvm/IR/DataLayout.h", line=531, 
    function=0x7ffff79d6ee0 <llvm::DataLayout::getTypeSizeInBits(llvm::Type*) const::__PRETTY_FUNCTION__> "uint64_t llvm::DataLayout::getTypeSizeInBits(llvm::Type*) const") at assert.c:101
#4  0x00007ffff7913c41 in llvm::DataLayout::getTypeSizeInBits(llvm::Type*) const () from %%%/lib/libRV.so
#5  0x00007ffff792a989 in rv::NatBuilder::widenScalar(llvm::Value&, rv::VectorShape) () from %%%/lib/libRV.so
#6  0x00007ffff7934e22 in rv::NatBuilder::requestVectorValue(llvm::Value*) () from %%%/lib/libRV.so
#7  0x00007ffff793c5b8 in rv::NatBuilder::addValuesToPHINodes() () from %%%/lib/libRV.so
#8  0x00007ffff793deb3 in rv::NatBuilder::vectorize(bool, llvm::ValueMap<llvm::Value const*, llvm::WeakTrackingVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >*) ()
   from %%%/lib/libRV.so

and our pipeline crash in vectorize.

-- End of Recurrence Analysis --
functionpointers: %%%/llvm/include/llvm/IR/DataLayout.h:531: uint64_t llvm::DataLayout::getTypeSizeInBits(llvm::Type*) const: Assertion `Ty->isSized() && "Cannot getTypeInfo() on a type that is unsized!"' failed.

Program received signal SIGABRT, Aborted.
0x00007fffe0af6428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007fffe0af6428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007fffe0af802a in __GI_abort () at abort.c:89
#2  0x00007fffe0aeebd7 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x7ffff74894d0 "Ty->isSized() && \"Cannot getTypeInfo() on a type that is unsized!\"", 
    file=file@entry=0x7ffff7489490 "%%%/llvm/include/llvm/IR/DataLayout.h", line=line@entry=531, 
    function=function@entry=0x7ffff748aee0 <llvm::DataLayout::getTypeSizeInBits(llvm::Type*) const::__PRETTY_FUNCTION__> "uint64_t llvm::DataLayout::getTypeSizeInBits(llvm::Type*) const") at assert.c:92
#3  0x00007fffe0aeec82 in __GI___assert_fail (assertion=0x7ffff74894d0 "Ty->isSized() && \"Cannot getTypeInfo() on a type that is unsized!\"", 
    file=0x7ffff7489490 "%%%/llvm/include/llvm/IR/DataLayout.h", line=531, 
    function=0x7ffff748aee0 <llvm::DataLayout::getTypeSizeInBits(llvm::Type*) const::__PRETTY_FUNCTION__> "uint64_t llvm::DataLayout::getTypeSizeInBits(llvm::Type*) const") at assert.c:101
#4  0x00007ffff73c7c41 in llvm::DataLayout::getTypeSizeInBits(llvm::Type*) const () from %%%/lib/libRV.so
#5  0x00007ffff73de989 in rv::NatBuilder::widenScalar(llvm::Value&, rv::VectorShape) () from %%%/lib/libRV.so
#6  0x00007ffff73e8e22 in rv::NatBuilder::requestVectorValue(llvm::Value*) () from %%%/lib/libRV.so
#7  0x00007ffff73ec564 in rv::NatBuilder::mapOperandsInto(llvm::Instruction*, llvm::Instruction*, bool, unsigned int) () from %%%/lib/libRV.so
#8  0x00007ffff73ec9d2 in rv::NatBuilder::vectorizeInstruction(llvm::Instruction*) () from %%%/lib/libRV.so
#9  0x00007ffff73f0e5b in rv::NatBuilder::vectorize(llvm::BasicBlock*, llvm::BasicBlock*) () from %%%/lib/libRV.so
#10 0x00007ffff73f27b1 in rv::NatBuilder::vectorize(bool, llvm::ValueMap<llvm::Value const*, llvm::WeakTrackingVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >*) ()
   from %%%/lib/libRV.so

Is it possible to add support for function pointers to RV?
Or could you at least give me a hint on where to look, so I could work on a PR?

Yes, it is possible (and this bug should be fixed). However, there are some subtleties when dealing with varying function pointers, and I don't recommend using them.

Function pointers currently crash because we assume that all varying values have a sized type. The fix belongs in NatBuilder: if the type is unsized, scalarize the value instead of widening it. That will be inefficient, because the function calls will then be scalarized as well.
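The sketch below shows what scalarizing a varying function-pointer call amounts to. Each lane extracts its own pointer and argument and makes one scalar indirect call. The lane width `W = 8`, the array-based lane model and the name `callScalarized` are illustrative assumptions. This is not RV code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Illustrative lane model (assumption): a "vector" is a std::array of W lanes.
constexpr std::size_t W = 8;
using FnPtr = int (*)(int);
using IntVec = std::array<int, W>;
using FnPtrVec = std::array<FnPtr, W>;

int mult(int x) { return 7 * x + 1; }
int dual(int x) { return 2 * x + 1; }

// Hypothetical scalarized fallback: one scalar indirect call per lane, so a
// call on W lanes always costs W calls, even if all lanes share one callee.
IntVec callScalarized(const FnPtrVec &callees, const IntVec &args,
                      std::size_t *numCalls = nullptr) {
  IntVec result{};
  std::size_t calls = 0;
  for (std::size_t lane = 0; lane < W; ++lane) {
    assert(callees[lane] != nullptr && "every lane needs a valid callee");
    result[lane] = callees[lane](args[lane]);  // extract pointer + argument
    ++calls;
  }
  if (numCalls != nullptr) *numCalls = calls;
  return result;
}
```

This performs eight calls even when all eight lanes hold the same pointer. That is the inefficiency mentioned above.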

If you want to make varying function pointers efficient, you should add an optimization that rewrites that code into:

switch (threadId[i]%8)
{
  case 0:
  case 5:
    b[i] = mult(threadId[i]); break;

  case 1:
  case 4:
    b[i] = dual(threadId[i]); break;

  case 2:
  case 7:
    b[i] = beep(threadId[i]); break;

  case 3:
  case 6:
    b[i] = sum(threadId[i]); break;
}
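You can check that this rewrite preserves behavior by running both forms side by side, as in the sketch below. It uses the four functions from test___.cpp. `fooFnPtr` and `fooDirect` are hypothetical names. The sketch assumes non-negative thread ids: for a negative id, `threadId[i] % 8` is negative in C++ and no case matches.

```cpp
#include <cassert>

int mult(int x) { return 7 * x + 1; }
int dual(int x) { return 2 * x + 1; }
int beep(int x) { return -x; }
int sum(int x) { return x + 5; }

// Original form: select a function pointer per element, then call it.
void fooFnPtr(const int *threadId, int *b, int n) {
  for (int i = 0; i < n; i++) {
    int (*funptr)(int) = nullptr;
    switch (threadId[i] % 8) {
      case 0: funptr = &mult; break;
      case 1: funptr = &dual; break;
      case 2: funptr = &beep; break;
      case 3: funptr = &sum;  break;
      case 4: funptr = &dual; break;
      case 5: funptr = &mult; break;
      case 6: funptr = &sum;  break;
      case 7: funptr = &beep; break;
    }
    assert(funptr != nullptr && "threadId must be non-negative");
    b[i] = funptr(threadId[i]);
  }
}

// Rewritten form: cases grouped by callee, direct calls only.
void fooDirect(const int *threadId, int *b, int n) {
  for (int i = 0; i < n; i++) {
    switch (threadId[i] % 8) {
      case 0: case 5: b[i] = mult(threadId[i]); break;
      case 1: case 4: b[i] = dual(threadId[i]); break;
      case 2: case 7: b[i] = beep(threadId[i]); break;
      case 3: case 6: b[i] = sum(threadId[i]);  break;
    }
  }
}
```

The direct-call form lets the vectorizer see every callee. The pointer form only exposes an opaque pointer.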

You'd want those functions and function calls to be vectorized. That requires two things that are only halfway there in RV at the moment:
1.) predicated vector function calls (only the codegen part is missing).
2.) whole-function vectorization with a non-trivial entry mask (that code path is implemented but untested).
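The effect of these two pieces can be emulated in plain C++, as sketched below. Each case group of the rewritten switch becomes one mask. Each function gets a predicated "vector" version that only writes the lanes whose mask bit is set. Everything in the sketch is an illustrative assumption, not how RV represents vector functions or masks: `W = 8`, the array lane model, the `*V` function names, `anyActive` and `fooVectorized`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t W = 8;
using IntVec = std::array<int, W>;
using Mask = std::array<bool, W>;

// Hypothetical predicated vector versions of mult/dual/beep/sum:
// only lanes with m[l] == true are written, other lanes are left unchanged.
void multV(const IntVec &x, const Mask &m, IntVec &out) {
  for (std::size_t l = 0; l < W; ++l) if (m[l]) out[l] = 7 * x[l] + 1;
}
void dualV(const IntVec &x, const Mask &m, IntVec &out) {
  for (std::size_t l = 0; l < W; ++l) if (m[l]) out[l] = 2 * x[l] + 1;
}
void beepV(const IntVec &x, const Mask &m, IntVec &out) {
  for (std::size_t l = 0; l < W; ++l) if (m[l]) out[l] = -x[l];
}
void sumV(const IntVec &x, const Mask &m, IntVec &out) {
  for (std::size_t l = 0; l < W; ++l) if (m[l]) out[l] = x[l] + 5;
}

bool anyActive(const Mask &m) {
  for (bool bit : m) if (bit) return true;
  return false;
}

// Vector form of the rewritten switch: one mask per case group and at most
// one predicated call per group; groups with an empty mask are skipped.
IntVec fooVectorized(const IntVec &tid) {
  IntVec b{};
  Mask m0{}, m1{}, m2{}, m3{};
  for (std::size_t l = 0; l < W; ++l) {
    const int r = tid[l] % 8;
    m0[l] = (r == 0 || r == 5);
    m1[l] = (r == 1 || r == 4);
    m2[l] = (r == 2 || r == 7);
    m3[l] = (r == 3 || r == 6);
  }
  if (anyActive(m0)) multV(tid, m0, b);
  if (anyActive(m1)) dualV(tid, m1, b);
  if (anyActive(m2)) beepV(tid, m2, b);
  if (anyActive(m3)) sumV(tid, m3, b);
  return b;
}
```

In this model the number of calls is bounded by the number of case groups (four) rather than by the number of lanes.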

I see two modifications in the switch you suggest:

  • grouping:

Not sure if this would apply, since by the time RV receives the module, this switch has already been transformed into a phi and a GlobalValue lookup table.

  • replacing pointers with direct calls:

This would be impossible in the target use case: the function pointers will be stored in a large matrix generated at runtime.
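The lookup-table form mentioned under "grouping" corresponds roughly to the source-level code below. It is a constant array of function pointers indexed by `threadId % 8`, followed by an indirect call through the loaded pointer. Compare `@switch.table...`, `%switch.load.i` and the indirect `call` in the IR dump later in this thread. Passing the table as a parameter also models the runtime-matrix case, where the compiler cannot turn the calls back into direct calls. This is an approximation, not actual compiler output, and `switchTable` and `fooTable` are hypothetical names.

```cpp
#include <cassert>

int mult(int x) { return 7 * x + 1; }
int dual(int x) { return 2 * x + 1; }
int beep(int x) { return -x; }
int sum(int x) { return x + 5; }

using FnPtr = int (*)(int);

// Same mapping as the switch in test___.cpp, stored as a constant table.
static const FnPtr switchTable[8] = {&mult, &dual, &beep, &sum,
                                     &dual, &mult, &sum,  &beep};

// One table load plus one indirect call per element. The table is a
// parameter, so it can equally be a row of a matrix filled at runtime.
void fooTable(const FnPtr *table, const int *threadId, int *b, int n) {
  for (int i = 0; i < n; i++) {
    const int idx = threadId[i] % 8;
    assert(idx >= 0 && "negative threadId would index out of bounds");
    const FnPtr f = table[idx];  // roughly corresponds to %switch.load
    b[i] = f(threadId[i]);       // indirect call through the loaded pointer
  }
}
```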

I would appreciate it if you introduced a fallback (inside NatBuilder?) so that the pass finishes instead of crashing.

I couldn't reproduce this bug (see also test_074 in commit 7149172). Can you provide a full test case (input file, rvTool cmd line)?
Also make sure that you initialize funcptr before the switch (e.g. to nullptr); otherwise you will end up with an unintended loop-carried function pointer in the i-loop.
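The sketch below applies that advice to foo(). `funptr` is declared and initialized inside the loop body, so no value can flow from one iteration into the next. The `default` label and the choice to store 0 when no function is selected are hypothetical additions. They matter because `threadId[i] % 8` is negative for a negative `threadId[i]` in C++, so no case would match. `fooInit` is an illustrative name.

```cpp
#include <cassert>

int mult(int x) { return 7 * x + 1; }
int dual(int x) { return 2 * x + 1; }
int beep(int x) { return -x; }
int sum(int x) { return x + 5; }

void fooInit(const int *threadId, int *b, int n) {
  for (int i = 0; i < n; i++) {
    int (*funptr)(int) = nullptr;  // fresh, defined value in every iteration
    switch (threadId[i] % 8) {     // same mapping as foo() in test___.cpp
      case 0: case 5: funptr = &mult; break;
      case 1: case 4: funptr = &dual; break;
      case 2: case 7: funptr = &beep; break;
      case 3: case 6: funptr = &sum;  break;
      default: break;              // e.g. negative remainder: stays nullptr
    }
    // Assumed policy: write 0 instead of calling through a null pointer.
    b[i] = (funptr != nullptr) ? funptr(threadId[i]) : 0;
  }
}
```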

I used your test_rv.py as described in the test suite. It reports that the test passed, but if you set RVT_DEBUG=ON you can see that the program never actually gets executed: the last line is the rvTool call (I tried both -loop and -wfv tests). I then copied the rvTool line that RVT_DEBUG printed and ran it by hand to see the crash.

The source in my first message is the complete test case, which I used with test_rv.py.

Well, there is a major difference between -loop (loop vectorization) and -wfv (whole-function vectorization). Judging by the structure of the code, I guess you want the former.
Let me know whether test_074 achieves what you want to do. It is based on this report and passes in test_rv.

In this particular case, -loop mode does work.
But in the general case, this loop might be just one part of the foo function, so we can't rely on -loop mode and have to use -wfv mode (for example with the C_C_T shape; I still can't find proper documentation on vector shapes):

- test_074_fptr-wfv.cpp
CMD clang++ -std=c++14 -march=native -m64 -O2 -fno-vectorize suite/test_074_fptr-wfv.cpp -fno-unroll-loops -S -emit-llvm -c -o build/test_074_fptr-wfv.ll
CMD rvTool -wfv -lower -i build/test_074_fptr-wfv.ll -o build/test_074_fptr-wfv.wfv.ll -k foo -s C_C_T -w 8

rvTool -wfv -lower -i build/test_074_fptr-wfv.ll -o build/test_074_fptr-wfv.wfv.ll -k foo -s C_C_T -w 8
passed

I can reproduce the crash now (with a different stack trace, though). NatBuilder currently assumes that the called function pointer of a CallInst is a global value.
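Once the callee is not a global value, the pointer may differ per lane. One common way to vectorize such a call is sketched below; it is sometimes called a waterfall loop. The loop takes the pointer of the first still-active lane, serves every lane holding the same pointer, and clears those lanes from the mask. The number of rounds therefore equals the number of distinct callees rather than the lane count. This is a generic technique under an array-based lane model with hypothetical names (`callVarying`, `W = 8`). It is not necessarily what RV's fix does.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t W = 8;
using FnPtr = int (*)(int);
using IntVec = std::array<int, W>;
using FnPtrVec = std::array<FnPtr, W>;
using Mask = std::array<bool, W>;

int mult(int x) { return 7 * x + 1; }
int dual(int x) { return 2 * x + 1; }
int beep(int x) { return -x; }
int sum(int x) { return x + 5; }

// Serves all active lanes, one distinct callee per round. Inactive lanes
// keep the value 0. *numGroups receives the number of rounds, i.e. how many
// predicated vector calls a vectorizer would emit.
IntVec callVarying(const FnPtrVec &callees, const IntVec &args, Mask active,
                   std::size_t *numGroups = nullptr) {
  IntVec result{};
  std::size_t groups = 0;
  for (;;) {
    std::size_t first = W;  // first lane that still needs a call
    for (std::size_t l = 0; l < W; ++l)
      if (active[l]) { first = l; break; }
    if (first == W) break;  // all lanes served
    const FnPtr f = callees[first];
    assert(f != nullptr && "active lanes need a valid callee");
    // A vectorizer would issue one predicated vector call to f here; this
    // emulation evaluates the matching lanes one by one.
    for (std::size_t l = 0; l < W; ++l) {
      if (active[l] && callees[l] == f) {
        result[l] = f(args[l]);
        active[l] = false;  // lane served; guarantees termination
      }
    }
    ++groups;
  }
  if (numGroups != nullptr) *numGroups = groups;
  return result;
}
```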

Good point about the documentation. There are now two wiki pages on that: "test_rv: functional tester" and "Vector shape".

Varying function pointers are fixed by 4786bb1. The commit also includes a valid WFV test case (and launcher).
Does this solve the issue?

P.S. test_rv.py passes perfectly.

When I set the initial value of funcptr to nullptr, this is the function IR (it does look better than with an uninitialized funcptr):

VectorizationInfo for Region FunctionRegion (_ZN5pacxx2v213genericKernelIZL21test_pointer_functioniPPcE3$_0EEvT_PPKc.vectorizer.tmp)

Arguments:
i32* %callable.coerce : uni
i8** %name : uni

Block %entry, predicate null
  %0 = call i32 @llvm.pacxx.read.ntid.x() #4 : uni
  %1 = call i32 @llvm.pacxx.read.ctaid.x() #4 : uni
  %mul.i.i.i.i = mul nsw i32 %1, %0 : unknown shape
  %2 = call i32 @llvm.pacxx.read.tid.x() #4 : cont
  %add.i.i.i.i = add nsw i32 %mul.i.i.i.i, %2 : unknown shape
  %rem.i.i = srem i32 %add.i.i.i.i, 8 : unknown shape
  %3 = sext i32 %rem.i.i to i64 : unknown shape
  %switch.gep.i = getelementptr inbounds [8 x i32 (i32)*], [8 x i32 (i32)*]* @"switch.table._ZN5pacxx2v210kernelBodyIRZL21test_pointer_functioniPPcE3$_0EEvOT_", i64 0, i64 %3 : unknown shape
  %switch.load.i = load i32 (i32)*, i32 (i32)** %switch.gep.i, align 8 : unknown shape
  %call9.i.i = call i32 %switch.load.i(i32 %add.i.i.i.i) #4, !callees !6 : unknown shape
  %idxprom.i.i = sext i32 %add.i.i.i.i to i64 : unknown shape
  %arrayidx.i.i = getelementptr inbounds i32, i32* %callable.coerce, i64 %idxprom.i.i : unknown shape
  store i32 %call9.i.i, i32* %arrayidx.i.i, align 4, !tbaa !7 : unknown shape
  ret void : unknown shape
}

It now crashes in https://github.com/cdl-saarland/rv/blob/develop/src/analysis/AllocaSSA.cpp#L192-L193 on:
Instruction %call9.i.i = call i32 %switch.load.i(i32 %add.i.i.i.i) #4, !callees !6

#0 in rv::AllocaSSA::compute()
#1 in rv::VectorizationAnalysis::VectorizationAnalysis(rv::Config, rv::PlatformInfo&, rv::VectorizationInfo&, llvm::DominatorTree const&, llvm::PostDominatorTree const&, llvm::LoopInfo const&)
#2 in rv::VectorizerInterface::analyze(rv::VectorizationInfo&, llvm::DominatorTree const&, llvm::PostDominatorTree const&, llvm::LoopInfo const&)

This is because the called value doesn't cast to a Function.
Turning https://github.com/cdl-saarland/rv/blob/develop/src/analysis/AllocaSSA.cpp#L192 into an if that guards the following for loop lets it finish and work correctly (as far as the results are concerned).

If you agree to add that if (I'm not sure whether we should also check the GlobalValue operands for pointer type [true] and !onlyReadsMemory [should be false in this case, right?]), I'd declare this issue solved. 👍

P.P.S. Setting funcptr to anything other than nullptr (leaving it uninitialized or setting it to one of the functions) still crashes the same way as in the first message. I'll just keep this as advice to myself: always set it to nullptr.

The AllocaSSA crash should be taken care of by db6392c.
This also fixes a bug where AllocaSSA put the callee's Arguments in the written pointer set instead of the call's argument operands, as it should.
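The corrected rule can be stated compactly. The pointers a call may write are among the call's own argument operands. If the callee is unknown, as for an indirect call, every pointer-typed operand must conservatively be assumed written. The sketch below models this rule with a hypothetical `CallSiteModel` struct; it is a simplification, not AllocaSSA's actual code. For the indirect call in the IR above, `call i32 %switch.load.i(i32 %add.i.i.i.i)`, the only operand is an `i32`, so the set is empty.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of a call site (not an RV or LLVM type).
struct ArgOperand {
  bool isPointer;  // does this actual argument have pointer type?
};

struct CallSiteModel {
  bool calleeKnown;              // false for an indirect call via a pointer
  bool calleeOnlyReadsMemory;    // only consulted when calleeKnown is true
  std::vector<ArgOperand> args;  // the call's argument operands
};

// Returns indices of the call's argument operands (not the callee's formal
// parameters) whose pointees the call may write.
std::vector<std::size_t> mayWritePointerOperands(const CallSiteModel &call) {
  std::vector<std::size_t> written;
  if (call.calleeKnown && call.calleeOnlyReadsMemory)
    return written;  // a known read-only callee writes nothing
  for (std::size_t i = 0; i < call.args.size(); ++i)
    if (call.args[i].isPointer)
      written.push_back(i);  // conservative: any pointer operand may be written
  return written;
}
```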

The funcPtr initialization issue is now resolved in the WFV case by 855654b. The underlying issue is that the uninitialized funcPtr results in a function pointer recurrence in the i-loop. We need to improve the recurrence codegen in RV to also handle this in the outer-loop case (separate issue).

But actually, even with an uninitialized funcPtr there is no need to have a recurrence in the IR! In C/C++ terms, the lifetime of the funcPtr variable ends with the loop body {..}. It is not technically wrong to emit a recurrence here (funcPtr is uninitialized). However, I think it is bad judgement by Clang/LLVM to artificially extend the live range of funcPtr.

Ok. So, I looked this up in the C++ standard (N3797 draft) ;-)
6.5.3 The for statement
6.5.1 The while statement
-> for-loops are equivalent to while loops with a goto out of the block scope that contains the declaration of funcPtr.
3.7.3 Automatic storage duration
-> funcPtr has automatic storage duration.
6.7 Declaration statement
(also 6.6 Jump statements)
-> funcPtr is destroyed on exit from its block via goto.

https://github.com/cdl-saarland/rv/blob/develop/src/analysis/AllocaSSA.cpp#L178 should say getCalledValue, not getCalledFunction.

Close enough 👍

getCalledFunction already does a dyn_cast internally. Cleaned up in 726f7ca.

getCalledFunction returned nullptr, which caused an LLVM dyn_cast assertion.
Now it works. Thank you.