StarLord does not work with the IBM compiler on GPUs
maxpkatz opened this issue · comments
Max Katz commented
It compiles, but completely fails at runtime. Need to investigate why.
Max Katz commented
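One issue is that by default we request up to 512 threads per threadblock, but with the default maximum of 255 registers per thread a kernel can need up to 512 × 255 = 130,560 registers per threadblock, which is more than the 65,536 registers available to a threadblock on current NVIDIA GPUs. For PGI we deal with this by explicitly limiting each kernel to 128 registers per thread in the StarLord makefile (512 × 128 = 65,536, which just fits). Need to do the equivalent for IBM.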
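To make the register budget concrete, here is a rough sketch of the arithmetic in Python. The 65,536 registers-per-block figure is an assumption about the target hardware (it holds for NVIDIA compute capability 3.x through 7.x), the function names are made up for illustration, and the calculation ignores the hardware's register allocation granularity.

```python
# Assumed per-threadblock register limit (32-bit registers) for the target GPU.
# 65,536 is the documented value for NVIDIA compute capability 3.x through 7.x.
REGISTERS_PER_BLOCK = 64 * 1024


def registers_needed(threads_per_block, registers_per_thread):
    """Worst-case register demand of one threadblock."""
    return threads_per_block * registers_per_thread


def fits(threads_per_block, registers_per_thread):
    """True if a block of this size can launch within the assumed limit."""
    return registers_needed(threads_per_block, registers_per_thread) <= REGISTERS_PER_BLOCK


def max_registers_per_thread(threads_per_block):
    """Largest per-thread register cap that still lets a full block launch."""
    return REGISTERS_PER_BLOCK // threads_per_block


if __name__ == "__main__":
    print(registers_needed(512, 255))     # demand with the default 255-register cap
    print(fits(512, 255))                 # does the default configuration fit?
    print(fits(512, 128))                 # does the 128-register cap fit?
    print(max_registers_per_thread(512))  # largest cap that works for 512 threads
```

Under these assumptions, 128 is the largest per-thread cap that still allows a 512-thread block, which matches the value used for PGI.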
Max Katz commented
According to IBM, this can be done with:
-Xptxas -maxrregcount=128