owensgroup / merge-spmm

Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018

Bugs about reading symmetric mtx data and processing data

YangWang92 opened this issue

Hi all, I found a bug in reading symmetric mtx data.
For example, I ran

./gspmm --debug=true --max_ncols=4 ./4_4coo_dense.mtx

to read a 4x4 dense matrix from 4_4coo_dense.mtx.

When the file is stored in symmetric format, gspmm loads a broken matrix.

Wrong results (symmetric format)

%%MatrixMarket matrix coordinate real symmetric
%
4 4 10
1 1 1
2 1 1
2 2 1
3 1 1
3 2 1
3 3 1
4 1 1
4 2 1
4 3 1
4 4 1

ta: 32
tb: 32
nt: 128
row: 1
debug: 1
%%MatrixMarket matrix coordinate real symmetric
4 4 13
csrColInd:
[0]:0 [1]:1 [2]:2 [3]:3 [4]:0 [5]:2 [6]:3 [7]:0 [8]:1 [9]:3 [10]:0 [11]:1 [12]:2 [13]:0 [14]:4113 [15]:0 [16]:0 [17]:0 [18]:0 [19]:0 [20]:0 [21]:0 [22]:0 [23]:0 [24]:0 [25]:0 [26]:0 [27]:0 [28]:0 [29]:0 [30]:0 [31]:0 [32]:0 [33]:0 [34]:0 [35]:0 [36]:0 [37]:0 [38]:0 [39]:0
csrRowPtr:
[0]:0 [1]:4 [2]:7 [3]:10 [4]:13 [5]:14 [6]:81 [7]:0 [8]:0 [9]:0 [10]:0 [11]:0 [12]:1 [13]:1 [14]:1 [15]:2 [16]:2 [17]:2 [18]:3 [19]:3 [20]:3 [21]:35143 [22]:3 [23]:3 [24]:35143 [25]:-1456 [26]:81 [27]:0 [28]:0 [29]:1 [30]:2 [31]:3 [32]:0 [33]:2 [34]:3 [35]:0 [36]:1 [37]:3 [38]:0 [39]:1
csrVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:9.10844e-44 [15]:0 [16]:1.51901e-38 [17]:0 [18]:1.49695e-38 [19]:0 [20]:2.69808e-38 [21]:0 [22]:0 [23]:1.44118e+17 [24]:1.05553e+14 [25]:4.58715e-41 [26]:2.93874e-39 [27]:0 [28]:0 [29]:0 [30]:2.03188e-43 [31]:0 [32]:1.23145e+14 [33]:4.58715e-41 [34]:2.93874e-38 [35]:0 [36]:0 [37]:0 [38]:0 [39]:0
pretty print:
x x x x
x 0 x x
x x 0 x
x x x 0
mxm: 0.036416 ms
denseVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:0 [6]:1 [7]:1 [8]:1 [9]:1 [10]:0 [11]:1 [12]:1 [13]:1 [14]:1 [15]:0
x x x x
x 0 x x
x x 0 x
x x x 0
There were 0 errors out of 13.

Correct results (general format)

%%MatrixMarket matrix coordinate real general
%
4 4 16
1 1 1
1 2 1
1 3 1
1 4 1
2 1 1
2 2 1
2 3 1
2 4 1
3 1 1
3 2 1
3 3 1
3 4 1
4 1 1
4 2 1
4 3 1
4 4 1

ta: 32
tb: 32
nt: 128
row: 1
debug: 1
%%MatrixMarket matrix coordinate real general
4 4 16
csrColInd:
[0]:0 [1]:1 [2]:2 [3]:3 [4]:0 [5]:1 [6]:2 [7]:3 [8]:0 [9]:1 [10]:2 [11]:3 [12]:0 [13]:1 [14]:2 [15]:3 [16]:35143 [17]:96 [18]:81 [19]:0 [20]:1065353216 [21]:1065353216 [22]:1065353216 [23]:1065353216 [24]:1065353216 [25]:1065353216 [26]:1065353216 [27]:1065353216 [28]:1065353216 [29]:1065353216 [30]:1065353216 [31]:1065353216 [32]:1065353216 [33]:1065353216 [34]:1065353216 [35]:1065353216 [36]:4 [37]:1065353216 [38]:81 [39]:0
csrRowPtr:
[0]:0 [1]:4 [2]:8 [3]:12 [4]:16 [5]:0 [6]:33 [7]:0 [8]:0 [9]:0 [10]:4482352 [11]:0 [12]:-1672595478 [13]:7953 [14]:33 [15]:0 [16]:42126000 [17]:0 [18]:42126096 [19]:0 [20]:32 [21]:0 [22]:49 [23]:0 [24]:0 [25]:0 [26]:34833056 [27]:0 [28]:42189248 [29]:0 [30]:1638970554 [31]:1868983913 [32]:203358240 [33]:7967 [34]:49 [35]:0 [36]:0 [37]:0 [38]:41772304 [39]:0
csrVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:1 [15]:1 [16]:5.60519e-45 [17]:1 [18]:1.13505e-43 [19]:0 [20]:0 [21]:0 [22]:1 [23]:1 [24]:1 [25]:1 [26]:1 [27]:1 [28]:1 [29]:1 [30]:1 [31]:1 [32]:1 [33]:1 [34]:1 [35]:1 [36]:2.95797e+17 [37]:4.58631e-41 [38]:2.70451e-43 [39]:0
pretty print:
x x x x
x x x x
x x x x
x x x x
mxm: 0.033792 ms
denseVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:1 [15]:1
x x x x
x x x x
x x x x
x x x x
There were 0 errors out of 16.

Hi @wyatuestc, I tried on the latest commit from master (e37dea0) and I was unable to reproduce the error you got. Here's what I got:

ctcyang@mario:~/merge-spmm/build$ bin/gspmm --debug=true --max_ncols=4 ../4_4coo_dense.mtx
ta:    32
tb:    32
nt:    128
row:   1
debug: 1
%%MatrixMarket matrix coordinate real general
4 4 16
csrColInd:
[0]:0 [1]:1 [2]:2 [3]:3 [4]:0 [5]:1 [6]:2 [7]:3 [8]:0 [9]:1 [10]:2 [11]:3 [12]:0 [13]:1 [14]:2 [15]:3 [16]:861104459 [17]:1414221919 [18]:81 [19]:0 [20]:1065353216 [21]:1065353216 [22]:1065353216 [23]:1065353216 [24]:1065353216 [25]:1065353216 [26]:1065353216 [27]:1065353216 [28]:1065353216 [29]:1065353216 [30]:1065353216 [31]:1065353216 [32]:1065353216 [33]:1065353216 [34]:1065353216 [35]:1065353216 [36]:4 [37]:1065353216 [38]:273 [39]:0
csrRowPtr:
[0]:0 [1]:4 [2]:8 [3]:12 [4]:16 [5]:0 [6]:33 [7]:0 [8]:40336064 [9]:0 [10]:40336496 [11]:0 [12]:0 [13]:0 [14]:49 [15]:0 [16]:40312448 [17]:0 [18]:1685221231 [19]:1952542313 [20]:1701978213 [21]:1730178145 [22]:1919250021 [23]:27745 [24]:1162162274 [25]:1768519237 [26]:81 [27]:0 [28]:0 [29]:0 [30]:0 [31]:0 [32]:1 [33]:1 [34]:1 [35]:1 [36]:2 [37]:2 [38]:2 [39]:2
csrVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:1 [15]:1 [16]:5.60519e-45 [17]:1 [18]:3.82554e-43 [19]:0 [20]:1.70078e-37 [21]:0 [22]:1.70092e-37 [23]:0 [24]:8.96831e-44 [25]:0 [26]:1.70057e-37 [27]:0 [28]:1.56318e-37 [29]:0 [30]:-7.03773e+13 [31]:4.56599e-41 [32]:-7.03773e+13 [33]:4.56599e-41 [34]:1.43493e-42 [35]:0 [36]:7.60905e-43 [37]:0 [38]:1.56322e-37 [39]:0
pretty print:
x x x x
x x x x
x x x x
x x x x
mxm: 0.044384 ms
denseVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:1 [15]:1
x x x x
x x x x
x x x x
x x x x
There were 0 errors out of 16.

Could you double-check that you were on the latest master branch commit?

Hi Carel,
Thanks for the reply.
I mean that it cannot read "symmetric" mtx data; "general" data works fine.
You can try reading this file:

%%MatrixMarket matrix coordinate real symmetric
%
4 4 10
1 1 1
2 1 1
2 2 1
3 1 1
3 2 1
3 3 1
4 1 1
4 2 1
4 3 1
4 4 1

Thanks!
Yang

Hi @wyatuestc, thanks. I confirmed that this is indeed a bug, and it should be fixed in the new commit c147935. As pointed out in this issue, the code filters out self-loops (i.e. elements on the diagonal). However, the bug caused it to miss the diagonal nonzero when it happened to be the 1 1 element (i.e. the first element). Now the code behaves as intended:

ctcyang@mario:~/merge-spmm/build$ bin/gspmm --debug=true --max_ncols=4 ../4sym_coo_dense.mtx
ta:    32
tb:    32
nt:    128
row:   1
debug: 1
%%MatrixMarket matrix coordinate real symmetric
4 4 12
csrColInd:
[0]:1 [1]:2 [2]:3 [3]:0 [4]:2 [5]:3 [6]:0 [7]:1 [8]:3 [9]:0 [10]:1 [11]:2 [12]:3 [13]:1065353216 [14]:65 [15]:0 [16]:1065353216 [17]:1065353216 [18]:1065353216 [19]:1065353216 [20]:1065353216 [21]:1065353216 [22]:1065353216 [23]:1065353216 [24]:1065353216 [25]:1065353216 [26]:1065353216 [27]:1065353216 [28]:7 [29]:1065353216 [30]:65 [31]:0 [32]:17620512 [33]:0 [34]:17483824 [35]:0 [36]:26246064 [37]:0 [38]:0 [39]:1543504138
csrRowPtr:
[0]:0 [1]:3 [2]:6 [3]:9 [4]:12 [5]:0 [6]:33 [7]:0 [8]:26245536 [9]:0 [10]:26245968 [11]:0 [12]:0 [13]:0 [14]:49 [15]:0 [16]:26222064 [17]:0 [18]:1685221231 [19]:1952542313 [20]:1701978213 [21]:1931504737 [22]:1701670265 [23]:1667854964 [24]:1162162176 [25]:1768519237 [26]:81 [27]:0 [28]:0 [29]:0 [30]:0 [31]:1 [32]:1 [33]:1 [34]:2 [35]:2 [36]:2 [37]:3 [38]:3 [39]:3
csrVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:9.80909e-45 [13]:1 [14]:9.10844e-44 [15]:0 [16]:2.58733e-38 [17]:0 [18]:2.54902e-38 [19]:0 [20]:5.30747e-38 [21]:0 [22]:0 [23]:1.4412e+17 [24]:7.03687e+13 [25]:4.56683e-41 [26]:2.93874e-39 [27]:0 [28]:0 [29]:0 [30]:2.03188e-43 [31]:0 [32]:4.37148e-38 [33]:0 [34]:0 [35]:0 [36]:7.03687e+13 [37]:4.56683e-41 [38]:2.93874e-39 [39]:0
pretty print:
0 x x x
x 0 x x
x x 0 x
x x x 0
mxm: 0.054528 ms
denseVal:
[0]:0 [1]:1 [2]:1 [3]:1 [4]:1 [5]:0 [6]:1 [7]:1 [8]:1 [9]:1 [10]:0 [11]:1 [12]:1 [13]:1 [14]:1 [15]:0
0 x x x
x 0 x x
x x 0 x
x x x 0
There were 0 errors out of 12.
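
The nonzero counts in these logs can be checked by hand. The following is a minimal sketch, not the repository's reader: `expand_symmetric` and its arguments are hypothetical names chosen for illustration. It assumes a symmetric Matrix Market file stores only the lower triangle, so every off-diagonal entry is mirrored and diagonal entries are either kept once or dropped.

```python
def expand_symmetric(entries, remove_self_loops=True):
    """Expand the stored lower triangle of a symmetric COO matrix.

    entries: iterable of (row, col, value) with 1-based indices, as in the file.
    Returns a sorted list of 0-based (row, col, value) triples.
    """
    out = []
    for r, c, v in entries:
        r, c = r - 1, c - 1            # Matrix Market indices are 1-based
        if r == c:
            if not remove_self_loops:  # a diagonal entry is stored only once
                out.append((r, c, v))
        else:
            out.append((r, c, v))      # entry as stored in the file
            out.append((c, r, v))      # its mirror image across the diagonal
    return sorted(out)


# The 10 stored entries of the 4x4 symmetric file from this thread.
lower = [(i, j, 1.0) for i in range(1, 5) for j in range(1, i + 1)]

filtered = expand_symmetric(lower, remove_self_loops=True)
full = expand_symmetric(lower, remove_self_loops=False)
print(len(lower), len(filtered), len(full))   # 10 12 16
```

Under these assumptions, 12 matches the fixed run above and 16 matches the general-format file. The 13 in the original report is consistent with 12 off-diagonal entries plus the single 1 1 diagonal entry that was not filtered.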

As pointed out in the issue, if you don't want it to filter out diagonal elements, you will have to change the default value of this parameter to false:

bool remove_self_loops=true ) {

Then the result will be the same as the general case:

ctcyang@mario:~/merge-spmm/build$ bin/gspmm --debug=true --max_ncols=4 ../4sym_coo_dense.mtx
ta:    32
tb:    32
nt:    128
row:   1
debug: 1
%%MatrixMarket matrix coordinate real symmetric
4 4 16
csrColInd:
[0]:0 [1]:1 [2]:2 [3]:3 [4]:0 [5]:1 [6]:2 [7]:3 [8]:0 [9]:1 [10]:2 [11]:3 [12]:0 [13]:1 [14]:2 [15]:3 [16]:861104459 [17]:1414221919 [18]:81 [19]:0 [20]:1065353216 [21]:1065353216 [22]:1065353216 [23]:1065353216 [24]:1065353216 [25]:1065353216 [26]:1065353216 [27]:1065353216 [28]:1065353216 [29]:1065353216 [30]:1065353216 [31]:1065353216 [32]:1065353216 [33]:1065353216 [34]:1065353216 [35]:1065353216 [36]:4 [37]:1065353216 [38]:273 [39]:0
csrRowPtr:
[0]:0 [1]:4 [2]:8 [3]:12 [4]:16 [5]:0 [6]:33 [7]:0 [8]:27720240 [9]:0 [10]:27720672 [11]:0 [12]:0 [13]:0 [14]:49 [15]:0 [16]:27696624 [17]:0 [18]:1685221231 [19]:1952542313 [20]:1701978213 [21]:1931504737 [22]:1701670265 [23]:1667854964 [24]:1162162176 [25]:1768519237 [26]:81 [27]:0 [28]:0 [29]:0 [30]:0 [31]:0 [32]:1 [33]:1 [34]:1 [35]:1 [36]:2 [37]:2 [38]:2 [39]:2
csrVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:1 [15]:1 [16]:5.60519e-45 [17]:1 [18]:3.82554e-43 [19]:0 [20]:6.13444e-38 [21]:0 [22]:6.13518e-38 [23]:0 [24]:8.96831e-44 [25]:0 [26]:6.13342e-38 [27]:0 [28]:5.44648e-38 [29]:0 [30]:4.00049 [31]:4.56151e-41 [32]:4.00049 [33]:4.56151e-41 [34]:1.43493e-42 [35]:0 [36]:7.60905e-43 [37]:0 [38]:5.44664e-38 [39]:0
pretty print:
x x x x
x x x x
x x x x
x x x x
mxm: 0.046368 ms
denseVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:1 [15]:1
x x x x
x x x x
x x x x
x x x x
There were 0 errors out of 16.
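
When reading these dumps, note that each array is printed for indices 0 to 39 regardless of its length. Only the first nrows + 1 entries of csrRowPtr and the first nnz entries of csrColInd and csrVal are meaningful; the rest appears to be whatever memory follows the arrays. The meaningful prefixes can be reproduced with a short sketch (`coo_to_csr` is a hypothetical helper, not a function from the repository):

```python
def coo_to_csr(nrows, triples):
    """Build CSR arrays from 0-based (row, col, value) triples."""
    triples = sorted(triples)                 # row-major order
    row_ptr = [0] * (nrows + 1)
    for r, _, _ in triples:
        row_ptr[r + 1] += 1                   # count the entries in each row
    for i in range(nrows):
        row_ptr[i + 1] += row_ptr[i]          # prefix sum gives row offsets
    col_ind = [c for _, c, _ in triples]
    vals = [v for _, _, v in triples]
    return row_ptr, col_ind, vals


dense = [(r, c, 1.0) for r in range(4) for c in range(4)]
no_diag = [t for t in dense if t[0] != t[1]]

print(coo_to_csr(4, dense)[0])     # [0, 4, 8, 12, 16]
print(coo_to_csr(4, no_diag)[0])   # [0, 3, 6, 9, 12]
print(coo_to_csr(4, no_diag)[1])   # [1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2]
```

The first row pointer matches the runs with 16 nonzeros, and the last two match the run with self-loops removed.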

Thanks!
BTW, is it possible to run this code on newer GPU architectures (Turing/Volta/Pascal)?

I've tested on Volta and Pascal and it works, but I have not tested on Turing. In theory it should work on Turing, or at least need only minimal modification.

Thanks! I'm running gspmm on an RTX 2080 (Turing).
I found that the correctness of gspmm depends on the shape of the matrix.
For example, it worked well on a 4x4 matrix but crashed on a 4x32 matrix.
I also ran gspmm on some square matrices and found that it does not work on some of them (512x512, 1024x1024).
I'm not sure whether this is related to the GPU architecture or to some corner cases in the source code.
Thanks!
Yang

  • 4x4 Correct
  • 4x32 Wrong
  • 16x16 Correct
  • 32x32 Correct
  • 64x64 Correct
  • 128x128 Correct
  • 256x256 Correct
  • 512x512 Wrong
  • 1024x1024 Wrong

command:

./bin/gspmm --debug=true --mode="mergepath" ./[matrix].mtx

data:
data.zip
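
The contents of the attached archive are not reproduced here. To build comparable inputs, a minimal sketch follows. It assumes dense matrices of ones in coordinate general format; the actual files in data.zip may differ. `dense_mtx` is a hypothetical helper name.

```python
def dense_mtx(nrows, ncols, value=1):
    """Return the text of a dense matrix in coordinate 'general' format."""
    lines = ["%%MatrixMarket matrix coordinate real general",
             f"{nrows} {ncols} {nrows * ncols}"]   # size line: rows cols nnz
    for r in range(1, nrows + 1):                  # indices are 1-based
        for c in range(1, ncols + 1):
            lines.append(f"{r} {c} {value}")
    return "\n".join(lines) + "\n"


text = dense_mtx(4, 32)
print(text.splitlines()[1])   # 4 32 128
```

Writing `text` to a file such as 4_32coo_dense.mtx (a name that follows the pattern used in this thread) gives an input for the command above.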

Sorry for the slow response. Thank you so much for bringing this to my attention, Yang! I tried your datasets, and 4 x 32 and 1024 x 1024 are indeed wrong. I tested on a Tesla V GPU with 12GB memory.

I tracked down the 4 x 32 error to an incorrect assumption in the test file gspmm.cu, where I assumed square matrices. As a result, only the first A.nrows of the dense B matrix was initialized correctly. This should be fixed in commit 100ddca. Please see the diff here: 100ddca
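
To see why a square-matrix assumption only shows up on inputs like 4 x 32, here is a minimal CPU reference sketch. It is not the code in gspmm.cu: the function name, the row-major layout of B, and the zero fill for the uninitialized part are all assumptions made for illustration. The point is that B must have A.ncols rows, so sizing or filling it by A.nrows goes unnoticed only when A is square.

```python
def spmm_reference(nrows, ncols, row_ptr, col_ind, vals, B, k):
    """C = A * B for CSR A (nrows x ncols) and dense row-major B (ncols x k)."""
    assert len(B) == ncols * k, "B needs A.ncols rows, not A.nrows"
    C = [0.0] * (nrows * k)
    for i in range(nrows):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            j = col_ind[p]                      # column of A selects row of B
            for c in range(k):
                C[i * k + c] += vals[p] * B[j * k + c]
    return C


# Dense 4 x 32 matrix of ones in CSR form.
nrows, ncols, k = 4, 32, 2
row_ptr = [i * ncols for i in range(nrows + 1)]
col_ind = list(range(ncols)) * nrows
vals = [1.0] * (nrows * ncols)

B_ok = [1.0] * (ncols * k)                                        # all 32 rows set
B_short = [1.0] * (nrows * k) + [0.0] * ((ncols - nrows) * k)     # only 4 rows set

print(spmm_reference(nrows, ncols, row_ptr, col_ind, vals, B_ok, k)[:2])     # [32.0, 32.0]
print(spmm_reference(nrows, ncols, row_ptr, col_ind, vals, B_short, k)[:2])  # [4.0, 4.0]
```

With a 4 x 4 input the two versions of B coincide, which is consistent with the square cases passing.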

I need some more time to investigate the 1024 x 1024 error.

Hi @YangWang92, thanks for pointing out the error. If you use the command

bin/gspmm --debug 1 --mode="mergepath" --nt=512 --iter=10 dataset/data/1024_1024coo_dense.mtx

it gives the correct solution.

Still investigating why this value for nt is the magic number. I suspect it has to do with how the number of blocks is calculated.