Flagged data can result in poor performance

bkloppenborg opened this issue 12 years ago · comments

Brian Kloppenborg commented 12 years ago

Flagging data in OIFITS files can result in poor performance for certain numbers of flagged points. Here are some test results on a MIRC 6T data file (8 channels, 15 rows). From the test results below it is clear that this is an issue with how the problem size is divided and/or with the memory access pattern.

This data set, 2011Nov03-epsAur-avg5.oifits, consists of:
0 Vis
720 V2
960 T3
2860 UV Points

We flagged only V2 records.

Flagging N values within a single row of a MIRC (8-channel) data file results in the following performance:

0: good (no values flagged)
1: good
2: bad
3: bad
4: good
5: good
6: bad
7: good
8: good (entire row flagged)

Flagging N values between rows of a MIRC (8-channel) data file results in the following performance:

0: good (no values flagged)
1: good
2: bad
3: bad
4: good
5: good
6: bad
7: bad
8: good
9: good
10: good
11: bad
12: good
13: good
14: good
15: good

Flagging one entire column (15) plus an extra element in each row to total N:

16: good
17: bad
18: good
19: good
20: good
21: good
22: bad
23: good
24: good
25: good
26: good
28: good
29: bad
30: good

Brian Kloppenborg · Answer 1 · Wed Dec 05 2012 22:57:47 GMT+0800 (China Standard Time)

Flagging across tables also appears to degrade performance, which points to a similar underlying problem.

A possible solution to this problem is to increase the problem size by padding the input data buffers. The padding must consist of values that do not affect or invalidate the computed results.

Try locating the kernel which is causing the greatest slowdown and pad out buffers there.
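
As a rough illustration of the padding idea above, here is a minimal Python sketch. It assumes the quantity being computed is a chi-squared sum over data, model, and error buffers. The thread does not say which kernels are involved or what padding values were used, so the function names and the neutral values (data = model = 0, error = 1) are hypothetical.

```python
def pad_for_chi2(data, model, err, padded_size):
    """Pad three equal-length lists up to padded_size with neutral entries.

    Assumption for illustration: the padded entries use data = model = 0
    and err = 1, so each one adds ((0 - 0) / 1) ** 2 == 0 to the sum.
    """
    if not (len(data) == len(model) == len(err)):
        raise ValueError("data, model and err must have the same length")
    extra = padded_size - len(data)
    if extra < 0:
        raise ValueError("padded_size is smaller than the input length")
    return (
        data + [0.0] * extra,
        model + [0.0] * extra,
        err + [1.0] * extra,  # non-zero error avoids division by zero
    )


def chi2(data, model, err):
    """Plain chi-squared sum, standing in for the real computation."""
    return sum(((d - m) / e) ** 2 for d, m, e in zip(data, model, err))


# Hypothetical values, not taken from the data set in this thread.
data = [1.0, 2.0, 3.0]
model = [1.5, 2.0, 2.0]
err = [0.5, 1.0, 2.0]

padded = pad_for_chi2(data, model, err, 16)
unpadded_result = chi2(data, model, err)
padded_result = chi2(*padded)
```

The padded and unpadded sums agree because every padded element has a zero residual. Other reductions would need their own neutral values.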

Brian Kloppenborg · Answer 2 · Wed Dec 05 2012 23:39:36 GMT+0800 (China Standard Time)

Closed in 281cde4. Performance is better after rounding the buffer size up to the next highest integer multiple of the underlying hardware's number of processors per multiprocessor.

Note, at present we've hardcoded this to 16. This won't work so well for ATI.
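
For reference, rounding a buffer size up to the next integer multiple can be sketched as below. This is not the code from 281cde4; the function name is hypothetical, and the default of 16 simply mirrors the hardcoded value mentioned above.

```python
def round_up_to_multiple(n, multiple=16):
    """Return the smallest multiple of `multiple` that is >= n.

    The default of 16 mirrors the hardcoded value in this thread;
    passing it as a parameter lets other hardware use another value.
    """
    if n < 0 or multiple <= 0:
        raise ValueError("n must be >= 0 and multiple must be > 0")
    return ((n + multiple - 1) // multiple) * multiple


# Hypothetical buffer sizes for illustration.
already_aligned = round_up_to_multiple(720)   # 720 is 45 * 16
needs_padding = round_up_to_multiple(718)     # rounds up to 720
```

Making the multiple a parameter instead of a constant would allow it to be set per device.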