bkloppenborg / liboi

OpenCL Interferometry Library

Home Page: https://github.com/bkloppenborg/liboi/wiki

Flagged data can result in poor performance

bkloppenborg opened this issue

Flagging data in OIFITS files can cause poor performance for certain numbers of flagged points. Below are test results on a MIRC 6T data file (8 channels, 15 rows). They indicate that this is a problem-size divisibility and/or memory-access-pattern issue.

This data set, 2011Nov03-epsAur-avg5.oifits, consists of:

  • 0 Vis
  • 720 V2
  • 960 T3
  • 2860 UV points

We flagged only V2 records.

Flagging N values within a single row of a MIRC (8-channel) data file gives the following performance:

  • 0: good (no values flagged)
  • 1: good
  • 2: bad
  • 3: bad
  • 4: good
  • 5: good
  • 6: bad
  • 7: good
  • 8: good (entire row flagged)

Flagging N values between rows of a MIRC (8-channel) data file results in the following performance:

  • 0: good (no values flagged)
  • 1: good
  • 2: bad
  • 3: bad
  • 4: good
  • 5: good
  • 6: bad
  • 7: bad
  • 8: good
  • 9: good
  • 10: good
  • 11: bad
  • 12: good
  • 13: good
  • 14: good
  • 15: good

Flagging one entire column (15 values) plus one extra element in some rows, for a total of N:

  • 16: good
  • 17: bad
  • 18: good
  • 19: good
  • 20: good
  • 21: good
  • 22: bad
  • 23: good
  • 24: good
  • 25: good
  • 26: good
  • 28: good
  • 29: bad
  • 30: good

Flagging across tables also appears to degrade performance, which points to a similar underlying problem.

A possible solution is to increase the problem size by padding the input data buffers. This requires inserting padding values that do not affect or invalidate the computed results.
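A minimal sketch of this padding, assuming the kernel reduces a weighted sum of squared residuals (chi-squared style), so padded elements can be given zero weight and leave the result unchanged. `pad_for_reduction` and `weighted_chi2` are hypothetical names for illustration, not part of liboi:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: pad the residual and weight buffers up to a multiple
// of `multiple`. Padded entries get residual 0 and weight 0, so they add
// nothing to a weighted sum of squares.
void pad_for_reduction(std::vector<float>& residuals,
                       std::vector<float>& weights,
                       std::size_t multiple)
{
    assert(multiple > 0);
    assert(residuals.size() == weights.size());
    std::size_t n = residuals.size();
    std::size_t padded = ((n + multiple - 1) / multiple) * multiple;
    residuals.resize(padded, 0.0f);
    weights.resize(padded, 0.0f);
}

// CPU reference for the reduction a kernel would perform:
// sum over i of weight[i] * residual[i]^2.
float weighted_chi2(const std::vector<float>& residuals,
                    const std::vector<float>& weights)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < residuals.size(); ++i)
        sum += weights[i] * residuals[i] * residuals[i];
    return sum;
}
```

Padding with zero weight, rather than with a zero uncertainty, avoids a division by zero if the kernel computes residuals as (data - model) / error.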

Next step: locate the kernel that causes the greatest slowdown and pad out its buffers.
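One way to find the slowest kernel is to time each stage of the pipeline separately. The sketch below does this on the host with std::chrono; `Stage`, `time_stages`, and `slowest` are hypothetical helpers, and each stage's callable is assumed to block until its kernel finishes (e.g. by calling clFinish on the queue), since otherwise only the enqueue cost is measured. OpenCL's own event profiling (a queue created with CL_QUEUE_PROFILING_ENABLE, then clGetEventProfilingInfo with CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END) would give device-side times instead.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <vector>

// Hypothetical pipeline stage, e.g. one kernel launch. `run` must block
// until the work has finished for the timing to be meaningful.
struct Stage {
    std::string name;
    std::function<void()> run;
};

struct StageTime {
    std::string name;
    double ms;  // average wall time per run, in milliseconds
};

// Run every stage `repeats` times and record its average wall time.
std::vector<StageTime> time_stages(const std::vector<Stage>& stages, int repeats)
{
    assert(repeats > 0);
    std::vector<StageTime> out;
    for (const Stage& s : stages) {
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < repeats; ++i) s.run();
        auto t1 = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        out.push_back({s.name, ms / repeats});
    }
    return out;
}

// Name of the stage with the largest average time ("" if the list is empty).
std::string slowest(const std::vector<StageTime>& times)
{
    std::string name;
    double best = -1.0;
    for (const StageTime& t : times)
        if (t.ms > best) { best = t.ms; name = t.name; }
    return name;
}
```

Comparing these per-stage times with and without flagged data would show which kernel's buffers are worth padding first.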

Closed in 281cde4. Performance improves after rounding the buffer size up to the next integer multiple of the underlying hardware's number of processors per multiprocessor.

Note: at present this value is hardcoded to 16, which won't work as well on ATI hardware.
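The fix amounts to rounding each buffer length up to a multiple of a hardware-dependent value. A minimal sketch, with the multiple passed in rather than hardcoded; `round_up` is a hypothetical helper, not liboi's code. One possible source for the value on other vendors' hardware is clGetKernelWorkGroupInfo with CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE (OpenCL 1.1 and later), though whether that query matches what this fix needs is an assumption:

```cpp
#include <cassert>
#include <cstddef>

// Round n up to the next integer multiple of `multiple`.
// The issue hardcodes multiple = 16; taking it as a parameter lets the
// caller supply a value queried from the device instead.
std::size_t round_up(std::size_t n, std::size_t multiple)
{
    assert(multiple > 0);
    return ((n + multiple - 1) / multiple) * multiple;
}
```

For example, round_up(718, 16) and round_up(720, 16) both return 720; the difference between the rounded and actual sizes is the number of padding elements to insert.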