Error handling in worker thread

Question

MikeGitb opened this issue 7 years ago · comments

I finally have some time and thought about tackling the question of what should happen when an error occurs in the worker thread (currently the thread simply stops). Before I create a PR, I wanted to discuss my ideas:

First of all, I think we need to distinguish between faults from which the thread should try to recover (e.g. CRC errors) and errors from which recovery is infeasible (e.g. the device got disconnected).
In case of recoverable errors I'd just try to find the next valid packet and give up after a few iterations.
In case of non-recoverable errors, the loop would exit and the error would be propagated through the queue to sweep_device_get_scan.
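
To make the proposal concrete, here is a minimal C++ sketch of the idea, not libsweep's actual implementation. Every name in it (`RecoverableError`, `FatalError`, `ScanQueue`, `worker_loop`, `get_scan`, `max_retries`) is hypothetical, and `read_scan` stands in for whatever reads one scan from the device. The worker retries after a recoverable error, gives up after too many failures in a row, and hands any terminating error to the consumer through the same queue that carries the scans.

```cpp
#include <condition_variable>
#include <exception>
#include <mutex>
#include <queue>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical types for illustration only.
struct Sample { float angle_deg; int distance_cm; int signal; };
using Scan = std::vector<Sample>;

struct RecoverableError : std::runtime_error { using std::runtime_error::runtime_error; };
struct FatalError : std::runtime_error { using std::runtime_error::runtime_error; };

// A queue element is either a scan or the error that ended the worker.
struct Element {
  Scan scan;
  std::exception_ptr error;
};

class ScanQueue {
public:
  void push(Element e) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      items_.push(std::move(e));
    }
    ready_.notify_one();
  }

  // Blocks until an element is available.
  Element pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    ready_.wait(lock, [this] { return !items_.empty(); });
    Element e = std::move(items_.front());
    items_.pop();
    return e;
  }

private:
  std::mutex mutex_;
  std::condition_variable ready_;
  std::queue<Element> items_;
};

// Worker loop. read_scan is a hypothetical callable that returns a Scan or throws.
// The loop ends once more than max_retries recoverable failures occur in a row,
// or as soon as any other exception is thrown.
template <typename ReadScan>
void worker_loop(ReadScan read_scan, ScanQueue& queue, int max_retries = 3) {
  int consecutive_failures = 0;
  for (;;) {
    try {
      queue.push({read_scan(), nullptr});
      consecutive_failures = 0;
    } catch (const RecoverableError&) {
      if (++consecutive_failures > max_retries) {
        queue.push({Scan{}, std::current_exception()});
        return;
      }
    } catch (...) {
      queue.push({Scan{}, std::current_exception()});
      return;
    }
  }
}

// Consumer side: rethrows a propagated error in the caller's thread.
Scan get_scan(ScanQueue& queue) {
  Element e = queue.pop();
  if (e.error) std::rethrow_exception(e.error);
  return std::move(e.scan);
}
```

In a real setup `worker_loop` would run on its own `std::thread`. Carrying a `std::exception_ptr` in the queue element is only one option; an error code in the element would fit a C API better.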

The question is what the user or the library is supposed to do to bring the system back into a defined state. We could, for example, call sweep_device_stop_scanning automatically or require the user to call it.

David Young · Answer 1 · Tue May 16 2017 01:13:06 GMT+0800 (China Standard Time)

"In case of recoverable errors I'd just try to find the next valid packet and give up after a few iterations."

Currently libsweep does not perform any kind of intelligent parsing. Instead it relies on the assumption that the data-block bytes arrive in order and without interruption until the stream is stopped. Technically, because each byte can take any value, it isn't possible to guarantee that you can find the next valid packet after an error. Consider the case of a missing byte, where the first byte of the following data block is incorrectly interpreted as the last byte of the current data block. When trying to find the next valid data block, you could slide a window (7 bytes long) across the incoming stream, checking if the values make sense, but it wouldn't be a guarantee. Granted, if the following checks pass, a false match due to random chance is very unlikely:

validating the checksum
validating that the angle falls within [0, 360] degrees
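
The sliding-window search with these two checks could be sketched as follows. This is an illustration under stated assumptions, not libsweep code: the thread only gives the block length (7 bytes), so the byte layout, the little-endian 1/16-degree angle encoding, and the checksum rule (sum of the first six bytes modulo 255) are assumptions, and `find_next_block` is a hypothetical helper. Requiring several consecutive plausible blocks lowers the chance of locking onto a random match.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed 7-byte data block layout (not stated in this thread):
//   [0] sync/error, [1..2] angle (little-endian, 1/16 degree),
//   [3..4] distance, [5] signal strength, [6] checksum
constexpr std::size_t kBlockSize = 7;

// Assumed checksum rule: sum of the first six bytes modulo 255.
bool checksum_ok(const std::uint8_t* block) {
  unsigned sum = 0;
  for (std::size_t i = 0; i < kBlockSize - 1; ++i) sum += block[i];
  return sum % 255 == block[kBlockSize - 1];
}

// The decoded angle must fall within [0, 360] degrees.
bool angle_ok(const std::uint8_t* block) {
  const unsigned raw = block[1] | (block[2] << 8);
  return raw <= 360u * 16u;
}

// Slides a 7-byte window over the stream and returns the offset of the first
// position where `consecutive` back-to-back blocks pass both checks, or -1.
long find_next_block(const std::vector<std::uint8_t>& stream,
                     std::size_t consecutive = 2) {
  if (consecutive == 0) return -1;
  const std::size_t needed = consecutive * kBlockSize;
  for (std::size_t start = 0; start + needed <= stream.size(); ++start) {
    bool all_ok = true;
    for (std::size_t k = 0; k < consecutive && all_ok; ++k) {
      const std::uint8_t* block = stream.data() + start + k * kBlockSize;
      all_ok = checksum_ok(block) && angle_ok(block);
    }
    if (all_ok) return static_cast<long>(start);
  }
  return -1;
}
```

Under the assumed checksum rule, a run of zero bytes passes both checks, which illustrates why this kind of resynchronization is a heuristic rather than a guarantee.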

I'm thinking of all the possible non-recoverable errors that would halt the process. In every case it seems necessary to stop_scanning and potentially even reset. I don't think we should require that the user handle stopping the scanning process.
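
A rough sketch of what "the library stops scanning on the user's behalf" could look like. The `Device` struct, its callbacks, `ShutdownReport`, and `shutdown_after_fatal_error` are all hypothetical stand-ins for illustration; they are not part of libsweep's API. The idea is that the library first tries a normal stop, falls back to a reset if that fails, and only then reports the original cause to the user.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical device handle; the callbacks stand in for the real stop and reset commands.
struct Device {
  std::function<void()> stop_scanning;
  std::function<void()> reset;
  bool scanning = false;
};

// What the user is told once the library has cleaned up.
struct ShutdownReport {
  std::string cause;     // the error that halted the worker
  bool stopped_cleanly;  // false if the stop failed and a reset was attempted
};

ShutdownReport shutdown_after_fatal_error(Device& device, const std::string& cause) {
  ShutdownReport report{cause, false};
  try {
    device.stop_scanning();
    report.stopped_cleanly = true;
  } catch (const std::exception&) {
    // The stop command failed; attempt a reset as a best-effort fallback.
    try {
      device.reset();
    } catch (const std::exception&) {
      // Nothing more can be done here; the report still carries the cause.
    }
  }
  device.scanning = false;
  return report;
}
```

Keeping the original cause in the report, rather than the error from the failed stop, is a design choice that matches the goal of tracking down root causes from user reports.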

I'm also wondering if it would be worth trying to reset and restart the scanning automatically, so that the user just sees a large gap in the data while the sensor resets and starts scanning again. But this might require that we keep track of state info (like the sample rate), which the device forgets across power cycles. For now it seems better to just shut down and report gracefully. Forcing a graceful shutdown should also help us track down the root of the errors that occur as users report them.

MikeGitb · Answer 2 · Wed May 16 2018 18:55:24 GMT+0800 (China Standard Time)

If I understand correctly, this project is being shut down, so I'm closing this issue.