raimohanska / ourboard

An online whiteboard

Serial skips in server data

raimohanska opened this issue

At least one board has a skipped event. I made the "reboot snapshot" operation non-strict on serials to be able to recover (6ac3e76), but two issues remain:

  1. There is still at least one board with a hole in the history (maybe the server should patch as best as it can and then discard the history?)
  2. The reason for the missing event is unknown

The missing event was in the middle of a stored bundle. The quickCompactBoardHistory algorithm does check the serials at bundle boundaries, so it's unlikely that this is the result of a compaction error. It looks more like a situation where the internal board state was updated to the next serial but the event was skipped when saving. However, I did not find any holes in how the next serial is assigned or in how the event bundles are saved. Even in the case of a DB storage failure, I cannot see how an event could be skipped.
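Since a boundary check cannot see a hole in the middle of a bundle, a per-event scan is one way to locate such holes. Below is a minimal sketch, not the actual OurBoard code: the `StoredEvent` and `Bundle` shapes and the `findSerialGaps` name are assumptions made up for illustration.

```typescript
// Hypothetical shapes; the real stored bundle format may differ.
type StoredEvent = { serial: number; action: string };
type Bundle = { events: StoredEvent[] };
type SerialGap = { after: number; found: number };

// Walk every event of every bundle in order and report each place
// where the serial does not increase by exactly one.
function findSerialGaps(bundles: Bundle[], startAfter: number = 0): SerialGap[] {
  const gaps: SerialGap[] = [];
  let previous = startAfter;
  for (const bundle of bundles) {
    for (const event of bundle.events) {
      if (event.serial !== previous + 1) {
        gaps.push({ after: previous, found: event.serial });
      }
      previous = event.serial;
    }
  }
  return gaps;
}
```

Run over all bundles of a board, this reports the position of any hole regardless of whether it falls on a bundle boundary or inside a bundle.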

Here are server log snippets.

The previous save, at serial 1408

Feb 15 00:35:39 r-board app/web.1 Saving board at serial 1408 with 3 new events

Board purged and events up to 1408 compacted

Feb 15 00:36:39 r-board app/web.1 Purging board from memory
Feb 15 00:36:39 r-board app/web.1 Compacting 24 bundles into one for board , containing serials 1...1408

Loading board again

Feb 15 00:36:50 r-board app/web.1 Loading board into memory
Feb 15 00:36:52 r-board app/web.1 Loaded board at serial 1408 from snapshot at serial 1259 and 149 events after snapshot. Took 2370ms
Feb 15 00:36:52 r-board app/web.1 Board loaded into memory:
Feb 15 00:36:52 r-board app/web.1 Loading board history for board session at serial 1408
Feb 15 00:36:52 r-board app/web.1 Got board history for board session at serial 1408

Here's the log from the save.

Note that 1409 is the event that later goes missing. It was saved.

Feb 15 00:37:25 r-board app/web.1 Saving board at serial 1409 with 1 new events
Feb 15 00:38:20 r-board app/web.1 Statistics: active boards 1, sessions 3
Feb 15 00:38:41 r-board app/web.1 Saving board at serial 1410 with 1 new events

Board is purged, the bundles are compacted

Feb 15 01:51:15 r-board app/web.1 Saving board at serial 6919 with 8 new events
Feb 15 01:51:16 r-board app/web.1 Saving board at serial 6924 with 5 new events
Feb 15 01:51:19 r-board app/web.1 Saving board at serial 6927 with 3 new events
Feb 15 01:51:20 r-board app/web.1 Saving board at serial 6936 with 9 new events
Feb 15 01:52:35 r-board app/web.1 Loading board history for board session at serial 6936
Feb 15 01:52:35 r-board app/web.1 Got board history for board session at serial 6936
Feb 15 01:56:15 r-board app/web.1 Purging board from memory
Feb 15 01:56:15 r-board app/web.1 Compacting 524 bundles into one for board , containing serials 1...4008
Feb 15 01:56:15 r-board app/web.1 Compacting 700 bundles into one for board , containing serials 4009...6936

Next time the loading fails!

Feb 15 01:56:30 r-board app/web.1 Loading board into memory
Feb 15 01:56:32 r-board app/web.1 Error: Serial skip on item.front, 1408 -> 1410 (firstSerial undefined serial 1410)
Feb 15 01:56:32 r-board app/web.1 at Object.boardReducer (/app/backend/dist/index.js:4896:23)
Feb 15 01:56:32 r-board app/web.1 Serial skip on item.front, 1408 -> 1410 (firstSerial undefined serial 1410)
Feb 15 01:56:32 r-board app/web.1 Error applying board history for snapshot update for board . Loop index 151. Rebooting snapshot...
Feb 15 01:56:32 r-board app/web.1 Unable to reboot snapshot, failing at loop index 1. Giving up.
Feb 15 01:56:32 r-board app/web.1 Board load failed for board . Running compact/fix.
Feb 15 01:56:32 r-board app/web.1 Board : Verified 2 bundles containing 6936 events => no need to compact

Conclusion

The board was successfully loaded at serial 1408, and after that event 1409 was saved. After the compaction of 1..4008 the board failed to load, with one event apparently missing. At the time of compaction the bundles looked like this:

B1 1..1408
B2 1409
B3 1410
B4 ...

The result was bundle 1..4008 which apparently now was missing event 1409.
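For reference, compaction with a boundary-only serial check can be sketched roughly as follows. This is an illustration under assumed types (`EventBundle`, `compactBundles` are hypothetical names), not the real quickCompactBoardHistory implementation.

```typescript
// Hypothetical bundle shape with explicit first/last serials.
type CompactEvent = { serial: number };
type EventBundle = { firstSerial: number; lastSerial: number; events: CompactEvent[] };

// Concatenate consecutive bundles into one, verifying only that each
// bundle starts right after the previous one ends (boundary check).
function compactBundles(bundles: EventBundle[]): EventBundle {
  const first = bundles[0];
  if (!first) throw new Error("Nothing to compact");
  let events: CompactEvent[] = [];
  let last = first.firstSerial - 1;
  for (const bundle of bundles) {
    if (bundle.firstSerial !== last + 1) {
      throw new Error(`Serial skip at bundle boundary, ${last} -> ${bundle.firstSerial}`);
    }
    events = events.concat(bundle.events);
    last = bundle.lastSerial;
  }
  return { firstSerial: first.firstSerial, lastSerial: last, events };
}
```

With bundles like B1..B3 above, a check of this kind would throw if B2 were absent at compaction time, which is why a bundle dropped during compaction seemed hard to explain.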

The number of bundles, 524, in the compaction log looks correct: it includes the bundle 1..1408 and the following bundles up to 4008. (I counted the saves in the log and came up with 524.)
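Counting saves by hand is error-prone, so here is a small sketch of how the same check could be scripted. It assumes the log line format seen in the snippets above ("Saving board at serial N with K new events"); the function names are made up for this example.

```typescript
type Save = { serial: number; newEvents: number };

// Matches the save lines as they appear in the log excerpts above.
const SAVE_LINE = /Saving board at serial (\d+) with (\d+) new events/;

// Extract all save entries from raw log lines, ignoring other lines.
function parseSaves(logLines: string[]): Save[] {
  const saves: Save[] = [];
  for (const line of logLines) {
    const match = SAVE_LINE.exec(line);
    if (match) {
      saves.push({ serial: Number(match[1]), newEvents: Number(match[2]) });
    }
  }
  return saves;
}

// A save is consistent when (serial - newEvents) equals the serial
// reached by the previous save (or by the load, for the first one).
function inconsistentSaves(saves: Save[], loadedAt: number): Save[] {
  const bad: Save[] = [];
  let last = loadedAt;
  for (const save of saves) {
    if (save.serial - save.newEvents !== last) bad.push(save);
    last = save.serial;
  }
  return bad;
}
```

The length of `parseSaves(...)` gives the number of saved bundles, and an empty result from `inconsistentSaves(...)` means every save in the log continued exactly where the previous one ended.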

So it looks a lot like compaction was at fault. There are no errors or anything suspicious in the logs around the key events.

Added an extra defensive check (f673a86).

HA! The event is there but it's of type board.setAccessPolicy - nothing is missing. This event should indeed be ignored.
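To illustrate the idea, here is a hedged sketch of a replay loop that advances the serial for every stored event, including ones whose content is ignored. Everything except the action name board.setAccessPolicy is hypothetical (the types, `IGNORED_ACTIONS`, `replayHistory`), and this is not the actual boardReducer.

```typescript
type HistoryEvent = { serial: number; action: string };
type ReplayState = { serial: number; applied: string[] };

// Assumption for illustration: actions that do not change board
// content when replayed but still occupy a serial in the history.
const IGNORED_ACTIONS: string[] = ["board.setAccessPolicy"];

function replayHistory(initial: ReplayState, history: HistoryEvent[]): ReplayState {
  let state = initial;
  for (const event of history) {
    // The serial check covers every event, ignored or not.
    if (event.serial !== state.serial + 1) {
      throw new Error(`Serial skip on ${event.action}, ${state.serial} -> ${event.serial}`);
    }
    const ignored = IGNORED_ACTIONS.indexOf(event.action) !== -1;
    const applied = ignored ? state.applied : state.applied.concat([event.action]);
    // The serial advances even when the event itself is ignored.
    state = { serial: event.serial, applied };
  }
  return state;
}
```

In this sketch, replaying 1409 (board.setAccessPolicy) followed by 1410 (item.front) from serial 1408 ends at serial 1410 without a skip error, because the ignored event still moves the serial forward.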