benbjohnson / litestream

Streaming replication for SQLite.

Home Page: https://litestream.io

[0.4.x] S3 replication gives up after getting an error from the API

hifi opened this issue

I'm not sure yet why this happens, but I'll write it up for reference:

/database/hungryserv.db(s3): wal segment written: 419b7da594d6e71d/000000000000138d:000000000006da58 elapsed=4.205663822s bytes=37080 speed=70.53Kbps
/database/hungryserv.db(s3): wal segment written: 419b7da594d6e71d/000000000000138e:0000000000000000 elapsed=1.964268798s bytes=700432 speed=2.85Mbps
/database/hungryserv.db(s3): wal segment written: 419b7da594d6e71d/000000000000138e:00000000000ab010 elapsed=230.279761ms bytes=28840 speed=1.00Mbps
/database/hungryserv.db: checkpoint(PASSIVE): [0,66,66]
/database/hungryserv.db: remove shadow index: 419b7da594d6e71d/000000000000136f
/database/hungryserv.db(s3): wal segment written: 419b7da594d6e71d/000000000000138f:0000000000000000 elapsed=3.022565642s bytes=243112 speed=643.46Kbps
/database/hungryserv.db: checkpoint(PASSIVE): [0,41,41]
/database/hungryserv.db: remove shadow index: 419b7da594d6e71d/0000000000001370
/database/hungryserv.db(s3): monitor error: write index segments: index=5007 err=InternalError: An internal error occurred.  Please retry your upload.
        status code: 500, request id: XXXXXXXXXXXXXXXX, host id: YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
/database/hungryserv.db: checkpoint(PASSIVE): [0,123,123]
/database/hungryserv.db(s3): monitor error: compare pos: generation mismatch
/database/hungryserv.db(s3): monitor error: compare pos: generation mismatch
/database/hungryserv.db: checkpoint(PASSIVE): [0,24,24]
/database/hungryserv.db(s3): monitor error: compare pos: generation mismatch
/database/hungryserv.db: checkpoint(PASSIVE): [0,57,57]
/database/hungryserv.db(s3): monitor error: compare pos: generation mismatch
/database/hungryserv.db: checkpoint(PASSIVE): [0,29,29]
/database/hungryserv.db(s3): monitor error: compare pos: generation mismatch
/database/hungryserv.db: checkpoint(PASSIVE): [0,85,85]
/database/hungryserv.db(s3): monitor error: compare pos: generation mismatch
...

We're carrying a number of downstream patches, so I can't rule out that I caused this myself. Either way, I want to drop all of those patches before settling in for production.

This is on current master plus several open PRs.

Sending it a kill signal makes it recover, because the final sync that runs during shutdown succeeds:

/database/hungryserv.db: checkpoint(PASSIVE): [0,3,3]
/database/hungryserv.db(s3): monitor error: compare pos: generation mismatch
/database/hungryserv.db: checkpoint(PASSIVE): [0,3,3]
signal received, litestream shutting down
/database/hungryserv.db(s3): wal segment written: 419b7da594d6e71d/000000000000138f:000000000003b5a8 elapsed=137.778249ms bytes=28840 speed=1.67Mbps
/database/hungryserv.db(s3): wal segment written: 419b7da594d6e71d/0000000000001390:0000000000000000 elapsed=318.501965ms bytes=168952 speed=4.24Mbps
/database/hungryserv.db(s3): wal segment written: 419b7da594d6e71d/0000000000001391:0000000000000000 elapsed=848.70097ms bytes=506792 speed=4.78Mbps
...

Note that the failing index was 5007 (0x138f), which is uploaded correctly this time.

This can be reproduced easily by running a local S3 server (such as https://s3ninja.net/) and restarting the container during a WAL upload. I'm looking into it.

I tested this: it is not an issue with v0.3.9 and only affects current main.