reverbrain / eblob

Eblob is an append-only, low-level IO library that stores data in blob files. It was created as a low-level backend for elliptics.

Eblob: lots of errors after updating version from 0.22.16 to 0.23.0

rudneff opened this issue:

After updating the eblob library I got several errors, which I think are related.
But first, one question: what does the -1 in this message mean?

2015-07-28 19:11:33.291628 2: blob: start
2015-07-28 19:11:33.292360 2: bctl: index: 2/-1, using unsorted index: size: 146208, num: 1523, data: size: 30878861, max blob size: 200000000

In version 0.22.23 this message appeared after a forced restart of my application. After that, eblob_init aborted with SIGABRT every time I tried to restart. I can no longer reproduce this bug.

Error on every second application start (errno 29)

This happens on every second start. The write thread is blocked with errno 29, while the read thread works normally. A restart is required to get things working again.

2015-07-28 18:36:39.683964 1: blob: c66800000000: i18: eblob_fill_write_control_from_ram: ERROR-pread-index: position: 54998906, offset: 0, size: 23254, flags: 0x100 [chunked_csum], total data size: 0, disk-size: 0, data_fd: 11, index_fd: 12, bctl: 0x65e820: -29
2015-07-28 18:36:39.684032 1: blob: c66800000000: i18: eblob_writev: finished: position: 54998906, offset: 0, size: 23254, flags: 0x100 [chunked_csum], total data size: 0, disk-size: 0, data_fd: 11, index_fd: 12, bctl: (nil): -29
Write thread 140737219917568: going sleep!
2015-07-28 18:36:39.716113 1: blob: c66800000000: i18: eblob_fill_write_control_from_ram: ERROR-pread-index: position: 54998906, offset: 0, size: 23254, flags: 0x100 [chunked_csum], total data size: 0, disk-size: 0, data_fd: 11, index_fd: 12, bctl: 0x65e820: -29
2015-07-28 18:36:39.716170 1: blob: c66800000000: i18: eblob_writev: finished: position: 54998906, offset: 0, size: 23254, flags: 0x100 [chunked_csum], total data size: 0, disk-size: 0, data_fd: 11, index_fd: 12, bctl: (nil): -29
Write thread 140737203132160: going sleep!
Stat: W: done: 0.000100 recs: 0 wps: 0 err: 2 R: done: 0.043850 reads: 877 rps: 877 err: 0
Stat: W: done: 0.000100 recs: 0 wps: 0 err: 2 R: done: 0.043850 reads: 877 rps: 0 err: 0
Stat: W: done: 0.000100 recs: 0 wps: 0 err: 2 R: done: 0.043850 reads: 877 rps: 0 err: 0
2015-07-28 18:36:42.684256 1: blob: c66800000000: i18: eblob_fill_write_control_from_ram: ERROR-pread-index: position: 54998906, offset: 0, size: 15040, flags: 0x100 [chunked_csum], total data size: 0, disk-size: 0, data_fd: 11, index_fd: 12, bctl: 0x65e820: -29
2015-07-28 18:36:42.684327 1: blob: c66800000000: i18: eblob_writev: finished: position: 54998906, offset: 0, size: 15040, flags: 0x100 [chunked_csum], total data size: 0, disk-size: 0, data_fd: 11, index_fd: 12, bctl: (nil): -29
Write thread 140737219917568: going sleep!
2015-07-28 18:36:42.716380 1: blob: c66800000000: i18: eblob_fill_write_control_from_ram: ERROR-pread-index: position: 54998906, offset: 0, size: 17873, flags: 0x100 [chunked_csum], total data size: 0, disk-size: 0, data_fd: 11, index_fd: 12, bctl: 0x65e820: -29
2015-07-28 18:36:42.716451 1: blob: c66800000000: i18: eblob_writev: finished: position: 54998906, offset: 0, size: 17873, flags: 0x100 [chunked_csum], total data size: 0, disk-size: 0, data_fd: 11, index_fd: 12, bctl: (nil): -29
Write thread 140737203132160: going sleep!

Thank you for your support.

bctl: index: 2/-1 shows the current index and the maximum index known to eblob. When eblob starts, it does not know how many blob files (and indexes) exist, so it has to enumerate them. During this enumeration eblob sets the maximum known index to -1, which lets it determine the real maximum index (any real index is greater than -1).

2015-07-28 18:36:39.683964 1: blob: c66800000000: i18: eblob_fill_write_control_from_ram: ERROR-pread-index: position: 54998906, offset: 0, size: 23254, flags: 0x100 [chunked_csum], total data size: 0, disk-size: 0, data_fd: 11, index_fd: 12, bctl: 0x65e820: -29

A -29 error (-ESPIPE on Linux) usually means data corruption: eblob tried to read beyond the end of the index.
Try running the eblob_merge tool on your blobs, then check whether the issue persists.

Before a presumably "bad" start (one that would fail with error 29), I merged two blobs into one. Result:

Completed input stream /tmp/test_queue/data-0.36: total: 2, rest: 2
Completed all blobs
Total records: 17972
Written records: 5627
Removed records: 12345
Broken records: 0

After the merge everything works fine. But in the earlier version (0.21.16) these problems did not exist: no merge was needed, whether or not I shut the program down cleanly.

A crash only has to happen once for the corruption to become visible; you were probably lucky that previous crashes did not corrupt data. For example, whether a crash corrupts data may depend on the sync timeout (you can set a rather small timeout, or even zero, but that heavily affects performance).

@bioothod, does the merge tool fix broken blobs?

@agend eblob_merge tries to fix broken blobs: it iterates over the blobs and their indexes, skips broken and removed entries, and copies all good records into the destination blob.
It can also take multiple input blobs and produce a single output blob.

That is strange, as we never had this issue before (I think since we started using eblob a couple of years ago), and now we have a reproducible scenario. Would you consider taking a deeper look at our case? We can also run the test against the previous version of eblob.

(Sent by email in reply to Evgeniy Polyakov's comment above, written on 29 July 2015 at 1:30.)

If it is easily reproducible, please send us a backtrace (with the debug package installed) taken after the SIGABRT.

I think we have found the issue, and it is on our side. @rudneff, please close it.