reverbrain / eblob

Eblob is an append-only low-level IO library, which saves data in blob files. Created as low-level backend for elliptics

eblob_want_defrag: FAILED

ru-di opened this issue · comments

Hello. I am trying to use eblob with automatic defragmentation (a short defrag_timeout and defrag_percentage = 25%). Here is the console output:
$ eblob_index_info data-0.0.index
Total records: 13187
Removed records: 13187
And from the running program:
eblob_want_defrag: FAILED: trying to remove non empty blob: removed: 13187, total: 13187index_offset: 0, removed_size: 1265952
eblob_want_defrag: index: 0, removed: 13187, total: 13187, percentage: 25, size: 201268730, want-defrag: 1
The same thing happens with all the other filled blobs.
Am I doing something wrong?

Which eblob version are you using?

If it is the latest version to date, I have a few more questions:
Does it actually remove the given blob, or does it only write to the log?
Does this happen when you manually run dnet_ioclient -r, or when automatic eblob defragmentation kicks in?
Is data-0.0 the latest blob, i.e. is there any other blob such as data-0.1?
What does data.stat say in this case?

I think it's the same issue that @shaitan was seeing in #81:

Defrag is overcautious and performs many checks before actually removing a blob from disk. One of those checks is currently broken for sorted blobs consisting only of removed records.
If you pull Kirill's changes, it will instead be broken for unsorted blobs that consist only of removed records.

The quick fix in 9564b18 is to check for both:
(bctl->index_size != removed_size && bctl->index_offset != removed_size)

The proper fix is to refactor the code by unifying index_size (the metric for sorted blobs) and index_offset (the metric for unsorted ones) into one.
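
To make the idea of a unified metric concrete, here is a minimal sketch in C. It is not eblob's actual code: the struct `blob_ctl_sketch`, its `sorted` field, and both helper functions are hypothetical names invented for illustration. Only the `index_size` / `index_offset` / `removed_size` naming comes from the discussion above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for eblob's per-blob control structure. */
struct blob_ctl_sketch {
    bool     sorted;        /* assumption: true once the index has been sorted */
    uint64_t index_size;    /* index bytes, the metric for sorted blobs */
    uint64_t index_offset;  /* index bytes written so far, the metric for unsorted blobs */
};

/* One metric for both cases: how many index bytes the blob currently holds. */
static uint64_t blob_index_bytes(const struct blob_ctl_sketch *b)
{
    return b->sorted ? b->index_size : b->index_offset;
}

/* The blob holds only removed records when the removed index bytes
 * account for every index byte in the blob. */
static bool blob_is_fully_removed(const struct blob_ctl_sketch *b,
                                  uint64_t removed_size)
{
    return blob_index_bytes(b) == removed_size;
}
```

With the numbers from the log above (sorted index size 1265952, index_offset 0, removed_size 1265952), `blob_is_fully_removed` returns true, whereas a check against `index_offset` alone would report a mismatch.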

@bioothod, the eblob version is 0.21.41. It removes nothing. The space allocated for storage is full and I can't write to it (error 28). There are several blobs, and all records in them have been removed.
blob: start
bctl: index: 6: using existing sorted index: size: 1268736, num: 13216
bctl: index: 0: using existing sorted index: size: 1265952, num: 13187
bctl: index: 1: using existing sorted index: size: 1264896, num: 13176
bctl: index: 9: using existing sorted index: size: 1267584, num: 13204
bctl: index: 5: using existing sorted index: size: 1262976, num: 13156
bctl: index: 8: using existing sorted index: size: 1267296, num: 13201
bctl: index: 10: using existing sorted index: size: 1269216, num: 13221
...
What I described earlier happens with all of them when I use automatic eblob defragmentation with the config parameters defrag_timeout = 10 and defrag_percentage = 25.
Also, data.stat strangely reports that some blobs still have records that are not removed.
In contrast to data.stat, the console log says that all records were removed.

Does applying #81 help?

#81 helped, but only at eblob startup (I use only eblob, not elliptics). Now it deletes all empty blobs (those containing only removed records) at the beginning of work. When the storage fills up again during operation, it again removes nothing and reports eblob_want_defrag: FAILED. Console output. I am using @shaitan's build.

Does this happen for automatic defragmentation only? Does this error happen when you start defragmentation manually?

Please check the current eblob tree. I've merged the index/offset changes made by @shaitan.

@bioothod, I tried the current version. Defragmentation works well at storage startup: it deletes blobs containing only removed records very quickly. When I start defragmentation manually, it works, but very slowly. I started it when my storage was full; by that time it usually holds more than 90% removed records. I am attaching my program's output logs (with a short description at the beginning of the logs).
Program output when running without the eblob_start_defrag call.
Also, I found another thing: when the process restarts after an unexpected termination, error 29 appears on write. The next restart solves the problem, even if it also follows an unexpected termination. I am attaching logs for this as well.
Do you think this can be fixed (I mean defragmentation first of all)?

The latter looks like eblob trying to repair its bad state after an unexpected death.

Defragmentation is not a fast process: it copies your blobs twice, and that consumes all of the drive's throughput. Since the process is 100% disk-IO bound, we cannot really control its performance, so yes, it can be very slow at times.

At startup eblob doesn't defragment your data; it only sorts indexes (or reads them if they are already sorted). Eblob can defragment at startup, but by default it doesn't. This can be tuned via blob config flags.

Thank you for the clarifications. About the error -29 cure: I have not yet understood how it works, but a second restart fixes this error.
As for the eblob flags, I was inattentive and didn't read blob.h thoroughly. Until recently I had used only the docs on reverbrain.com, and I missed the important flag EBLOB_TIMED_DEFRAG. Why is this flag not listed on this page in the docs? Timed defrag now works fine: it activates every defrag_timeout seconds.
But there is another problem, or a gap in my understanding. How does the configuration parameter defrag_percentage work? It is used in the conditions inside eblob_want_defrag. If the condition is triggered, eblob_want_defrag returns EBLOB_DEFRAG_NEEDED. I know how EBLOB_REMOVE_NEEDED works, but what does EBLOB_DEFRAG_NEEDED do? It appears only a couple of times in the sources: once in eblob_want_defrag and once in the header.
Is it OK to discuss this here?

Sorry for the slow reply. defrag_percentage means that defragmentation should be performed when the total size of all removed records in a given blob exceeds defrag_percentage percent of the total blob size.
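
As a rough illustration of that rule, the threshold test could be sketched as below. This is an assumption-laden sketch, not the body of eblob_want_defrag: the function name `defrag_threshold_exceeded` and its parameters are hypothetical, and the real function performs additional checks that are not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when removed data makes up more than
 * defrag_percentage percent of the blob.
 * Assumes sizes are small enough that multiplying by 100 cannot overflow
 * a 64-bit integer. */
static bool defrag_threshold_exceeded(uint64_t removed_bytes,
                                      uint64_t total_bytes,
                                      unsigned defrag_percentage)
{
    if (total_bytes == 0)
        return false; /* an empty blob has nothing to defragment */

    /* removed / total > percentage / 100, rearranged to avoid floating point */
    return removed_bytes * 100 > total_bytes * (uint64_t)defrag_percentage;
}
```

For example, with defrag_percentage = 25 a 200-byte blob with 60 removed bytes crosses the threshold, while one with exactly 50 removed bytes does not, because the comparison is strict.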

Are there any issues with this report?

No, thank you.