leo-project / leofs

The LeoFS Storage System

Home Page: https://leo-project.net/leofs/

Storage compaction fails with garbage_too_long

sandervesik opened this issue

LeoFS storage compaction fails and no space is freed. The data files are over 1 TB but contain only around 45 GB of actual data. Is there anything that can be done, or should the cluster be redeployed one node at a time to recover? Also, could you please document the maximum amount of garbage that compaction can handle?

From the info log:

[I] storage_2@leofs-03.infraci.ptec 2019-07-30 10:37:23.793275 +0000 1564483043 null:null 0 {module,"leo_compact_fsm_worker"},{function,"running/2"},{line,393},{body,{leo_compact_worker_2,{garbage_too_long,[{leo_compact_fsm_worker,execute_1,3,[{file,"src/leo_compact_fsm_worker.erl"},{line,900}]},{leo_compact_fsm_worker,running,2,[{file,"src/leo_compact_fsm_worker.erl"},{line,362}]},{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,451}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}}}
[I] storage_2@leofs-03.infraci.ptec 2019-07-30 10:37:23.913591 +0000 1564483043 null:null 0 {module,"leo_compact_fsm_worker"},{function,"gen_compaction_report/1"},{line,1253},{body,[[{file_path,"/opt/local/leofs/storage/data/object/2.avs_63731700311"},{avs_ver,<<"LeoFS AVS-2.4">>},{num_of_active_objs,13946},{size_of_active_objs,47951505407},{total_num_of_objs,13946},{total_size_of_objs,47951505407},{start_datetime,"2019-07-30 10:05:11 +0000"},{end_datetime,"2019-07-30 10:37:23 +0000"},{errors,[]},{duration,1932},{result,fail}]]}
[I] storage_2@leofs-03.infraci.ptec 2019-07-30 10:37:23.913721 +0000 1564483043 null:null 0 {module,"leo_compact_fsm_controller"},{function,"running/2"},{line,495},{body,"FINISHED Compaction|Diagnosis|Recovery"}
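For anyone wanting to check the numbers in the compaction report, here is a minimal sketch that pulls the simple `{key,value}` pairs out of the report body and converts the active size to GiB. The `REPORT` string is copied from the `gen_compaction_report/1` line above; the helper name `parse_report` and the regex are my own assumptions, not part of LeoFS, and the regex only handles integer and bare-atom values.

```python
import re

# Excerpt of the gen_compaction_report/1 body, copied from the log above.
REPORT = (
    '{num_of_active_objs,13946},{size_of_active_objs,47951505407},'
    '{total_num_of_objs,13946},{total_size_of_objs,47951505407},'
    '{duration,1932},{result,fail}'
)

def parse_report(text):
    """Hypothetical helper: collect {key,value} pairs whose value is an
    integer or a bare atom. Quoted strings and binaries are ignored."""
    fields = {}
    for key, value in re.findall(r'\{(\w+),(\w+)\}', text):
        fields[key] = int(value) if value.isdigit() else value
    return fields

report = parse_report(REPORT)
active_gib = report['size_of_active_objs'] / 2**30  # bytes to GiB
print(f"active data: {active_gib:.1f} GiB, result: {report['result']}")
```

The active size works out to a little under 45 GiB, in line with the "around 45 GB" mentioned above.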

I've checked your LeoFS data-compaction situation. garbage_too_long means that part of one of your LeoFS AVS files is corrupted, and that the corrupted region is 10 MB or larger. The default force-quit threshold is 10 MB.
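To illustrate what a force-quit threshold of this kind does, here is a toy sketch. It is not LeoFS code, and it assumes the threshold works by limiting how many unreadable bytes a scanner will skip before giving up. The marker `MAGIC`, the function `next_record_offset`, and the `GarbageTooLong` exception are all hypothetical; only the 10 MB default comes from the explanation above.

```python
# 10 MB default mentioned above; whether that means 10 * 1024 * 1024
# or 10 * 1000 * 1000 bytes is an assumption here.
LIMIT_BYTES = 10 * 1024 * 1024

# Hypothetical record marker, NOT the real AVS on-disk format.
MAGIC = b"REC!"

class GarbageTooLong(Exception):
    """Raised when too many bytes are skipped without finding a record."""

def next_record_offset(buf, start, limit=LIMIT_BYTES):
    """Return the offset of the next MAGIC marker at or after `start`.

    Returns -1 if no marker exists and the remaining bytes fit within
    `limit`. Raises GarbageTooLong if more than `limit` bytes would have
    to be skipped.
    """
    pos = buf.find(MAGIC, start)
    end = len(buf) if pos == -1 else pos
    skipped = end - start
    if skipped > limit:
        raise GarbageTooLong(f"skipped {skipped} bytes without finding a record")
    return pos
```

In this toy model, a short damaged region is skipped and scanning continues, while a damaged region larger than the limit aborts the whole run, which is the kind of behaviour a `result,fail` report would reflect.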

To recover from this problem, I'd first like to ask you for the following:

  • The output of $ leofs-adm status
  • Whether data compaction has also failed on other nodes.