containerd / overlaybd

Overlaybd: a block based remote image format. The storage backend of containerd/accelerated-container-image.

Failure loop in `pread:checksum failed {offset: 0, length: 1106} (expected 0 but got 2168007317)`

shuaichang opened this issue

What happened in your environment?

Hi, we're seeing random application crashes caused by overlaybd failures. The following error repeats in an infinite loop:

`2023/09/15 08:42:28|ERROR|th=00007F00758B2B80|/src/src/overlaybd/zfile/zfile.cpp:460|pread:checksum failed {offset: 0, length: 1106} (expected 0 but got 2168007317), reload result: 0`

My understanding is that if a dirty data chunk does not match its CRC32, overlaybd should evict that chunk and go back to the origin. Is this the correct expectation?
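The expectation described above can be sketched as follows. This is only an illustration of the evict-and-reload idea, not overlaybd's actual implementation: `read_verified`, `ChecksumError`, the dict-based cache, and the `fetch_origin` callback are all hypothetical names, and `zlib.crc32` stands in for whatever checksum overlaybd really uses.

```python
import zlib


class ChecksumError(IOError):
    """Raised when a chunk still fails verification after all retries."""


def read_verified(cache, fetch_origin, key, expected_crc, max_retries=2):
    """Return the chunk for `key`, evicting any copy that fails its checksum.

    `cache` is a plain dict and `fetch_origin` is a callable that reads the
    chunk from the remote source; both are hypothetical stand-ins.
    """
    for _ in range(max_retries + 1):
        data = cache.get(key)
        if data is None:
            # Cache miss: go back to the origin and repopulate the cache.
            data = fetch_origin(key)
            cache[key] = data
        if zlib.crc32(data) == expected_crc:
            return data
        # Checksum mismatch: evict the bad bytes so the next attempt
        # is forced to read from the origin instead of the cache.
        cache.pop(key, None)
    raise ChecksumError(f"checksum verification failed after retries: {key!r}")


# A corrupted cache entry is evicted and replaced by the origin's copy.
origin = {0: b"good chunk"}
cache = {0: b"corrupted!"}
good_crc = zlib.crc32(origin[0])
result = read_verified(cache, origin.__getitem__, 0, good_crc)
```

Under this model, a loop that never recovers would mean either that the bad copy is never actually evicted or that the expected checksum itself is wrong.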

Could you help take a look at what the issue might be? Thanks

2023/09/15 08:42:22|INFO |th=00007F003C6DBB00|/src/src/switch_file.cpp:50|try_open_zfile:open file as zfile format, path: https://{registry-host}/v2/{registry-repo}/blobs/sha256:ad9a26e9a5e9ad68233fecea9d319024a3ec7ae667d788e5185868dbd0796a8b
2023/09/15 08:42:22|INFO |th=00007F003BED5B00|/src/src/overlaybd/tar/tar_file.cpp:340|open_tar_file:open file as tar file
2023/09/15 08:42:22|INFO |th=00007F003BED5B00|/src/src/overlaybd/tar/tar_file.cpp:97|read_header:[m_size=235858527][base_offset=1536]
2023/09/15 08:42:22|INFO |th=00007F003BED5B00|/src/src/overlaybd/zfile/zfile.cpp:711|load_jump_table:read overwrite header. idx_offset: 235588323, idx_bytes: 269692, dict_size: 0, use_dict: 0
2023/09/15 08:42:22|INFO |th=00007F003B6D1740|/src/src/overlaybd/tar/tar_file.cpp:340|open_tar_file:open file as tar file
2023/09/15 08:42:22|INFO |th=00007F003B6D1740|/src/src/overlaybd/tar/tar_file.cpp:97|read_header:[m_size=618182859][base_offset=1536]
2023/09/15 08:42:22|INFO |th=00007F003B6D1740|/src/src/overlaybd/zfile/zfile.cpp:711|load_jump_table:read overwrite header. idx_offset: 616979179, idx_bytes: 1203168, dict_size: 0, use_dict: 0
2023/09/15 08:42:22|INFO |th=00007F003DEEAAC0|/src/src/overlaybd/zfile/zfile.cpp:207|build:create jump table done. {part_count: 1337, deltas_count: 21382, size: 53460}
2023/09/15 08:42:22|INFO |th=00007F003DEEAAC0|/src/src/overlaybd/zfile/compressor.cpp:352|create_compressor:ZFileObject using LZ4 algorithm
2023/09/15 08:42:22|INFO |th=00007F003DEEAAC0|/src/src/switch_file.cpp:50|try_open_zfile:open file as zfile format, path: https://{registry-host}/v2/{registry-repo}/blobs/sha256:412d7666d7dab34c33a6a06dfe584a14eb5a469ab165bcc218c4912d3e128da2
2023/09/15 08:42:22|INFO |th=00007F003BED5B00|/src/src/overlaybd/zfile/zfile.cpp:207|build:create jump table done. {part_count: 4214, deltas_count: 67424, size: 168560}
2023/09/15 08:42:22|INFO |th=00007F003BED5B00|/src/src/overlaybd/zfile/compressor.cpp:352|create_compressor:ZFileObject using LZ4 algorithm
2023/09/15 08:42:22|INFO |th=00007F003BED5B00|/src/src/switch_file.cpp:50|try_open_zfile:open file as zfile format, path: https://{registry-host}/v2/{registry-repo}/blobs/sha256:915418a1d7bba0d8a2430b10923109da6fbddecfb6f1ae777caf0149033d57c5
2023/09/15 08:42:22|INFO |th=00007F003B6D1740|/src/src/overlaybd/zfile/zfile.cpp:207|build:create jump table done. {part_count: 18800, deltas_count: 300793, size: 751986}
2023/09/15 08:42:22|INFO |th=00007F003B6D1740|/src/src/overlaybd/zfile/compressor.cpp:352|create_compressor:ZFileObject using LZ4 algorithm
2023/09/15 08:42:22|INFO |th=00007F003B6D1740|/src/src/switch_file.cpp:50|try_open_zfile:open file as zfile format, path: https://{registry-host}/v2/{registry-repo}/blobs/sha256:12fc100968d65f863fd282ae2e7d4522bd1b5feb4c2ff5fbfd9dbc7ea3670d4f
2023/09/15 08:42:22|INFO |th=00007F003DEF0740|/src/src/overlaybd/lsmt/file.cpp:1531|do_parallel_load_index:check 0-th file is normal file or LSMT file
2023/09/15 08:42:22|INFO |th=00007F003D6ED740|/src/src/overlaybd/lsmt/file.cpp:1531|do_parallel_load_index:check 1-th file is normal file or LSMT file
2023/09/15 08:42:22|INFO |th=00007F003CEE8B40|/src/src/overlaybd/lsmt/file.cpp:1531|do_parallel_load_index:check 2-th file is normal file or LSMT file
2023/09/15 08:42:22|INFO |th=00007F003C6DFEC0|/src/src/overlaybd/lsmt/file.cpp:1531|do_parallel_load_index:check 3-th file is normal file or LSMT file
2023/09/15 08:42:22|INFO |th=00007F003BEDCB80|/src/src/overlaybd/lsmt/file.cpp:1531|do_parallel_load_index:check 4-th file is normal file or LSMT file
2023/09/15 08:42:22|INFO |th=00007F003B6D2680|/src/src/overlaybd/lsmt/file.cpp:1531|do_parallel_load_index:check 5-th file is normal file or LSMT file
2023/09/15 08:42:22|INFO |th=00007F003DEF0740|/src/src/overlaybd/lsmt/file.cpp:1563|do_parallel_load_index:load index from 0-th file done
2023/09/15 08:42:22|INFO |th=00007F003D6ED740|/src/src/overlaybd/lsmt/file.cpp:1563|do_parallel_load_index:load index from 1-th file done
2023/09/15 08:42:22|INFO |th=00007F003CEE8B40|/src/src/overlaybd/lsmt/file.cpp:1563|do_parallel_load_index:load index from 2-th file done
2023/09/15 08:42:22|INFO |th=00007F003C6DFEC0|/src/src/overlaybd/lsmt/file.cpp:1563|do_parallel_load_index:load index from 3-th file done
2023/09/15 08:42:22|INFO |th=00007F003BEDCB80|/src/src/overlaybd/lsmt/file.cpp:1563|do_parallel_load_index:load index from 4-th file done
2023/09/15 08:42:22|INFO |th=00007F003B6D2680|/src/src/overlaybd/lsmt/file.cpp:1563|do_parallel_load_index:load index from 5-th file done
2023/09/15 08:42:22|INFO |th=00007F007A9D9B00|/src/src/image_file.cpp:354|open_lowers:LSMT::open_files_ro(files, 6) success
2023/09/15 08:42:22|INFO |th=00007F007A9D9B00|/src/src/image_file.cpp:477|init_image_file:RW layer path not set. return RO layers.
2023/09/15 08:42:22|INFO |th=00007F007A9D9B00|/src/src/image_file.h:52|ImageFile:new imageFile, bs: 512, size: 68719476736
2023/09/15 08:42:22|INFO |th=00007F007A9D9B00|/src/src/main.cpp:367|dev_open:dev opened /local_disk0/overlaybd-snapshots/snapshots/103/block/config.v1.json, time cost 213 ms
2023/09/15 08:42:22|ERROR|th=00007F0039EAD300|/src/build/_deps/tcmu-src/scsi.cpp:505|tcmu_emulate_evpd_inquiry:[dev dev_103] Vital product data page code 201 not support
2023/09/15 08:42:22|ERROR|th=00007F00788D0EC0|/src/build/_deps/tcmu-src/scsi.cpp:505|tcmu_emulate_evpd_inquiry:[dev dev_7] Vital product data page code 201 not support
2023/09/15 08:42:22|ERROR|th=00007F00507C82C0|/src/build/_deps/tcmu-src/scsi.cpp:505|tcmu_emulate_evpd_inquiry:[dev dev_19] Vital product data page code 201 not support
2023/09/15 08:42:22|ERROR|th=00007F004E3B7340|/src/build/_deps/tcmu-src/scsi.cpp:505|tcmu_emulate_evpd_inquiry:[dev dev_81] Vital product data page code 201 not support
2023/09/15 08:42:22|ERROR|th=00007F004B797E80|/src/build/_deps/tcmu-src/scsi.cpp:505|tcmu_emulate_evpd_inquiry:[dev dev_88] Vital product data page code 201 not support
2023/09/15 08:42:22|ERROR|th=00007F0040B0D700|/src/build/_deps/tcmu-src/scsi.cpp:505|tcmu_emulate_evpd_inquiry:[dev dev_97] Vital product data page code 201 not support
2023/09/15 08:42:28|ERROR|th=00007F00758B2B80|/src/src/overlaybd/zfile/zfile.cpp:460|pread:checksum failed {offset: 0, length: 1106} (expected 0 but got 2168007317), reload result: 0
2023/09/15 08:42:28|ERROR|th=00007F00758B2B80|/src/src/overlaybd/zfile/zfile.cpp:460|pread:checksum failed {offset: 0, length: 1106} (expected 0 but got 2168007317), reload result: 0
2023/09/15 08:42:28|ERROR|th=00007F00758B2B80|/src/src/overlaybd/zfile/zfile.cpp:470|pread:checksum verification failed after retries {offset: 0, length: 1106}
2023/09/15 08:42:28|ERROR|th=00007F00758B2B80|/src/src/overlaybd/lsmt/file.cpp:571|operator():failed to read from 0-th file ( 0000561F2F662A70 pread return: -1 < size: 4096) errno=117(Structure needs cleaning)
2023/09/15 08:42:28|ERROR|th=00007F00758B2B80|/src/src/main.cpp:112|sure:io request failed, offset: 59528912896, ret: -1, retry times: 0, errno:117
2023/09/15 08:42:28|ERROR|th=00007F00758B2B80|/src/src/overlaybd/zfile/zfile.cpp:460|pread:checksum failed {offset: 0, length: 1106} (expected 0 but got 2168007317), reload result: 0
2023/09/15 08:42:28|ERROR|th=00007F00758B2B80|/src/src/overlaybd/zfile/zfile.cpp:460|pread:checksum failed {offset: 0, length: 1106} (expected 0 but got 2168007317), reload result: 0
2023/09/15 08:42:28|ERROR|th=00007F00758B2B80|/src/src/overlaybd/zfile/zfile.cpp:470|pread:checksum verification failed after retries {offset: 0, length: 1106}
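
To check whether one chunk is failing repeatedly, as opposed to many different chunks failing, the checksum errors in a log like the one above can be tallied. This is a minimal sketch: the regular expression is derived only from the `pread:checksum failed` lines shown here, and `tally_checksum_failures` is a hypothetical helper name, not part of overlaybd.

```python
import re
from collections import Counter

# Matches the "checksum failed" message format seen in the log above.
CHECKSUM_FAILED = re.compile(
    r"checksum failed \{offset: (\d+), length: (\d+)\} "
    r"\(expected (\d+) but got (\d+)\)"
)


def tally_checksum_failures(lines):
    """Count failures per (offset, length, expected, got) tuple."""
    counts = Counter()
    for line in lines:
        match = CHECKSUM_FAILED.search(line)
        if match:
            counts[tuple(int(g) for g in match.groups())] += 1
    return counts
```

For example, `tally_checksum_failures(open(path))` over a log file (with `path` supplied by the reader) returns a `Counter`; a single key with a large count indicates that the same read is being retried over and over.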

What did you expect to happen?

No response

How can we reproduce it?

This is not easily reproducible

What is the version of your Overlaybd?

0.6.14

What is your OS environment?

Ubuntu

Are you willing to submit PRs to fix it?

  • Yes, I am willing to fix it.

It's a bug in zfile eviction, which is fixed in #259.
Please update to >0.6.15.