0chain / blobber

A storage provider (blobber) interface to the blockchain and consumers of storage.

flaky behaviour: allocation_root_mismatch error when trying to create a directory

boddumanohar opened this issue

On Windows, creating a new directory a second time fails with an allocation root mismatch error. The first creation works, but every attempt after that produces this error.

We also see this in system tests:

https://0chain.slack.com/archives/C02AV6MKT36/p1669058193454869

In the system tests the behaviour is flaky: the same set of tests fails with the allocation root mismatch error, but passes on retry.

Steps to reproduce:

At the moment, we don't have a consistent way to reproduce this issue.

It also fails on the rename operation: https://github.com/0chain/zboxcli/actions/runs/3533560710/jobs/5930800557

    utils.go:66: Command failed on attempt [2/3] due to error [exit status 1]. Output: [consensus_not_met: Required consensus 3 got 2. Error: commit_error: {"error":"write_marker_verification_failed: Verification of write marker failed: invalid_write_marker: Invalid write marker. Prev Allocation root does not match the allocation root on record"}]
    utils.go:66: Command failed on attempt [2/3] due to error [exit status 1]. Output: [consensus_not_met: Required consensus 3 got 2. Error: commit_error: {"error":"allocation_root_mismatch: Allocation root in the write marker does not match the calculated allocation root. Expected hash: 3ed23206277bd827822f9536671729dbcfabdedb3be3a68852347c0ff0d34d2e"}]
    utils.go:66: Command failed on attempt [2/3] due to error [exit status 1]. Output: [Delete failed. consensus_not_met: Consensus on commit not met. Required 3, got 1]
    utils.go:69: Command failed on final attempt [3/3] due to error [exit status 1]. Command String: [./zbox rename --allocation a51da28dd872c9e18c64f875739a7a675be6f52c6b5b1268c16f214daebfb685 --remotepath /A7ZQUQnrwi_test.txt --destname xwIq7e8GyB_test.txt --silent --wallet TestFileRename-Rename_and_delete_file_concurrently,_should_work_wallet.json --configDir ./config --config ./zbox_config.yaml] Output: [consensus_not_met: Rename failed. Required consensus 3 got 2]

@boddumanohar can you share steps or any insights for reproducing this? Does it happen only on Windows, or can we reproduce it in a Linux environment too?
The Slack link isn't working and the test run link has expired, so I am unable to get more context on this :(

On Linux, we used to see this in the system_tests repo, but we haven't seen it lately. If it occurs again, @Kishan-Dhakan please post the build here.
Thanks.

Had a chat with @boddumanohar and concluded that these issues were frequent 3 months ago but are rare these days. As he suggested, I am deferring work on this one until we see another instance (showing up as a failing system test).
cc: @dabasov @Kishan-Dhakan

@aniketsingh03 please write a test to prove this

Was having a look at this today. This might be happening because of a mismatch between two values: the rootRef hash used to generate the allocationRoot (which is written into the write_marker form var and sent along with the commit request to a blobber), and the rootRef hash that the blobber computes before finalizing a commit.

func (req *CommitRequest) commitBlobber(rootRef *fileref.Ref, latestWM *marker.WriteMarker, size int64) error {
	wm := &marker.WriteMarker{}
	timestamp := int64(common.Now())
	wm.AllocationRoot = encryption.Hash(rootRef.Hash + ":" + strconv.FormatInt(timestamp, 10))
	if latestWM != nil {

AND

allocationRoot := encryption.Hash(rootRef.Hash + ":" + strconv.FormatInt(int64(writeMarker.Timestamp), 10))
if allocationRoot != writeMarker.AllocationRoot {
	result.AllocationRoot = allocationObj.AllocationRoot
	if latestWriteMarkerEntity != nil {
		result.WriteMarker = &latestWriteMarkerEntity.WM
	}
	result.Success = false
	result.ErrorMessage = "Allocation root in the write marker does not match the calculated allocation root. Expected hash: " + allocationRoot
	return &result, common.NewError("allocation_root_mismatch", result.ErrorMessage)
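
To make the suspected failure mode concrete, here is a minimal standalone sketch of the comparison the two excerpts perform. It is not the blobber or SDK code: `hashHex` is a stand-in for `encryption.Hash` (SHA-256 is used here only so the example has no dependencies, and the real helper may use a different digest), and the root hash strings are made up. It only shows that the same timestamp with a different rootRef hash on the two sides is enough to produce the mismatch.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
)

// hashHex is a stand-in for encryption.Hash (assumption: the real helper
// may use a different hash function; only determinism matters here).
func hashHex(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// allocationRoot mirrors the expression that appears in both excerpts:
// hash(rootRefHash + ":" + timestamp).
func allocationRoot(rootRefHash string, timestamp int64) string {
	return hashHex(rootRefHash + ":" + strconv.FormatInt(timestamp, 10))
}

func main() {
	ts := int64(1669058193) // arbitrary timestamp for illustration

	// Value the sender would put into the write marker (hypothetical hash).
	sent := allocationRoot("root-hash-seen-by-client", ts)

	// Receiver computes from the same tree state: the values agree.
	fmt.Println(sent == allocationRoot("root-hash-seen-by-client", ts)) // true

	// Receiver's tree differs (e.g. another change landed first): mismatch.
	fmt.Println(sent == allocationRoot("root-hash-after-other-change", ts)) // false
}
```

If this reading is right, the error says nothing about the timestamp or the signature; it only means the two sides hashed different directory trees.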

We already have 2 system tests that cover creating 2 directories with the same name and with different names (as mentioned in the issue description):

	t.Run("create dir with existing dirname should work", func(t *test.SystemTest) {
		allocID := setupAllocation(t, configPath)

		dirname := "/existingdir"
		output, err := createDir(t, configPath, allocID, dirname, true)

and

	t.Run("create with existing dir but different case", func(t *test.SystemTest) {
		allocID := setupAllocation(t, configPath)


		dirname := "/existingdir"
		output, err := createDir(t, configPath, allocID, dirname, true)
		require.Nil(t, err, "Unexpected create dir failure %s", strings.Join(output, "\n"))
		require.Len(t, output, 1)
		require.Equal(t, dirname+" directory created", output[0])

		dirname = "/existingDir"
		output, err = createDir(t, configPath, allocID, dirname, true)
		require.Nil(t, err, "Unexpected create dir failure %s", strings.Join(output, "\n"))
		require.Len(t, output, 1)
		require.Equal(t, dirname+" directory created", output[0])

So I am not sure what kind of system test would help catch this case (maybe creating both directories in quick succession, so that the creation of the 2nd dir happens BEFORE the commit for the 1st one is finalized?)
cc: @dabasov @boddumanohar

Maybe add a test that creates 100 directories concurrently, fails if any goroutine hits an allocation mismatch error, and treats a context deadline or timeout error in some goroutines as acceptable.
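
A rough sketch of that idea, kept independent of the real test harness. Everything here is hypothetical: `createConcurrently` and the `create` callback are illustrative names, and `demoCreate` is a fake that stands in for whatever actually issues the create-dir command. Mismatches are detected by looking for the `allocation_root_mismatch` code in the error text, which is an assumption about how the error would surface to the test.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
	"sync"
)

// isRootMismatch reports whether err carries the allocation_root_mismatch code.
func isRootMismatch(err error) bool {
	return err != nil && strings.Contains(err.Error(), "allocation_root_mismatch")
}

// createConcurrently calls create for n distinct directory names in parallel.
// It returns the mismatch errors and any other unexpected errors separately;
// context deadline errors are tolerated and dropped.
func createConcurrently(n int, create func(dirname string) error) (mismatches, other []error) {
	var (
		wg sync.WaitGroup
		mu sync.Mutex
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			err := create(fmt.Sprintf("/dir_%d", i))
			if err == nil || errors.Is(err, context.DeadlineExceeded) {
				return // success or a tolerated timeout
			}
			mu.Lock()
			defer mu.Unlock()
			if isRootMismatch(err) {
				mismatches = append(mismatches, err)
			} else {
				other = append(other, err)
			}
		}(i)
	}
	wg.Wait()
	return mismatches, other
}

// demoCreate is a fake backend used only to exercise the helper.
func demoCreate(dirname string) error {
	switch dirname {
	case "/dir_7":
		return context.DeadlineExceeded // tolerated
	case "/dir_42":
		return errors.New("allocation_root_mismatch: Allocation root in the write marker does not match the calculated allocation root")
	}
	return nil
}

func main() {
	mismatches, other := createConcurrently(100, demoCreate)
	fmt.Printf("mismatches=%d other=%d\n", len(mismatches), len(other)) // mismatches=1 other=0
}
```

In a real system test the callback would wrap the existing create-dir helper, and the test would fail when `mismatches` is non-empty. Whether other errors should also fail the test is left open here, which is why they are returned separately.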

Closed, as it could not be reproduced. @Kishan-Dhakan please create an issue in the system_tests repo.