RGB-WG / rgb-node

RGB node - the official server-side implementation

Home Page: https://rgb.tech

merge of merkle blocks anchored to the same TXID can fail

zoedberg opened this issue

Running a test on rgb-lib many times, I've discovered a rare error that seems related to the merge of two merkle blocks (sharing the same TXID).
More specifically, the error occurs when rgb-node calls self.store.store_merge(db::ANCHORS, anchor.txid, anchor)?; (https://github.com/RGB-WG/rgb-node/blob/master/src/bucketd/processor.rs#L221), which leads to
commit_verify calling unreachable!("broken merkle block merge-reveal algorithm") (https://github.com/LNP-BP/client_side_validation/blob/4de299e0517e8734d8629169a86409a01de2a409/commit_verify/src/lnpbp4.rs#L826).

This can happen (around once every 10 test runs) when calling the consume_transfer API.
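
For context, here's a rough simplification of what that store_merge step amounts to. A std HashMap stands in for the store daemon and the types are stubs; this only sketches the read-merge-write pattern, not the actual rgb-node store API:

// Hypothetical simplification of the store_merge step; a HashMap stands
// in for the store daemon, and AnchorStub for Anchor<MerkleBlock>.
use std::collections::HashMap;

#[derive(Clone)]
struct AnchorStub; // stand-in for Anchor<MerkleBlock>

impl AnchorStub {
    fn merge_reveal(self, _other: AnchorStub) -> Result<AnchorStub, String> {
        // In the real library this is where commit_verify's lnpbp4.rs hits
        // `unreachable!("broken merkle block merge-reveal algorithm")`.
        Ok(self)
    }
}

fn store_merge(
    store: &mut HashMap<[u8; 32], AnchorStub>,
    txid: [u8; 32],
    new: AnchorStub,
) -> Result<(), String> {
    let merged = match store.remove(&txid) {
        // Merging the stored anchor with the new one is the failing step.
        Some(stored) => stored.merge_reveal(new)?,
        None => new,
    };
    store.insert(txid, merged);
    Ok(())
}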

Here are some logs from process_consignment() (in src/bucketd/processor.rs):

Stored anchor: Anchor { txid: b6515f51113235c41c815a82085afd27f7912f5a4cd6bffaef4dcd3ef10c9a12, lnpbp4_proof: MerkleBlock { depth: 3, cross_section: [ConcealedNode { depth: 3, hash: 03e43c730e76e654a40fdc0b62940bb7382ed95d4e8124ba687b4ec470cd1f01 }, CommitmentLeaf { protocol_id: Slice32(391cfae9f7b23562826b3260831e92698c7ec43c49e7afeed8e83a1bd75bbce9), message: 72c7278c8337a0480aa343dae2e6e6e1aee6c7b3df7d88f150a21c82f2b373ac }, ConcealedNode { depth: 2, hash: d42b5b6f1d6cc564fea2258e5147f4dd07735fac5aafa4a8394feb75ed8e366d }, ConcealedNode { depth: 1, hash: 5009030a186d268e698e184cf9e32607951ab81c6e3b42ecaf6ccf73a5ca0f2e }], entropy: None }, dbc_proof: OpretFirst }
Anchor: Anchor { txid: b6515f51113235c41c815a82085afd27f7912f5a4cd6bffaef4dcd3ef10c9a12, lnpbp4_proof: MerkleProof { pos: 0, path: [5009030a186d268e698e184cf9e32607951ab81c6e3b42ecaf6ccf73a5ca0f2e, d42b5b6f1d6cc564fea2258e5147f4dd07735fac5aafa4a8394feb75ed8e366d, 8fff224a68c261d62ab33d802182ff09d6332e9079fce71936ea414ed45ee782] }, dbc_proof: OpretFirst }
Restored anchor: Anchor { txid: b6515f51113235c41c815a82085afd27f7912f5a4cd6bffaef4dcd3ef10c9a12, lnpbp4_proof: MerkleBlock { depth: 3, cross_section: [CommitmentLeaf { protocol_id: Slice32(f0f2fc11fa38f3fd6132f46d8044612fc73e26b769025edabbe1290af9851897), message: c0abbb938d4da7ce3a25e704b5b41dbacc762afe45a536e7d0a962fb1b34413e }, ConcealedNode { depth: 3, hash: 8fff224a68c261d62ab33d802182ff09d6332e9079fce71936ea414ed45ee782 }, ConcealedNode { depth: 2, hash: d42b5b6f1d6cc564fea2258e5147f4dd07735fac5aafa4a8394feb75ed8e366d }, ConcealedNode { depth: 1, hash: 5009030a186d268e698e184cf9e32607951ab81c6e3b42ecaf6ccf73a5ca0f2e }], entropy: None }, dbc_proof: OpretFirst }
thread 'bucketd' panicked at 'internal error: entered unreachable code: broken merkle block merge-reveal algorithm', /home/zoe/.cargo/registry/src/github.com-1ecc6299db9ec823/commit_verify-0.8.0/src/lnpbp4.rs:826:17

Hoping it can help with debugging, I've added a log of the Stored anchor: (the one that gets merged), while Anchor: is the Anchor<MerkleProof> that gets converted into the Restored anchor:, which is the Anchor<MerkleBlock> that fails to merge with the stored anchor.
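
In pseudocode (names purely illustrative), the flow that panics is:

// stored:   Anchor<MerkleBlock> = db[ANCHORS][txid]      // "Stored anchor:"
// incoming: Anchor<MerkleProof> = consignment anchor     // "Anchor:"
// restored: Anchor<MerkleBlock> = expand(incoming)       // "Restored anchor:"
// db[ANCHORS][txid] = merge_reveal(stored, restored)     // <- panics here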

Here are the two merkle blocks that fail to merge, pretty-printed (for the lazy ones 😉):

MerkleBlock{
  depth: 3,
  cross_section: [
    ConcealedNode{
      depth: 3,
      hash: 03e43c730e76e654a40fdc0b62940bb7382ed95d4e8124ba687b4ec470cd1f01
    },
    CommitmentLeaf{
      protocol_id: Slice32(391cfae9f7b23562826b3260831e92698c7ec43c49e7afeed8e83a1bd75bbce9),
      message: 72c7278c8337a0480aa343dae2e6e6e1aee6c7b3df7d88f150a21c82f2b373ac
    },
    ConcealedNode{
      depth: 2,
      hash: d42b5b6f1d6cc564fea2258e5147f4dd07735fac5aafa4a8394feb75ed8e366d
    },
    ConcealedNode{
      depth: 1,
      hash: 5009030a186d268e698e184cf9e32607951ab81c6e3b42ecaf6ccf73a5ca0f2e
    }
  ],
  entropy: None
}
MerkleBlock{
  depth: 3,
  cross_section: [
    CommitmentLeaf{
      protocol_id: Slice32(f0f2fc11fa38f3fd6132f46d8044612fc73e26b769025edabbe1290af9851897),
      message: c0abbb938d4da7ce3a25e704b5b41dbacc762afe45a536e7d0a962fb1b34413e
    },
    ConcealedNode{
      depth: 3,
      hash: 8fff224a68c261d62ab33d802182ff09d6332e9079fce71936ea414ed45ee782
    },
    ConcealedNode{
      depth: 2,
      hash: d42b5b6f1d6cc564fea2258e5147f4dd07735fac5aafa4a8394feb75ed8e366d
    },
    ConcealedNode{
      depth: 1,
      hash: 5009030a186d268e698e184cf9e32607951ab81c6e3b42ecaf6ccf73a5ca0f2e
    }
  ],
  entropy: None
}

Here's an example of a working merge, in the hope it might help analyze the issue.

These are the two MerkleBlocks to be merged:

MerkleBlock{
  depth: 3,
  cross_section: [
    ConcealedNode{
      depth: 1,
      hash: 619d46fbc17c7fadecddc835f41084b9fff4a14f4bc8be1511a63c52a3bcf8eb
    },
    ConcealedNode{
      depth: 2,
      hash: 31b262976c6ae225a24a1f47f27149a07279623d12e7489cf0a90deb31731901
    },
    ConcealedNode{
      depth: 3,
      hash: 5865d6040c7586fba2f8947bf88749d78203932a1f2e84ee5ef050b547e5cd3c
    },
    CommitmentLeaf{
      protocol_id: Slice32(57e479feaeaa0be78047f8080bd3ae355ab6eabf4469413c81e17763d341dc87),
      message: b35877def35dda879d29ffbba558243cd8362983530405d2a512947f42df2d39
    }
  ],
  entropy: None
},
MerkleBlock{
  depth: 3,
  cross_section: [
    ConcealedNode{
      depth: 1,
      hash: 619d46fbc17c7fadecddc835f41084b9fff4a14f4bc8be1511a63c52a3bcf8eb
    },
    ConcealedNode{
      depth: 3,
      hash: bcdf20c3adc9a959005a1f5fde925144fbbf7adc0937f0a81ffdb207b9f83efd
    },
    CommitmentLeaf{
      protocol_id: Slice32(ddc44939452a967977d6921c24c27ffee411f8ef37894df77327bb3f9764e998),
      message: 660b186f0e7f3bfe5216d9273fa4527a52f91f17ea040dcce248ee31e5936a41
    },
    ConcealedNode{
      depth: 2,
      hash: 20a05aba6a8a62f0a01bcaa12c9cec27dc777055716ef2976223dd63a9d4ea51
    }
  ],
  entropy: None
},

and here's the result:

MerkleBlock{
  depth: 3,
  cross_section: [
    ConcealedNode{
      depth: 1,
      hash: 619d46fbc17c7fadecddc835f41084b9fff4a14f4bc8be1511a63c52a3bcf8eb
    },
    ConcealedNode{
      depth: 3,
      hash: bcdf20c3adc9a959005a1f5fde925144fbbf7adc0937f0a81ffdb207b9f83efd
    },
    CommitmentLeaf{
      protocol_id: Slice32(ddc44939452a967977d6921c24c27ffee411f8ef37894df77327bb3f9764e998),
      message: 660b186f0e7f3bfe5216d9273fa4527a52f91f17ea040dcce248ee31e5936a41
    },
    ConcealedNode{
      depth: 3,
      hash: 5865d6040c7586fba2f8947bf88749d78203932a1f2e84ee5ef050b547e5cd3c
    },
    CommitmentLeaf{
      protocol_id: Slice32(57e479feaeaa0be78047f8080bd3ae355ab6eabf4469413c81e17763d341dc87),
      message: b35877def35dda879d29ffbba558243cd8362983530405d2a512947f42df2d39
    }
  ],
  entropy: None
},
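
To make the expected behaviour concrete, here is a minimal, self-contained sketch of the merge idea with simplified stand-in types. This is NOT the actual commit_verify code; its types and merge-reveal algorithm differ in detail. Running it on the failing pair above yields the expected union of reveals, which suggests the two blocks are perfectly mergeable and the panic comes from the implementation rather than from the inputs:

// Simplified stand-in types; the real ones live in commit_verify's
// lnpbp4 module and differ in detail.
#[derive(Clone, Debug, PartialEq)]
enum Node {
    Concealed { depth: u8, hash: &'static str },
    Leaf { protocol_id: &'static str, message: &'static str },
}

// Number of leaf slots (at depth `depth_max`) that a node covers.
fn span(depth_max: u8, node: &Node) -> usize {
    match node {
        Node::Concealed { depth, .. } => 1 << (depth_max - depth),
        Node::Leaf { .. } => 1,
    }
}

// Merge two cross-sections of the same tree: keep revealed leaves over
// concealed nodes covering the same slots, and the finer of two views.
fn merge(depth_max: u8, a: &[Node], b: &[Node]) -> Result<Vec<Node>, String> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        let (sa, sb) = (span(depth_max, &a[i]), span(depth_max, &b[j]));
        if sa == sb {
            match (&a[i], &b[j]) {
                // At least one side reveals this slot: keep the reveal.
                (leaf @ Node::Leaf { .. }, _) | (_, leaf @ Node::Leaf { .. }) => {
                    out.push(leaf.clone())
                }
                // Both concealed: they must agree on the subtree hash.
                (na, nb) if na == nb => out.push(na.clone()),
                _ => return Err("same slots, conflicting hashes".into()),
            }
            i += 1;
            j += 1;
        } else if sa > sb {
            // `a` conceals a subtree that `b` splits into finer nodes:
            // take `b`'s finer view across the whole span.
            let mut covered = 0;
            while covered < sa && j < b.len() {
                covered += span(depth_max, &b[j]);
                out.push(b[j].clone());
                j += 1;
            }
            i += 1;
        } else {
            // Mirror case: take `a`'s finer view.
            let mut covered = 0;
            while covered < sb && i < a.len() {
                covered += span(depth_max, &a[i]);
                out.push(a[i].clone());
                i += 1;
            }
            j += 1;
        }
    }
    Ok(out)
}

fn main() {
    // The failing pair from the logs above (hashes abbreviated).
    let stored = [
        Node::Concealed { depth: 3, hash: "03e43c73…" },
        Node::Leaf { protocol_id: "391cfae9…", message: "72c7278c…" },
        Node::Concealed { depth: 2, hash: "d42b5b6f…" },
        Node::Concealed { depth: 1, hash: "5009030a…" },
    ];
    let restored = [
        Node::Leaf { protocol_id: "f0f2fc11…", message: "c0abbb93…" },
        Node::Concealed { depth: 3, hash: "8fff224a…" },
        Node::Concealed { depth: 2, hash: "d42b5b6f…" },
        Node::Concealed { depth: 1, hash: "5009030a…" },
    ];
    // Expected result: both leaves revealed, shared concealed nodes at
    // depths 2 and 1 kept. The same routine also reproduces the working
    // merge result shown above.
    println!("{:#?}", merge(3, &stored, &restored));
}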

Thank you very much for the discovery and debug work. I will work on this as one of the priorities (given I can devote only a very small amount of time to RGB these days, it may take some time though...)

@dr-orlovsky this issue is quite important, could you please put the task in the 0.8 milestone and give it some priority?

I am struggling to understand some parts of this report.

I think we really need to start using the same terms, since "merge of merkle blocks anchored to the same TXID" doesn't make sense to me and I just fail to compile the meaning :(.

I can assume the real meaning was "Merging the multi-protocol commitments (or anchors?) fails in some cases" - but I am not sure how TXID appears here. There is no such thing as "anchoring to the TXID", since anchoring works the other way around: it is the witness transaction (not the txid) which commits ("anchors") to the multi-protocol commitment (which is in turn a commitment to a Merkle tree of commitments under different & distinct protocols).
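
Informally, the chain of commitments goes in this direction (simplified sketch):

witness transaction (here via an OP_RETURN output: OpretFirst)
  commits to -> the LNPBP-4 multi-protocol commitment
    which commits to -> a Merkle tree of commitments (the MerkleBlock)
      whose leaves are -> (protocol_id, message) pairs, one per protocol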

Meanwhile I will just debug the merge-reveal algorithm in MerkleBlock since I can guess the bug is somehow related to it.

@zoedberg I finally understood your description, sorry for whining about terminology; your description is really detailed and you provide a lot of debug information for convenience. Starting the investigation. In fact this issue should go to the client_side_validation repo, so I will open another one there referencing this issue.

Hi @dr-orlovsky,
This bug seems to become very frequent under certain circumstances and it's becoming a real show-stopper for us.
I was thinking: could I open a PR that changes that unreachable!() statement into a normal error? By doing so we could handle the error gracefully, without needing to kill the user app. It would be awesome for us if this temporary change could be merged and published in a new release of the 0.8.X series, so let me know if this would be feasible without taking away too much of your time. Something along these lines is sketched below.
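
A sketch of the shape of the change (names are hypothetical; the actual patch in client_side_validation may differ):

// Hypothetical error type returned instead of panicking.
#[derive(Debug)]
pub enum MergeError {
    // State formerly aborted with
    // `unreachable!("broken merkle block merge-reveal algorithm")`;
    // returning an error lets the caller recover instead of crashing.
    BrokenMerkleBlockMerge,
}

// At the former panic site inside the merge-reveal code:
//     return Err(MergeError::BrokenMerkleBlockMerge);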

Let's try this way of fixing it!

This should be fixed with LNP-BP/client_side_validation#115. Once we merge the fix I will release a new v0.8.4 version of RGB Node using the fixed library.

@zoedberg new versions of all crates, with dependencies updated to include the fix from LNP-BP/client_side_validation#115, have been released. For RGB Node this is v0.8.4. Waiting for your confirmation that this issue is now solved.

@dr-orlovsky I missed this message, sorry. This issue can be closed as solved.