ethereumjs / merkle-patricia-tree

Project is in active development and has been moved to the EthereumJS VM monorepo.

Home Page: https://github.com/ethereumjs/ethereumjs-monorepo

Test library resistance against Preimage Attack

holgerd77 opened this issue

There is a new (?) attack vector against Merkle tree implementations, described in this post:
https://flawed.net.nz/2018/02/21/attacking-merkle-trees-with-a-second-preimage-attack/

Currently it is unclear if this affects this library. This should be tested.

Just my thoughts here.

The attack on a plain Merkle tree is possible because the preimage of an intermediate node (the concatenation of its children's hashes) can be used as a leaf in a new tree, so that different inputs and different trees end up with the same root. This is easily fixed by applying a different mapping function to leaf and intermediate nodes before hashing them; that way, using the preimage of an intermediate node as a leaf results in a different hash. For example, prepend one byte to leaf nodes and a different byte to intermediate nodes before hashing them (this solution is also given in the referenced article).
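To make the point above concrete, here is a minimal sketch of the attack and the prefix fix on a toy binary Merkle tree. It is only an illustration: SHA-256, the helper names (`merkle_root`, `forged_leaves`), and the prefix bytes `0x00`/`0x01` are assumptions chosen for the example, not anything taken from this library.

```python
import hashlib


def h(data: bytes) -> bytes:
    # SHA-256 stands in for whatever hash the tree uses (an assumption).
    return hashlib.sha256(data).digest()


def merkle_root(leaves, leaf_prefix=b"", node_prefix=b""):
    """Root of a toy binary Merkle tree; len(leaves) must be a power of two."""
    level = [h(leaf_prefix + leaf) for leaf in leaves]
    while len(level) > 1:
        level = [
            h(node_prefix + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


def forged_leaves(leaves, leaf_prefix=b""):
    """Attacker input: the concatenated child hashes of the first
    intermediate level, presented as if they were leaf data."""
    hashes = [h(leaf_prefix + leaf) for leaf in leaves]
    return [hashes[i] + hashes[i + 1] for i in range(0, len(hashes), 2)]


original = [b"a", b"b", b"c", b"d"]

# Without prefixes, the two-leaf forged tree has the same root as the
# four-leaf original tree.
naive_collision = merkle_root(original) == merkle_root(forged_leaves(original))

# With distinct prefixes for leaves and intermediate nodes, the forged
# leaves hash to something different and the roots no longer match.
LEAF, NODE = b"\x00", b"\x01"
separated_collision = merkle_root(original, LEAF, NODE) == merkle_root(
    forged_leaves(original, LEAF), LEAF, NODE
)

print(naive_collision, separated_collision)
```

In the unprefixed case the forged leaves hash to exactly the original tree's intermediate nodes, which is why the roots coincide. With the prefixes, a leaf hash and an intermediate-node hash are always computed over different inputs.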

The Modified Merkle Patricia tries used in Ethereum naturally have this fix already:

  • Any leaf or extension node is a 2-item node, so both are differentiated from branch nodes, which are 17-item nodes (an RLP-encoded branch node can never be substituted for an RLP-encoded leaf or extension node, or vice versa).
  • Further, the encoded path of a leaf node is prefixed with the nibble 0010 or 0011, so leaf nodes can be differentiated from extension nodes, whose encoded paths are prefixed with 0000 or 0001.

As a result, there is no way to use preimages of branch and extension nodes as leaf nodes (the preimage in this context is the node itself, e.g. a 17-item RLP-encoded array), which makes the attack impossible.
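The two distinguishing rules above can be sketched as a small classifier. This is a hedged illustration, not code from this library: `node_kind` is a hypothetical helper, and it assumes the node has already been RLP-decoded into a Python list whose first item (for 2-item nodes) is the prefixed path as bytes.

```python
def node_kind(items):
    """Classify an already RLP-decoded trie node by its shape.

    `items` is assumed to be a list; for 2-item nodes, items[0] is the
    path with its prefix nibble, given as a non-empty bytes object.
    """
    if len(items) == 17:
        return "branch"
    if len(items) == 2 and items[0]:
        flag = items[0][0] >> 4  # high nibble of the first path byte
        if flag in (0b0000, 0b0001):
            return "extension"
        if flag in (0b0010, 0b0011):
            return "leaf"
    raise ValueError("not a valid trie node")


# Illustrative nodes (the values and hashes are placeholders).
branch = [b""] * 17
leaf = [b"\x20\x12\x34", b"value"]
extension = [b"\x00\x12\x34", b"\x11" * 32]

print(node_kind(branch), node_kind(leaf), node_kind(extension))
```

Because the kind of a node is determined entirely by its own encoding, a node of one kind cannot be reinterpreted as another kind while keeping the same hash.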

An important assumption that allows applying the same analysis to Merkle Trees and Merkle Patricia Tries is that the input for the tree can only end up in leaf nodes. Since the keys in Ethereum's state and storage tries are keccak hashes, all 32 bytes long, we can make this assumption there. For arbitrary key lengths, the analysis would be more complex.

Thanks for the explanation. I will leave this open for some time for reference and for others to recap.