bytes and string fields not encoded into 32-byte chunks

Question

bytes and string fields not encoded into 32-byte chunks

ylembachar opened this issue 2 years ago · comments

We are getting several un-decoded logs with the error: tuple index 7: not enough bytes remaining: 151 < 160 whereas these logs are decoded correctly by Etherscan and web3j.

As you mentioned in two of the previous issues we submitted (issue 1 and issue 2), the data length is not a multiple of 32 and should be padded.

We suspect the bytes and string fields are not padded correctly in some logs due to an old issue in Solidity where these were not encoded correctly and not padded resulting in several logs with a truncated data field. Here is the change in question: https://docs.soliditylang.org/en/latest/050-breaking-changes.html (See paragraph: "The ABI encoder now properly pads byte arrays and strings from calldata (msg.data and external function parameters) when used in external function calls and in abi.encode. For unpadded encoding, use abi.encodePacked.").

We believe other libraries are aware of this error and thus decode those logs and we were wondering if you would be considering adding this change to the library as well.

evan saulpaugh commented a year ago

#59

evan saulpaugh · Answer 1 · Thu Jan 26 2023 18:06:09 GMT+0800 (China Standard Time)

This is a difficult issue because headlong's decoding has always been designed to be strict and expect the proper padding byte-for-byte. So ignoring padding is probably not a good idea.

Adding the proper padding in every case would likely be complex and difficult, especially if the improperly padded array is not the last element. And in cases of multiple improperly padded arrays. Especially because ABIv2 allows arbitrary values for dynamic element offsets, not just multiples of 32.

Can you provide more information about the exact nature of the padding problem which was fixed and how libraries have handled support for it? headlong must work in 100% of cases for event encodings which conform to the specification, and ideally, would reject any malformed events.

ylembachar · Answer 2 · Thu Jan 26 2023 19:07:28 GMT+0800 (China Standard Time)

Not sure how other libraries are doing that, but what we noticed is that in the case of an ABI where we have one bytes or string field at the end of the ABI, if the data is truncated, then removing the check that it should go into a 32-byte chunk yields the same decoding result as in Etherscan for the examples mentioned in the issues above. We assumed the decoding is correct as it yields addresses that are indexed in Etherescan and strings that are also well formed sentences. In headlong, removing the check here allows to decode these specific cases, as we don't check that the bytes are padded but I agree that this goes against enforcing strict encoding and might perhaps yield some wrong data.

evan saulpaugh · Answer 3 · Thu Jan 26 2023 19:36:19 GMT+0800 (China Standard Time)

Event event = Event.fromJson("{\n" +
                "    \"anonymous\": false,\n" +
                "    \"inputs\": [\n" +
                "      {\n" +
                "        \"indexed\": true,\n" +
                "        \"name\": \"specId\",\n" +
                "        \"type\": \"bytes32\"\n" +
                "      },\n" +
                "      {\n" +
                "        \"indexed\": false,\n" +
                "        \"name\": \"requester\",\n" +
                "        \"type\": \"address\"\n" +
                "      },\n" +
                "      {\n" +
                "        \"indexed\": false,\n" +
                "        \"name\": \"requestId\",\n" +
                "        \"type\": \"bytes32\"\n" +
                "      },\n" +
                "      {\n" +
                "        \"indexed\": false,\n" +
                "        \"name\": \"payment\",\n" +
                "        \"type\": \"uint256\"\n" +
                "      },\n" +
                "      {\n" +
                "        \"indexed\": false,\n" +
                "        \"name\": \"callbackAddr\",\n" +
                "        \"type\": \"address\"\n" +
                "      },\n" +
                "      {\n" +
                "        \"indexed\": false,\n" +
                "        \"name\": \"callbackFunctionId\",\n" +
                "        \"type\": \"bytes4\"\n" +
                "      },\n" +
                "      {\n" +
                "        \"indexed\": false,\n" +
                "        \"name\": \"cancelExpiration\",\n" +
                "        \"type\": \"uint256\"\n" +
                "      },\n" +
                "      {\n" +
                "        \"indexed\": false,\n" +
                "        \"name\": \"dataVersion\",\n" +
                "        \"type\": \"uint256\"\n" +
                "      },\n" +
                "      {\n" +
                "        \"indexed\": false,\n" +
                "        \"name\": \"data\",\n" +
                "        \"type\": \"bytes\"\n" +
                "      }\n" +
                "    ],\n" +
                "    \"name\": \"OracleRequest\",\n" +
                "    \"type\": \"event\"\n" +
                "  }");
final byte[][] topics = new byte[][] {
        Strings.decode("d8d7ecc4800d25fa53ce0372f13a416d98907a7ef3d8d3bdd79cf4fe75529c65"),
        Strings.decode("3431373466353064613337333431306161363430326265313263626538636463"),
};
final byte[] data = Strings.decode("000000000000000000000000f9fc2b4a0e487297b05285e9b3327f26e70c4e9b6b7ff4ffa51b345e8459b03fdb280ac7f99caaf432142a1b800c581d8b4ed39f00000000000000000000000000000000000000000000000000ee59ddf89580d6000000000000000000000000f9fc2b4a0e487297b05285e9b3327f26e70c4e9b042f2b650000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000604e53520000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000976375726c787e68747470733a2f2f6d61726b65742e6c696e6b2f76312f6e6f6465732f66373662653531392d653431652d343861302d393433302d3437333139656461306634332f766572696669636174696f6e2f726573706f6e73653f746f6b656e3d663336613737333531626533346433323830333965623766643536626630303264706174686d726573706f6e7365546f6b656e");
Tuple args = event.decodeArgs(topics, data);
System.out.println(args);

With this code, if I remove the call to checkNoDecodePossible, I still get a

java.nio.BufferUnderflowException
	at java.base/java.nio.HeapByteBuffer.get(HeapByteBuffer.java:183)
	at java.base/java.nio.ByteBuffer.get(ByteBuffer.java:826)
	at com.esaulpaugh.headlong.abi.ArrayType.decodeBytes(ArrayType.java:428)

because decodeBytes is checking the padding anyway.

Max Schmidt · Answer 4 · Thu Jan 26 2023 19:51:49 GMT+0800 (China Standard Time)

Imho you're right when you go down the strict route. However other libs don't do those strict checks, like ether-js (what etherscan uses), web3js and the old ethereumJ. Following https://onlinelibrary.wiley.com/doi/epdf/10.1002/int.22633 it's pretty clear that solidity pre 0.5 (encoding dynamic arrays without any implicit padding) is still the way to go.

Not sure how to proceed here though - the longterm solution could be a lenient check mode ;-)

ylembachar · Answer 5 · Thu Jan 26 2023 20:17:55 GMT+0800 (China Standard Time)

I guess if we wanted to be able to decode un-padded bytes and strings while using strict encoding to allow to decode these events that were encoded with a version that did not pad these elements, although this does not follow the strict encoding specification, we would also need to remove the check that bytes and string are padded whenever this check is made in the library.

evan saulpaugh · Answer 6 · Thu Jan 26 2023 20:21:59 GMT+0800 (China Standard Time)

I don't know. Solidity hasn't reached version 1.0 yet and if Ethereum scales, 100,000 pre-0.5 contracts will be a very small fraction of all contracts.

On the other hand, if an ABIv3 largely replaces ABIv2, ABIv2 will be used mostly for legacy contracts anyway.

If you can get the Solidity team to update the abi-spec to specify lenient decoding, then it would make sense for headlong to implement it.

I do not understand how it would work. If a byte array (or string) is not the final dynamic element in a tuple, how is the rest of the tuple to be decoded? It would have to differentiate between missing padding (pre-0.5) and either malformed padding (padding is present and of the correct length, but contains non-zero bytes) or correct padding. Which requires knowing the offset of the next dynamic element, which may or may not be immediately after the current element (there could be a gap). Decoding then becomes very stateful.

And what if it is the last dynamic element but there is additional data in the buffer? Should the ByteBuffer's position be advanced to where the padding would end if it were present? i.e. are these bytes "consumed"? There would be no way to tell if the next bytes are padding or something else entirely. There is a fine line between lenient decoding and best-effort decoding. headlong can never be a best-effort decoder.

ylembachar · Answer 7 · Fri Jan 27 2023 17:42:23 GMT+0800 (China Standard Time)

Agreed. Thank you for your time and clarifications, there would indeed be no deterministic way to decode these bytes, especially in the absence of a clear specification.

evan saulpaugh · Answer 8 · Fri Jan 27 2023 17:51:29 GMT+0800 (China Standard Time)

Perhaps a special Event method like decodePre0_5Args could at some point be added if it is well-specified and widely needed.

evan saulpaugh · Answer 9 · Tue Aug 22 2023 08:50:34 GMT+0800 (China Standard Time)

I have a potential solution in the form of an end-user supplied flag on the level of the ABIType instance.

Here is the branch: https://github.com/esaulpaugh/headlong/tree/legacy-flag

Let me know if this is close to a complete working solution for you. I considered various other options and this seems like the cleanest and safest one.

ylembachar · Answer 10 · Thu Aug 24 2023 17:06:36 GMT+0800 (China Standard Time)

Hi Evan, thank you for providing this alternate solution. I've been testing it against the examples we have and it yields the same results as the manual padding we're currently performing - also checked against Etherscan, and the results on the few tests I made on unpadded bytes and strings are equivalent but there is however a slight difference in formatting when changing flags. When decoding strings, the decodeCall function returns the hexadecimal representation when using the new flag for strings, whereas it returns the string representation when using no flags. I added a snippet below. The input is padded correctly for the sake of comparison.

"test string" in {
      val abi = """{
                  |    "constant": false,
                  |    "inputs": [
                  |      {
                  |        "internalType": "string",
                  |        "name": "name",
                  |        "type": "string"
                  |      },
                  |      {
                  |        "internalType": "address",
                  |        "name": "owner",
                  |        "type": "address"
                  |      },
                  |      {
                  |        "internalType": "uint256",
                  |        "name": "duration",
                  |        "type": "uint256"
                  |      },
                  |      {
                  |        "internalType": "bytes32",
                  |        "name": "secret",
                  |        "type": "bytes32"
                  |      },
                  |      {
                  |        "internalType": "address",
                  |        "name": "resolver",
                  |        "type": "address"
                  |      },
                  |      {
                  |        "internalType": "address",
                  |        "name": "addr",
                  |        "type": "address"
                  |      }
                  |    ],
                  |    "name": "registerWithConfig",
                  |    "outputs": [],
                  |    "payable": true,
                  |    "stateMutability": "payable",
                  |    "type": "function"
                  |  }""".stripMargin
      val hex =
        "f7a1696300000000000000000000000000000000000000000000000000000000000000c00000000000000000000000006ad11f36d051ccf5bc06bc53356549bbfa61a1a40000000000000000000000000000000000000000000000000000000003c30ab0f20b3761f7047fde64cec9977a29afeb41e9f3d8138f81687af2e4b1dffa22c20000000000000000000000004976fb03c32e5b8cfe2b6ccb31c09ba78ebaba410000000000000000000000006ad11f36d051ccf5bc06bc53356549bbfa61a1a400000000000000000000000000000000000000000000000000000000000000086a61736f6e353636000000000000000000000000000000000000000000000000"
      val f = com.esaulpaugh.headlong.abi.Function.fromJson(
        ABIType.FLAG_LEGACY_DECODE,
        abi
      )
      val f2 = com.esaulpaugh.headlong.abi.Function.fromJson(
        ABIType.FLAGS_NONE,
        abi
      )

      val decoded  = f.decodeCall(ByteBuffer.wrap(Strings.decode(hex)))
      val decoded2 = f2.decodeCall(ByteBuffer.wrap(Strings.decode(hex)))

      println(decoded)
      println(decoded2)
    }

The output:

[[106, 97, 115, 111, 110, 53, 54, 54], 0x6aD11F36D051cCF5bc06bC53356549bbfa61a1A4, 63113904, [-14, 11, 55, 97, -9, 4, 127, -34, 100, -50, -55, -105, 122, 41, -81, -21, 65, -23, -13, -40, 19, -113, -127, 104, 122, -14, -28, -79, -33, -6, 34, -62], 0x4976fb03C32e5B8cfe2b6cCB31c09Ba78EBaBa41, 0x6aD11F36D051cCF5bc06bC53356549bbfa61a1A4]
[jason566, 0x6aD11F36D051cCF5bc06bC53356549bbfa61a1A4, 63113904, [-14, 11, 55, 97, -9, 4, 127, -34, 100, -50, -55, -105, 122, 41, -81, -21, 65, -23, -13, -40, 19, -113, -127, 104, 122, -14, -28, -79, -33, -6, 34, -62], 0x4976fb03C32e5B8cfe2b6cCB31c09Ba78EBaBa41, 0x6aD11F36D051cCF5bc06bC53356549bbfa61a1A4]

evan saulpaugh · Answer 11 · Thu Aug 24 2023 17:55:28 GMT+0800 (China Standard Time)

@ylembachar Thanks for testing! After looking at this, I believe I have a fix: 6c3999f

ylembachar · Answer 12 · Thu Aug 24 2023 20:52:51 GMT+0800 (China Standard Time)

Thanks for the quick fix! Fixes the issue on our end. Our tests with both versions (flagged and unflagged) are also all succeeding.

evan saulpaugh · Answer 13 · Fri Sep 08 2023 04:26:36 GMT+0800 (China Standard Time)

fixed https://github.com/esaulpaugh/headlong/releases/tag/v10.0.0