ebellocchia / bip_utils

Generation of mnemonics, seeds, private/public keys and addresses for different types of cryptocurrencies

Errors when generating BIP86 taproot addresses

anon4250158 opened this issue

I generated 50,000 random 12-word mnemonics and, for each one, derived the first two taproot addresses at path m/86'/0'/0'/0/x (x = 0 and 1). I then generated the addresses for the same mnemonics using both my own code written in C and the tool found here. Comparing the results, I found that bip_utils generated 757 of the 100,000 addresses incorrectly. For an affected mnemonic it usually generates one correct address and one incorrect one. Here are a few mnemonics that produce incorrect addresses:

belt student into world volume maple proud follow orient conduct expose word
lend local cluster reform spend renew adult proud tongue media phrase tank
please glare property seven language mimic flag excuse reject earth deposit craft
elite pluck delay story blood bulk guess jump catch step arch sound
snack lesson glare update where inherit garden cat unfold burden eternal joy

Here is the code used to generate the addresses (I read in the mnemonics from a pre-generated file):

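from bip_utils import Bip39SeedGenerator, Bip44Changes, Bip86, Bip86Coins

# `mnemonic` is one 12-word phrase read from the pre-generated file
seed_bytes = Bip39SeedGenerator(mnemonic).Generate()
bip_mst_ctx = Bip86.FromSeed(seed_bytes, Bip86Coins.BITCOIN)
# Derive m/86'/0'/0'/0 (account 0, external chain)
bip_acc_ctx = bip_mst_ctx.Purpose().Coin().Account(0)
bip_chg_ctx = bip_acc_ctx.Change(Bip44Changes.CHAIN_EXT)

# Address index 0
bip_addr_ctx = bip_chg_ctx.AddressIndex(0)
addr1 = bip_addr_ctx.PublicKey().ToAddress()

# Address index 1
bip_addr_ctx = bip_chg_ctx.AddressIndex(1)
addr2 = bip_addr_ctx.PublicKey().ToAddress()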

Unfortunately, this error occurs with either coincurve or ecdsa.

I also modified both bip_utils and the seed tool linked above to allow 3-, 6- and 9-word mnemonics, and not only do some of them have the same problem as above, but sometimes the incorrect address is shorter than the correct 62 characters. This doesn't really matter if you continue to disallow short mnemonics, but maybe it will help find the issue? Two examples:

abandon abandon another
abandon abandon between

I'm using python 3.11.3 on Windows 10 if that matters.

Hi,
It's pretty strange, because for those mnemonics the public key is correct (the same as the tool's) and only the address is different, so it must be something related to the address encoding.
I should check with other wallets. Did you verify that the addresses generated by the tool are correct?

Please also provide some mnemonics (even short ones) where the address is shorter than the correct length.

Regards,
Emanuele

Sorry I hit the "close issue" button.

I'll keep trying to narrow it down, but here is my finding so far:

The public key displayed by the seed tool is the pubkey BEFORE the tweak is applied, so yes, they are correct. But that's not the problem.

I'm going to use the mnemonic

belt student into world volume maple proud follow orient conduct expose word

as the example here. Going through all of the derivation steps to get to the pubkey for index 0, you do indeed get

03005e112170e7031712971273a8f9bc5c4a83db01eb1068264d5864d4ef685183

However, now we have to do the tweak step, all of which is found in your file P2TR_addr.py. I added a print() after the out_point is calculated in function TweakPublicKey() to see the final pubkey x-coordinate. I did the same in my code and in the seed tool. In bip_utils, the x-coord came back as

fa235e2843f2580c3f6a0280d3302c94e0f1b85e3baa0cdbc24fbf4dd1901be9

which does not match what I or the seed tool get, which is

dadc0df8ba50dec3260b5f2b67e89fed082b3439a7553784b6afd9ac375c6964

Your x-coord is bech32m-encoded as bc1plg34u2zr7fvqc0m2q2qdxvpvjns0rwz78w4qek7zf7l5m5vsr05s23adtg, whereas mine and the seed tool's is bc1pmtwqm7962r0vxfsttu4k06yla5yzkdpe5a2n0p9k4lv6cd6ud9jq57t6zu.

So the problem appears to be in the tweak step, not with bech32m encoding.

I've found the primary problem: leading zeros. This issue can then cause problems in two different ways to result in incorrect addresses.

  1. The taptweak hashing. The step here (I call tweak96 because it should be 96 bytes) is SHA256("TapTweak") || SHA256("TapTweak") || pubkey.X(). However, if the pubkey.X() has leading zeros, they are NOT included when appending. This results in an incorrect hash, which results in an incorrectly tweaked pubkey, which results in an incorrect address. Example (arrows indicate where things went wrong):
mnemonic: belt student into world volume maple proud follow orient conduct expose word [address index 0]

your pubkey.X(): 005e112170e7031712971273a8f9bc5c4a83db01eb1068264d5864d4ef685183
correct pubkey.X(): 005e112170e7031712971273a8f9bc5c4a83db01eb1068264d5864d4ef685183

--> your tweak96: e80fe1639c9ca050e3af1b39c143c63e429cbceb15d940fbb5c5a1f4af57c5e9e80fe1639c9ca050e3af1b39c143c63e429cbceb15d940fbb5c5a1f4af57c5e95e112170e7031712971273a8f9bc5c4a83db01eb1068264d5864d4ef685183
--> correct tweak96: e80fe1639c9ca050e3af1b39c143c63e429cbceb15d940fbb5c5a1f4af57c5e9e80fe1639c9ca050e3af1b39c143c63e429cbceb15d940fbb5c5a1f4af57c5e9**00**5e112170e7031712971273a8f9bc5c4a83db01eb1068264d5864d4ef685183

--> your hash of tweak96: ece478447944b1d658e54fd68a34940f864a11f2837fc0af4064f781d239d4d5
--> correct hash of tweak96: 0351c6ad560bb624935c6770d0da73d53804062224784a1cc84713a555cbc7ab

--> your tweaked pubkey.X(): fa235e2843f2580c3f6a0280d3302c94e0f1b85e3baa0cdbc24fbf4dd1901be9
--> correct tweaked pubkey.X(): dadc0df8ba50dec3260b5f2b67e89fed082b3439a7553784b6afd9ac375c6964

--> your address: bc1plg34u2zr7fvqc0m2q2qdxvpvjns0rwz78w4qek7zf7l5m5vsr05s23adtg
--> correct address: bc1pmtwqm7962r0vxfsttu4k06yla5yzkdpe5a2n0p9k4lv6cd6ud9jq57t6zu
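
The mismatch above can be reproduced with a minimal sketch. It assumes the leading zero is lost when the x-coordinate is converted from an integer to bytes using the shortest possible length instead of a fixed 32 bytes; that is an assumption about the cause, not a quote of the bip_utils code. The helper names (`tagged_hash`, `int_to_bytes_minimal`, `int_to_bytes_fixed`) are hypothetical.

```python
import hashlib


def tagged_hash(tag: str, data: bytes) -> bytes:
    """Tagged hash: SHA256(SHA256(tag) || SHA256(tag) || data)."""
    tag_hash = hashlib.sha256(tag.encode("ascii")).digest()
    return hashlib.sha256(tag_hash + tag_hash + data).digest()


def int_to_bytes_minimal(n: int) -> bytes:
    """Shortest big-endian encoding; leading zero bytes are dropped."""
    return n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")


def int_to_bytes_fixed(n: int, length: int = 32) -> bytes:
    """Fixed-width big-endian encoding; leading zero bytes are kept."""
    return n.to_bytes(length, "big")


# x-coordinate of the untweaked pubkey from the example above
x = int("005e112170e7031712971273a8f9bc5c4a83db01eb1068264d5864d4ef685183", 16)

x_short = int_to_bytes_minimal(x)  # 31 bytes: the leading 0x00 is gone
x_full = int_to_bytes_fixed(x)     # 32 bytes: the leading 0x00 is preserved

tweak_short = tagged_hash("TapTweak", x_short)  # hash over a 95-byte preimage
tweak_full = tagged_hash("TapTweak", x_full)    # hash over the 96-byte preimage

print(len(x_short), tweak_short.hex())
print(len(x_full), tweak_full.hex())
```

If the assumption holds, the first printed hash should correspond to the "your hash of tweak96" value above and the second to the "correct" one.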

===========================

  2. The bech32m encoding. If the tweaked pubkey.X() has leading zeros, they are NOT included when encoding. This results in incorrect AND shorter-than-62-character addresses (not just for 3-word mnemonics as it turns out). Example:
mnemonic: elite pluck delay story blood bulk guess jump catch step arch sound [address index 1]

your pubkey.X(): 673841c5954e9b99a5bc008865303381e2cf9bfc0600c59943b302f154eca12e
correct pubkey.X(): 673841c5954e9b99a5bc008865303381e2cf9bfc0600c59943b302f154eca12e

your tweak96: e80fe1639c9ca050e3af1b39c143c63e429cbceb15d940fbb5c5a1f4af57c5e9e80fe1639c9ca050e3af1b39c143c63e429cbceb15d940fbb5c5a1f4af57c5e9673841c5954e9b99a5bc008865303381e2cf9bfc0600c59943b302f154eca12e
correct tweak96: e80fe1639c9ca050e3af1b39c143c63e429cbceb15d940fbb5c5a1f4af57c5e9e80fe1639c9ca050e3af1b39c143c63e429cbceb15d940fbb5c5a1f4af57c5e9f0e1bee9f8442d7c0aebac70032edb7142182e853f68fa21a43c6161c3b8355e

your hash of tweak96: c1c0757c793f1ea9c00572271249fa57e29902c0dc1868b180ba865b7b4ba395
correct hash of tweak96: c1c0757c793f1ea9c00572271249fa57e29902c0dc1868b180ba865b7b4ba395

your tweaked pubkey.X(): 0072ea9fcd2691ac367a6792ddf3539f5fdff7752a5958425287c9394507629a
correct tweaked pubkey.X(): 0072ea9fcd2691ac367a6792ddf3539f5fdff7752a5958425287c9394507629a

--> your address: bc1pwt4flnfxjxkrv7n8jtwlx5ultl0lwaf2t9vyy558eyu52pmzngkzq44z
--> correct address: bc1pqpew487dy6g6cdn6v7fdmu6nna0alam49fv4ssjjslynj3g8v2dqj277s2

Note the leading zeros in the tweaked pubkey.X() that were ignored, resulting in a 60-character address.
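
The length difference can be illustrated with a small bech32m encoder. This is a sketch that follows the reference algorithm published with BIP173/BIP350, not the bip_utils implementation; the function names (`polymod`, `hrp_expand`, `convert_bits`, `encode_p2tr`) are my own and hypothetical.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
BECH32M_CONST = 0x2BC830A3
GENERATOR = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]


def polymod(values):
    """Bech32 checksum polynomial over a list of 5-bit values."""
    chk = 1
    for value in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ value
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GENERATOR[i]
    return chk


def hrp_expand(hrp):
    """Expand the human-readable part for checksum computation."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def convert_bits(data, frombits, tobits):
    """Regroup bits (8 -> 5 here), zero-padding the final group."""
    acc, bits, out = 0, 0, []
    maxv = (1 << tobits) - 1
    for value in data:
        acc = (acc << frombits) | value
        bits += frombits
        while bits >= tobits:
            bits -= tobits
            out.append((acc >> bits) & maxv)
    if bits:
        out.append((acc << (tobits - bits)) & maxv)
    return out


def encode_p2tr(hrp, program):
    """Encode a witness v1 program as a bech32m address."""
    data = [1] + convert_bits(program, 8, 5)  # witness version 1
    pm = polymod(hrp_expand(hrp) + data + [0] * 6) ^ BECH32M_CONST
    checksum = [(pm >> 5 * (5 - i)) & 31 for i in range(6)]
    return hrp + "1" + "".join(CHARSET[d] for d in data + checksum)


# Tweaked x-coordinate from the example above, with and without its leading zero
full = bytes.fromhex("0072ea9fcd2691ac367a6792ddf3539f5fdff7752a5958425287c9394507629a")
short = full[1:]  # what gets encoded when the leading 0x00 is dropped

addr_full = encode_p2tr("bc", full)
addr_short = encode_p2tr("bc", short)
print(len(addr_full), addr_full)
print(len(addr_short), addr_short)
```

A 32-byte program encodes to 62 characters and a 31-byte one to 60. If the sketch is faithful to the reference algorithm, the two printed addresses should match the "correct" and "your" addresses quoted above.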

===========================
This seems like it might be a Python problem, but I'm a Python noob. The only way to deal with it is to know the correct length of the data and make sure whatever consumes the data always accounts for leading zeros. Fortunately the data here is always 32 bytes, but it's going to take some extra code to compensate for Python's shortcoming.
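
A minimal sketch of that kind of compensation, assuming the value arrives as big-endian bytes that may have lost their leading zeros. `to_fixed_len` is a hypothetical helper for illustration, not a bip_utils function.

```python
def to_fixed_len(data: bytes, length: int = 32) -> bytes:
    """Left-pad big-endian bytes with zeros up to a known fixed length."""
    if len(data) > length:
        raise ValueError(f"expected at most {length} bytes, got {len(data)}")
    return data.rjust(length, b"\x00")


# Tweaked x-coordinate from the second example, with the leading zero dropped
tweaked_x = bytes.fromhex("72ea9fcd2691ac367a6792ddf3539f5fdff7752a5958425287c9394507629a")
padded = to_fixed_len(tweaked_x)
print(len(tweaked_x), len(padded), padded.hex())
```

Applying the padding at the point where bytes are produced, rather than at each place they are consumed, keeps both the tweak hashing and the address encoding on the same 32-byte value.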

Thank you very much for your analysis. I had the feeling it was something related to the leading zeros when you mentioned the shorter addresses.
Ok, I'll try and fix it.

Hi, I fixed the issue and I now get the correct addresses for the two examples you provided.
Please check whether it works correctly for you now.

Bip_utils, the seed tool, and my program now all produce identical files for the 50,000 12-word test mnemonics, so I'm going to say it has been fixed and close this issue. Thank you for creating bip_utils AND maintaining it over the years!