InvalidNonce error when interacting with the chain via scripts

Question

milancermak opened this issue 2 months ago

Milan Cermak commented 2 months ago

Which component is your bug related to?

sncast

Foundry Version

0.21.0

What operating system are you using?

MacOS

What system architecture are you using?

arm

What happened

I'm using a script to deploy a protocol. It consists of ~20 contracts, each needing a declare + deploy, plus some additional transactions to set up roles and whatnot. In total the deploy script might do about 50+ operations.

I have to run the script multiple times because it often fails. The cause seems to be a wrong nonce value, judging by this error in the state JSON file:

      "2c0f6b42cc23cb96cb07ac70465b71c4d16924e8c7e2e622e0b514aec57e3f25": {
        "name": "invoke",
        "output": {
          "type": "ErrorResponse",
          "message": "Contract failed the validation = perform_validations call failed; failure reason: TransactionFailureReason(code='native_blockifier.PyTransactionPreValidationError', error_message='InvalidNonce { address: ContractAddress(PatriciaKey(StarkFelt(\"0x017721cd89df40d33907b70b42be2a524abeea23a572cf41c79ffe2422e7814e\"))), account_nonce: Nonce(StarkFelt(\"0x0000000000000000000000000000000000000000000000000000000000000023\")), incoming_tx_nonce: Nonce(StarkFelt(\"0x0000000000000000000000000000000000000000000000000000000000000022\")) }')."
        },
        "status": "Error",
        "timestamp": 1713365947,
        "misc": null
      },

When the failure happens, it's always the same error: the account nonce is 1 greater than the incoming tx nonce.
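For anyone reading along, the two nonces can be pulled out of that message mechanically. This is a minimal Python sketch, assuming the "message" field has already been JSON-unescaped; the `parse_nonces` helper and the regular expression are illustrative and not part of sncast:

```python
import re

# Relevant part of the "message" field from the state file above.
MESSAGE = (
    'InvalidNonce { address: ContractAddress(PatriciaKey(StarkFelt('
    '"0x017721cd89df40d33907b70b42be2a524abeea23a572cf41c79ffe2422e7814e"))), '
    'account_nonce: Nonce(StarkFelt('
    '"0x0000000000000000000000000000000000000000000000000000000000000023")), '
    'incoming_tx_nonce: Nonce(StarkFelt('
    '"0x0000000000000000000000000000000000000000000000000000000000000022")) }'
)

# Matches both nonce fields; the optional backslash tolerates still-escaped quotes.
NONCE_RE = re.compile(
    r'(account_nonce|incoming_tx_nonce): Nonce\(StarkFelt\(\\?"(0x[0-9a-fA-F]+)'
)


def parse_nonces(message: str) -> dict[str, int]:
    """Return the nonces named in an InvalidNonce message as integers."""
    return {name: int(value, 16) for name, value in NONCE_RE.findall(message)}


nonces = parse_nonces(MESSAGE)
gap = nonces["account_nonce"] - nonces["incoming_tx_nonce"]
print(nonces, "gap:", gap)
```

A positive gap means the account had already moved past the nonce the script submitted, which is consistent with the script reading a stale nonce.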

Sometimes the script makes only 1 TX and fails immediately, sometimes multiple TXs go through (the best I got was 15, but usually it fails after 2 or 3).

This is an example output of the script:

Deploying core contracts
Deploying gates
Deploying oracles
Transaction hash = 0x74a0b99fc170f6106508ab4d7c411db034c3ab714594c9f006fe8959b58906
Waiting for transaction to be accepted (59 retries / 295s left until timeout)
Waiting for transaction to be accepted (58 retries / 290s left until timeout)
Setting up roles
Transaction hash = 0x6408e934bcb5750582b8f038317dbcfd1b6dfb19fe6410fb66a7fefd8bc2d2c
Waiting for transaction to be accepted (59 retries / 295s left until timeout)
Role granted: ABS -> PU
Transaction hash = 0x1a267e6bdad667a5b50c1e16ea2e11f79c80dcbdf8ff646532843affaac026f
Waiting for transaction to be accepted (59 retries / 295s left until timeout)
Waiting for transaction to be accepted (58 retries / 290s left until timeout)
Role granted: SE -> ABB
Transaction hash = 0x21cc70763c84302ac61d769f17c8360a4abdf5948eec243693bf166ef3506c1
Waiting for transaction to be accepted (59 retries / 295s left until timeout)
Waiting for transaction to be accepted (58 retries / 290s left until timeout)
Role granted: SE -> PU
Transaction hash = 0x5a0aba2f261c52ebffa4aab6d9c6c733d5646ca051aebabf7e2edecffa9b50
Waiting for transaction to be accepted (59 retries / 295s left until timeout)
Role granted: SE -> CA
Transaction hash = 0x2ea456155451aa593a0946e6d50dd2e772c5697c2098e4c5559bb52b6a1d013
Waiting for transaction to be accepted (59 retries / 295s left until timeout)
Waiting for transaction to be accepted (58 retries / 290s left until timeout)
command: script run
message:
    0x6772616e7420726f6c65206661696c6564 ('grant role failed')

status: script panicked
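The hex value in the panic message is a Cairo short string: the ASCII bytes of the text packed big-endian into a single felt. A small Python sketch of the round trip follows; the helper names are made up for illustration:

```python
def decode_short_string(felt_hex: str) -> str:
    """Turn a felt given as a hex string back into its ASCII text."""
    value = int(felt_hex, 16)
    length = (value.bit_length() + 7) // 8  # number of bytes needed
    return value.to_bytes(length, "big").decode("ascii")


def encode_short_string(text: str) -> str:
    """Pack ASCII text into a felt and return it as a hex string."""
    data = text.encode("ascii")
    if len(data) > 31:
        raise ValueError("a short string holds at most 31 ASCII characters")
    return hex(int.from_bytes(data, "big"))


panic_felt = "0x6772616e7420726f6c65206661696c6564"
print(decode_short_string(panic_felt))
```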

And on the subsequent run:

Deploying core contracts
Deploying gates
Deploying oracles
Setting up roles
Role granted: ABS -> PU
Role granted: SE -> ABB
Role granted: SE -> PU
Role granted: SE -> CA
Transaction hash = 0x25ed295d1ed164b346ba3d6660ccb99c95e0567e6e5d729a54c1372a678546e
Waiting for transaction to be accepted (59 retries / 295s left until timeout)
Role granted: SEER -> PU
command: script run
message:
    0x6772616e7420726f6c65206661696c6564 ('grant role failed')

status: script panicked

I guess the issue is somewhat similar to #1336 I reported some time ago, but that one was fixed.

Trace

No response

Is there an existing issue for this?

I have searched the existing issues and verified no issue exists for this problem.

Wojciech Szymczyk · Answer 1 · Fri Apr 19 2024 15:09:43 GMT+0800 (China Standard Time)

Hi @milancermak, thanks for opening an issue. Some RPC nodes are configured with higher poll intervals, which means they may return "older" nonces in pending blocks, or may not be able to obtain pending blocks at all. This may be the case with yours. Have you tried setting the nonce manually for all the transactions you send?

You could call get_nonce once at the beginning of your script, then increment the value and pass it to each transaction, something like:

let nonce = get_nonce('pending');

let declare_result = declare("Contract", Option::Some(max_fee), Option::Some(nonce))
        .expect('declare failed');
let nonce = nonce + 1;

// next transaction using the nonce

Of course, you need to be careful with this approach as well, since a wrong nonce can still end up being used for some transactions (e.g. if some transactions are skipped after get_nonce and the nonce is still incremented manually, the next tx will have a wrong nonce).
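To make the difference concrete, here is a toy Python model of the situation. It is only a sketch: `LaggingNode`, `run_naive` and `run_manual` are hypothetical names, and the "one stale read after each transaction" behaviour is an assumption chosen to mimic a node with a slow poll interval, not a description of any real RPC node.

```python
class LaggingNode:
    """Toy stand-in for an RPC node whose pending view lags behind the chain."""

    def __init__(self, start_nonce: int, stale_reads_after_tx: int = 1):
        self.account_nonce = start_nonce      # true account nonce
        self._visible_nonce = start_nonce     # value reported by get_nonce
        self._stale_reads_after_tx = stale_reads_after_tx
        self._stale_reads_left = 0

    def get_nonce(self) -> int:
        if self._stale_reads_left > 0:
            self._stale_reads_left -= 1       # still serving the old value
        else:
            self._visible_nonce = self.account_nonce
        return self._visible_nonce

    def submit(self, nonce: int) -> None:
        if nonce != self.account_nonce:
            raise ValueError(
                f"InvalidNonce: account_nonce={self.account_nonce:#x}, "
                f"incoming_tx_nonce={nonce:#x}"
            )
        self.account_nonce += 1
        self._stale_reads_left = self._stale_reads_after_tx


def run_naive(node: LaggingNode, n_txs: int) -> None:
    """Ask the node for the nonce before every transaction."""
    for _ in range(n_txs):
        node.submit(node.get_nonce())


def run_manual(node: LaggingNode, n_txs: int) -> int:
    """Read the nonce once, then track it locally."""
    nonce = node.get_nonce()
    for _ in range(n_txs):
        node.submit(nonce)
        nonce += 1  # only reached when the submit above succeeded
    return nonce


try:
    run_naive(LaggingNode(start_nonce=0x22), 50)
except ValueError as err:
    print("naive:", err)

print("manual: next nonce", hex(run_manual(LaggingNode(start_nonce=0x22), 50)))
```

In this model the naive loop fails on its second transaction with the same off-by-one pattern as in the report, while the locally tracked nonce gets through all 50. Incrementing only after a successful submit is what guards against the skipped-transaction caveat.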

Please let me know if this mitigates your problem. As a side note, we plan to migrate script execution to multicalls to prevent these issues in the future (#823).

Milan Cermak · Answer 2 · Fri Apr 19 2024 17:45:40 GMT+0800 (China Standard Time)

Ok, I see what you're saying. I won't change it in my script now because 1) it's already deployed and 2) it would require some refactoring, passing the nonce into subfunctions manually etc. I'll keep it in mind for the future though, and will write my scripts with manual nonce handling.

Still, it would be great if you could find some way around the issue. From my perspective, I would rather have the script take longer, but be guaranteed to finish than to babysit it and rerun it again and again. Also, thank you for the idempotency feature ❤️
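One way to picture the "slower but guaranteed" trade-off is a loop that waits until the node reports the expected nonce before the next transaction is sent. The Python sketch below is purely illustrative: `wait_for_nonce` is a hypothetical helper, not an sncast feature, and the defaults of 60 retries at 5 seconds are an assumption that simply mirrors the timeout shown in the log output.

```python
import time
from typing import Callable


def wait_for_nonce(
    get_nonce: Callable[[], int],
    expected: int,
    retries: int = 60,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll get_nonce until it reports at least `expected`, or give up."""
    for _ in range(retries):
        current = get_nonce()
        if current >= expected:
            return current
        sleep(delay_s)  # node is still serving an older nonce
    raise TimeoutError(f"nonce did not reach {expected:#x} after {retries} polls")


# Usage with canned readings standing in for a lagging node.
readings = iter([0x22, 0x22, 0x23])
waits = []
print(hex(wait_for_nonce(lambda: next(readings), 0x23, sleep=waits.append)))
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.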

Re: multicalls - I'm not sure that would help in my particular case. AFAIK, in a multicall context, we can't use the output of call 1 in call 2, which is crucial if the multicall would be a declare + deploy + invoke.

Wojciech Szymczyk · Answer 3 · Fri Apr 19 2024 17:51:31 GMT+0800 (China Standard Time)

> Ok, I see what you're saying. I won't change it in my script now because 1) it's already deployed

If you are using a recent sncast version, you should be using state files, which means all the transactions that succeeded and are present in the state file won't be repeated 👍 But I see your point.

> Still, it would be great if you could find some way around the issue. From my perspective, I would rather have the script take longer, but be guaranteed to finish than to babysit it and rerun it again and again. Also, thank you for the idempotency feature ❤️

Sure, thanks for the suggestion! Actually, we plan to add new functions to scripts (to get a tx receipt, its status and its hash), and the plan is that they could be used to achieve exactly what you describe! I will also try to find other, less brute-force ways to achieve that, to give users more options.

If there is something more I can help you with, please let me know. Otherwise I will proceed with closing this issue shortly. Thanks!