
Upgrade v24

zkSync Era is upgrading and has exciting news for builders and users.

Our last upgrade (Protocol Version 22, VM version 1.4.2) brought support for EIP-4844 and drastically reduced our fees by a factor of 10. Now it is time for new features and improvements in Protocol Version 24, VM version 1.5.0. (We skipped v23: we faced challenges with it in our internal environments and decided to standardize the upgrades.)

TL;DR

  • New precompile: P256Verify (read this as a "new feature"), which enables modern signing devices such as Apple's Secure Enclave, WebAuthn, and Android Keystore to sign transactions securely and conveniently. This unlocks wallets that rely on hardware security instead of making you memorize 24 words.
  • The zkSync Bridgehub is ready. This component (alongside some additional ones, but let's be simple for now) lays the foundation for interoperability between ZK Stack chains, enabling them to share users and liquidity. As a standalone component, it unlocks trustless bridging between ZK Stack chains through L1.
  • Preparatory work to enable Validium mode, custom Data Availability layer, and custom base token on ZK Stack chains has been finalized. These features will be available soon (no need for a new upgrade).
  • Addition of EVM curve operation precompiles. This unlocks ZK-based applications, which can now be deployed on zkSync Era and verify the proofs they generate.
  • Pricing has changed for some operations. When the teams deploying products/protocols on zkSync Era write their smart contracts well, this can result in cheaper transactions for you.
  • Partial support for .transfer/.send function calls with no gas provided. This improves compatibility for Ethereum developers, although such calls are highly discouraged on L1 as well.
  • We can theoretically support up to 16 blobs per L1 batch, while Ethereum currently supports only 6 per block. We want to gradually test how the system behaves as we increase batch size (we currently use 2 blobs and will evolve the system to use more). The more blobs we can use within a single batch, the cheaper transactions will get for our community.
  • The team is doing initial experiments for potential future support of executing contracts with EVM bytecode (instead of our EraVM bytecode).

This upgrade is expected to start on May 13th, around 8 a.m. UTC, and to be completed 24 hours later, once the new batches are finalized on L1. Want to learn more about the technical aspects? Read on.

⚠️ IMPORTANT: During the upgrade window, withdrawals will be disabled. This means users won't be able to start new withdrawals (withdrawals started beforehand can still be completed normally). Once the upgrade is finished, we will re-enable withdrawals.

ZK Stack

zkSync Bridgehub

Adherence to specific ZK Stack chain standards is crucial for establishing trust and interoperability. This necessitates a unified set of L1 smart contracts to oversee proof verification across all chains. Any upgrades to the proof system must be implemented uniformly across all chains.

Shared bridges play a pivotal role in fostering unified liquidity within the ecosystem. Each L1 asset (ETH and ERC20s) will be associated with a single bridge contract on L1. These bridges facilitate user actions such as deposit, withdrawal, and transfer across all ZK Stack chains. They are also responsible for deploying and maintaining their counterparts on the ZK Stack chain, which are asset contracts extended with bridging functionality.

  • We've deployed the zkSync Bridgehub contract on L1, which connects asset bridges to all the ZK Stack chains.
  • We added special contracts that enable these features on the ZK Stack side (i.e., ZK Stack chains).

We designed the framework to be as modular as possible, allowing developers to modify the architecture of their chains as needed: base/fee token, consensus mechanism, staking, and DA requirements.

Architecture

The following image shows an example of the zkSync Bridgehub architecture, its components, and how ZK Stack chains are connected. For more detailed information, please refer to our docs here.

[Image: zkSync Bridgehub architecture diagram]

zkSync Bridgehub

The zkSync Bridgehub is the main component that acts as a hub for bridges, giving them a single point of communication with the contracts of all ZK Stack chains; the Bridgehub allows L1 assets to be locked in the same contract for all chains, including L3s and validiums. Moreover, it's the connection point where chains register into the ecosystem, interact with other chains' mailboxes, and request L1>L2 transactions for any chain.

State Transition Manager (STM)

An STM contract is responsible for proof verification for one or more chains. This component ensures common standards and enables trust zones between ZK Stack chains. It is also responsible for deploying a DiamondProxy for each chain (which is the ultimate representation and main component of a chain on L1).

At this point, only a single STM has been deployed, and any chain that uses the same VM implementation as zkSync Era (EraVM) can use it.

L1SharedBridge

To ensure that generally accepted token standards (ERC20 tokens) and some special tokens (ETH, WETH, etc.) are well supported, and that a single canonical version of each exists on every ZK Stack chain, we've deployed the L1SharedBridge. These canonical asset contracts are deployed to L2 from L1 by this bridge, which is shared by all chains. This is where assets are locked on L1. These bridges use the Bridgehub to communicate with all ZK Stack chains.

Previously, ETH was held on the Era DiamondProxy, and ERC20s on the now-deprecated L1ERC20Bridge. The L1ERC20Bridge will still work, but it will proxy its requests to this new bridge.

Note: When a chain has a custom base token, it is also connected to this bridge so that bridging to and from the L2 and other ZK Stack chains can happen seamlessly.

ℹ️ The L1SharedBridge is a new L1 contract, so we need to transfer the assets currently on the zkSync Era bridge to it. This will be performed with a governance transaction, just like the upgrade. There is no risk associated with it, and all funds are safe.

Chain-specific contracts

Chains might need their own specific contracts, which is also supported. For example:

  • A chain might implement its specific consensus mechanism. This requires its own contract.
  • At the moment, the ValidatorTimelock is also activated for every ZK Stack chain (it delays batch execution to enhance the ecosystem's security).

Validium and transaction filtering support

The Bridgehub + STM structure already supports ZK Stack Validium chains, including ones with a custom base token (other than ETH). An option for transaction filtering was also added (for permissioned chains, for example), where L1>L2 transactions are first forwarded to a given address that can perform custom logic before sending them on to L2.

Common Standards and Upgrades

In this initial phase, chains must follow common standards to trust each other. This means all chains start with the same empty state and have the same VM implementation and proof system. Asset contracts can trust each other on different chains, and the chains are upgraded together.

Upgrade mechanism

To ensure common standards and trust between chains, there must be a mechanism for keeping them up to date with the latest developments and enhancements of the ecosystem (that is, for upgrading them). Such upgrades must happen on multiple ZK Stack chains within a given period so they remain connected and trusted.

Initially, upgrades will have to be performed in lockstep by all ZK Stack chains. Matter Labs will get in touch with chain operators about each upgrade, providing the following:

  • A link to the tagged commit in zksync-era;
  • A prepared executable for publishing L2 System Contracts’ bytecodes;
  • Prepared calldata for “finalizing” the upgrade on L1;
  • Additional upgrade-specific instructions, such as configuration changes if necessary.

From a chain operator’s point of view, the upgrade process involves two L1 transactions: first, the upgrade is “published” by Matter Labs (a single transaction for the whole ZK Stack ecosystem); then each chain “executes” the upgrade asynchronously.

Modularity preparation

On top of the changes for the zkSync Bridgehub, we have already modified the L1 contracts to prepare for a range of ZK Stack chain customizations. The shared bridges and the STM are already prepared for future chains running in Validium mode, using an external Data Availability (DA) layer, or running on a custom base token (that is, using a different ERC20 as the gas token instead of Ether). This preparatory work is essential for the upcoming release of those features on ZK Stack.

zkSync Era Protocol Upgrade

RIP-7212 support - P256Verify precompile

RIP-7212 introduced the P256Verify precompile for verifying ECDSA signatures over the secp256r1 elliptic curve. This version ships a fully RIP-7212-compliant contract. Since zkSync Era has a different gas schedule, we do not match the expected 3450 gas cost; otherwise, the interface is identical.
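
For illustration, calling the precompile from Solidity could look like the sketch below. The address 0x100 and the 160-byte input layout (hash || r || s || x || y, with 32 bytes returned encoding 1 on success and no data on failure) follow RIP-7212; this is a minimal sketch, not audited verification code.

function p256Verify(
    bytes32 digest,
    bytes32 r,
    bytes32 s,
    bytes32 x,
    bytes32 y
) internal view returns (bool valid) {
    // RIP-7212: input is the 160-byte concatenation hash || r || s || x || y.
    (bool success, bytes memory result) = address(0x100).staticcall(
        abi.encodePacked(digest, r, s, x, y)
    );
    // On success the precompile returns 32 bytes encoding 1; on failure, no data.
    valid = success && result.length == 32 && abi.decode(result, (uint256)) == 1;
}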

ecPairing support

Alongside ecAdd and ecMul, the ecPairing precompile is now available on zkSync Era. These three operations can be accessed exactly as on Ethereum (precompile addresses 0x06, 0x07, and 0x08). Note: these operations still consume a significant amount of gas in this initial implementation. In future upgrades, the current smart contract implementation will be replaced by circuits, drastically reducing their cost.
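
As a sketch of the interface, BN254 point addition through ecAdd looks the same as on Ethereum; ecMul (0x07) and ecPairing (0x08) are called analogously with their respective input layouts.

function bn254Add(uint256[2] memory p1, uint256[2] memory p2)
    internal view returns (uint256[2] memory result)
{
    // ecAdd (0x06) takes two BN254 points as four 32-byte words and returns their sum.
    (bool success, bytes memory out) = address(0x06).staticcall(
        abi.encode(p1[0], p1[1], p2[0], p2[1])
    );
    require(success, "ecAdd failed");
    (result[0], result[1]) = abi.decode(out, (uint256, uint256));
}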

Cold/Warm storage support

Before VM 1.5.0, the same constant price was charged per storage access, regardless of whether the access was cold or warm. Now the opcode execution will:

  • Precharge the maximum (cold) cost: 5500 gas per cold write and 2000 per cold read.
  • Refund all warm accesses at the end of execution, so that a warm write effectively costs 60 gas and a warm read 30.
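
For example, under this schedule, a transaction that writes the same storage slot twice is precharged 5500 gas for each write; at the end of execution, the second (warm) write is refunded down to its warm cost, so the two writes together effectively cost 5500 + 60 = 5560 gas.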

EVM simulator code hash

In preparation for possible future support of EVM bytecode, we've added a new code hash version that starts with the byte 0x02. This lets the virtual machine know that a contract with this version carries EVM bytecode and execute its instructions through an EVM simulator written in EraVM instructions. Such a simulator is not yet available.
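
For example, code that needs to distinguish the two kinds of bytecode could inspect the version byte, as in this sketch (based only on the 0x02 prefix described above):

function isEvmBytecodeHash(bytes32 versionedHash) internal pure returns (bool) {
    // Per the scheme above, a leading version byte of 0x02 marks EVM bytecode.
    return versionedHash[0] == 0x02;
}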

Built-in Create2Factory support

We have a Create2Factory built into the protocol. This contract is pre-deployed at the first available user-space address, which is 2^16. We decided to pre-deploy it so it is available on any chain using ZK Stack. Check the mainnet contract in our explorer (after the upgrade is complete): https://explorer.zksync.io/address/0x0000000000000000000000000000000000010000

Pubdata charging

Earlier, pubdata was charged at each step of transaction execution: in the case of storage writes, some gas was burned on the spot (a variable amount, given state diff compression).

Although such behavior is simple to understand, it has the following problems:

  • It is hard to maintain EVM-like behavior. If the L1 gas price is high, storage writes can cost a gigantic amount of ergs (our EraVM-internal “gas”) inside the context. Note that while pubdata being the most expensive part is common to all rollups, rollups that use calldata for data availability do not have this issue, since the price for calldata is precharged at the start of the tx.
  • It is prone to unneeded overhead. For instance, in the case of reentrancy locks, the user still has to pay the initial price for marking the lock as used. The price is refunded in the end, but it still worsens the UX.
  • Our system currently has a gas limit for each transaction, imposed by prover/circuit restrictions. This approach uses the gas limit for both computation and pubdata publishing, which introduces an upper bound on gasPerPubdata. If gasPerPubdata is set too high, users may not be able to publish a significant amount of pubdata within a transaction.

While calldata-based rollups precharge for calldata, we can't, since the exact state diffs are known only after the transaction has finished. For this reason, we decided to use the post-charging approach: we keep a counter that tracks how much pubdata has been spent and charge the user for that data at the end of the transaction.

A challenge with post-charging is that users may spend all their gas within the transaction, leaving nothing to charge for pubdata. However, if the transaction is reverted, all the state changes related to it are reverted, too.

That's why, whenever we need to charge the user for pubdata but they haven't provided enough gas, the transaction is reverted. The user pays for the computation, but their transaction produces no state changes (and thus no pubdata).

The post-charging approach removes the unneeded overhead and decouples the gas used for execution from the gas used for data availability, eliminating any cap on gasPerPubdata. There is still a limit on the computation side of the transaction, but the user can provide as much gas as needed for pubdata.
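
A minimal sketch of this accounting is shown below; all names (pubdataCounter, settlePubdata, and so on) are hypothetical, and the real bootloader logic differs.

contract PostChargingSketch {
    uint256 internal pubdataCounter; // grows as state diffs accumulate during execution

    function settlePubdata(
        uint256 pubdataBefore,  // counter value snapshotted before the transaction
        uint256 gasPerPubdata,  // price of one pubdata byte, in gas
        uint256 gasLeft         // user's gas remaining after computation
    ) internal view returns (bool ok, uint256 pubdataCost) {
        uint256 pubdataUsed = pubdataCounter - pubdataBefore;
        pubdataCost = pubdataUsed * gasPerPubdata;
        // If the remaining gas cannot cover the pubdata bill, the transaction is
        // reverted: the user pays for computation, but no state changes (and thus
        // no pubdata) are produced.
        ok = gasLeft >= pubdataCost;
    }
}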

Considerations for developers

The post-charging approach introduces one distinctive feature: it is not trivial to know the final price of a transaction at the time of its execution. When a transaction performs .call{gas: some_gas}, the final impact on its price may be higher than some_gas, since the pubdata counter is incremented during execution but charged only at the end of the transaction.

While this limitation is not relevant for the average user, some specific applications may face some challenges.

Example: a queue of withdrawals

Imagine that there is the following contract:

struct Withdrawal {
    address token;
    address to;
    uint256 amount;
}

// `nonReentrant` and MAX_WITHDRAWAL_GAS are assumed to be provided by the
// surrounding contract.
Withdrawal[] queue;
uint256 lastProcessed;

function processNWithdrawals(uint256 N) external nonReentrant {
    uint256 current = lastProcessed + 1;
    uint256 lastToProcess = current + N - 1;

    while (current <= lastToProcess) {
        Withdrawal memory w = queue[current];
        // If the user provided some bad token that takes more than MAX_WITHDRAWAL_GAS
        // to transfer, that is the user's problem; the `_success` value is ignored
        // so that a failing transfer does not stall the queue.
        (bool _success, ) = w.token.call{gas: MAX_WITHDRAWAL_GAS}(
            abi.encodeWithSignature("transfer(address,uint256)", w.to, w.amount)
        );
        current += 1;
    }
    lastProcessed = lastToProcess;
}

The contract above supports a queue of withdrawals. This queue supports any type of token, including potentially malicious ones. However, the queue should never get stuck, since MAX_WITHDRAWAL_GAS ensures that even if a malicious token performs a lot of computation, it is bounded by this number, and so the caller of processNWithdrawals won't spend more than MAX_WITHDRAWAL_GAS per token.

These assumptions hold in the pre-charge model (calldata-based rollups) and the pay-as-you-go model (Protocol Version 22 and below). In the post-charge model, however, MAX_WITHDRAWAL_GAS limits the amount of computation that can be done within the subcall but does not limit the amount of pubdata that can be published. Thus, if such a function publishes a huge L1→L2 message, it can make the entire top-level transaction fail. This effectively means that such a queue would be stalled.

How to prevent this issue on the developer side

If a developer really needs to limit the amount of gas a subcall consumes, all subcalls should be routed through a special contract that guarantees the total cost of the subcall will not exceed the gas provided (by reverting if needed).

  1. A malicious contract consumes a large but processable amount of pubdata

    In this case, the topmost transaction can sponsor such subcalls. When a transaction is processed, at most 80M gas can be passed to the execution; the rest can only be spent on pubdata during post-charging.

  2. A malicious contract consumes an unprocessable amount of pubdata

    In this critical scenario, the malicious callee publishes so much pubdata that the transaction cannot be included in a batch. This effectively means that no matter how much the topmost transaction is willing to pay, the queue is stalled, potentially disrupting the entire process.

The only way to combat this is to set a minimum amount of ergs that must be consumed with each emission of pubdata (essentially ensuring that it is not possible to publish large chunks of pubdata while using negligible computation). Unfortunately, setting this minimum to cover the worst possible case (i.e., 80M ergs spent with a maximum of 100k bytes of pubdata available, leading to 800 L2 gas per pubdata byte) would likely be too harsh and negatively impact the average UX.

Overall, this is the way to go. For now, however, the only guarantee is that a subcall of 1M gas is always processable, which means that at least 80 gas will have to be spent for each published pubdata byte. Even if this cost is higher than the real L1 gas cost, it is reasonable in the long run, since everything published as pubdata is state-related and therefore has to be well-priced for long-term storage.

In the future, we will guarantee the processability of larger subcalls by increasing the amount of pubdata that can be published per batch.

How developers can limit the maximum gas consumed by a subcall

If a developer wants to securely limit the amount of gas (including pubdata) consumed by a subcall, they can use the new GasBoundCaller contract. This contract forwards the call with the provided limit, reverting if the call consumes more than it should. It will be deployed with the Create2Factory right after the upgrade is complete, and we will update this thread with its address.
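
Usage would look roughly like the sketch below, adapted from the queue example above. The interface shown (gasBoundCall and its parameters) and the GAS_BOUND_CALLER address constant are assumptions for illustration; consult the published contract for the exact signature once its address is announced.

// Hypothetical interface; the real GasBoundCaller signature may differ.
interface IGasBoundCaller {
    function gasBoundCall(address _to, uint256 _maxTotalGas, bytes calldata _data)
        external payable;
}

// Routing the token transfer from the queue example through the GasBoundCaller,
// so that computation and pubdata together stay within MAX_WITHDRAWAL_GAS.
(bool _success, ) = GAS_BOUND_CALLER.call{gas: MAX_WITHDRAWAL_GAS}(
    abi.encodeCall(
        IGasBoundCaller.gasBoundCall,
        (w.token, MAX_WITHDRAWAL_GAS, abi.encodeWithSignature("transfer(address,uint256)", w.to, w.amount))
    )
);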

CodeOracle system contract

We've deployed a new system contract named CodeOracle, which accepts the versioned hash of a bytecode and returns the bytecode itself. This contract enables future support of Ethereum's extcodecopy functionality. It works as follows:

  1. It accepts a versioned hash and double-checks that it is marked as “known,” i.e., the operator must know the bytecode for such a hash.
  2. After that, it uses the decommit opcode, which accepts the versioned hash and the number of ergs to be spent (proportional to the length of the bytecode). If the bytecode has already been decommitted in the same batch, the requested cost is refunded to the user.
    • Note that decommitment happens not only through the decommit opcode but also during calls to contracts. Whenever a contract is called, its code is decommitted into a memory page dedicated to contract code. We never decommit the same bytecode twice, regardless of whether it was decommitted via the explicit opcode or during a call to another contract; the previously unpacked bytecode memory page is reused.
  3. The decommit opcode returns a pointer to the slice of the decommitted bytecode. Note that the returned pointer always has a length of 2^20 bytes, regardless of the actual bytecode's length. So, the CodeOracle system contract's job is to shrink the returned data to the correct length.
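
For illustration, reading a known bytecode through CodeOracle might look like the sketch below. The raw-calldata convention (passing just the versioned hash) and the address parameter are assumptions; check the system contracts for the exact interface.

function readBytecode(address codeOracle, bytes32 versionedHash)
    internal view returns (bytes memory code)
{
    // The sketch passes the versioned hash as raw calldata and expects the
    // full bytecode back; it reverts if the hash is not marked as known.
    (bool success, bytes memory result) =
        codeOracle.staticcall(abi.encodePacked(versionedHash));
    require(success, "bytecode hash is not known");
    code = result;
}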

Changes to the unidirectional pointer policy

On zkSync Era, an already existing pointer can be used as calldata or returndata. This enables cheap proxies, for example, which do not need to copy the entire returndata into memory, saving both on the copying itself and on memory growth. To prevent certain attacks, the rule is that a pointer can only be returned if its memory page ID is greater than or equal to the ID of the heap of the frame that executes the ret call.

However, this restriction has been lifted for the kernel space to allow the CodeOracle system contract to return slices of previously decommitted code pages. That is, system contracts are trusted to only return slices of memory that they know to be immutable.

MsgValueSimulator stipend

This release partially supports .send/.transfer with a non-zero value when 0 gas is provided, achieved by the following means:

  • When a call passes a non-zero value, an additional MSG_VALUE_SIMULATOR_STIPEND_GAS (at the time of writing, 75500 gas) is charged from the caller’s frame and given to the MsgValueSimulator frame.
  • These funds are used to update the slots responsible for the ETH balances. However, the majority of these ergs is spent on decommitting the callee.
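
In practice, this means a pattern like the following, which previously could fail on zkSync Era because .transfer forwards no gas there, now works for callees whose bytecode fits within the stipend (a sketch; recipient is a placeholder):

function payOut(address payable recipient, uint256 amount) internal {
    // .send/.transfer provide 0 gas on zkSync Era; the MsgValueSimulator stipend
    // now covers the balance updates and the decommitment of callees with
    // bytecode up to ~120kB. Still discouraged: prefer .call{value: amount}("").
    recipient.transfer(amount);
}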

Decommitment costs and MSG_VALUE_SIMULATOR_STIPEND_GAS

Whenever an EraVM contract is called, the caller has to pay a fixed amount of ergs for decommitting the callee's code (unpacking the callee's bytecode into a dedicated code memory page). The MsgValueSimulator's frame has to pay for decommitting the "real" callee, so the larger the stipend we provide, the larger the maximal size of callee that the MsgValueSimulator is able to call.

However, this also means that all users who transfer value always have to pay this additional constant cost. Even though any excess is refunded, it still degrades the user experience, as estimations will always contain this overhead. So, it was decided to keep MSG_VALUE_SIMULATOR_STIPEND_GAS large enough to decommit any bytecode of up to 120kB, but no larger.

Anything larger can still be called by pre-decommitting the bytecode through some external means and then calling the contract. Note that this relies on the operator's goodwill and should still be avoided by developers.

Memory pricing changes

Before

  • Whenever a contract (system or not) is called, 2^10 bytes of memory are given out for free before users are charged linearly in the length.

After

  • Whenever a user contract is called, 2^12 bytes of memory are given out for free before users are charged linearly in the length.
  • Whenever a kernel-space (i.e., system) contract is called, 2^21 bytes of memory are given out for free before users are charged linearly in the length.

Circuits changes

  • Addition of the following circuits:
    • TransientStorageChecker (id 14), supporting the tload and tstore opcodes.
    • Secp256r1Verify (id 15), backing the P256Verify precompile.
    • EIP4844Repack (id 255, replacing the previous 4844 circuit), increasing blob capacity from 2 to 16.
  • A new aggregation layer called the recursion tip.
    • This layer handles the recursive verification of node circuits, removing that responsibility from the scheduler circuit and freeing up its capacity.
    • It was needed to prevent a reduction in circuit capacity due to other changes.
    • Although only one is used now, more recursion tip circuits can be added to increase overall circuit capacity.
  • We stopped producing proofs for circuits that are not present within a batch.

Originally posted by @githubdoramon in #519