V1C70RYG0D/bakwas

Title: Reverse Engineering Smart Contracts - Part 1

Introduction

This article serves as an introduction to reverse engineering smart contracts on the Ethereum Virtual Machine (EVM). We assume that the reader has a basic understanding of the EVM and is familiar with EVM assembly language. While the source code of many smart contracts can be viewed on Etherscan, this is not always possible, as with the contract at https://etherscan.io/address/0x2510c039cc3b061d79e564b38836da87e31b342f#code. To understand the inner workings of such contracts, we need to dive into the low-level machine instructions, known as opcodes, that the EVM executes.

EVM Opcodes

Smart contract languages like Solidity are compiled into low-level opcodes, which are the building blocks of EVM execution. At the time of writing, there are 141 unique opcodes. Together they make the EVM quasi-Turing-complete: it can compute virtually anything, but every computation is bounded by the gas supplied to it. Since each opcode is one byte in size, there can be at most 256 (2⁸) opcodes in total. For simplicity, we can categorize opcodes into the following groups:
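Because every opcode is a single byte, one practical grouping is simply by byte range. The minimal Python sketch below classifies an opcode that way. The ranges loosely follow the section headings of the instruction-set appendix in the Ethereum Yellow Paper, with PUSH0 (0x5f) placed with the other push instructions; the group names, the exact upper bounds (which reflect recent forks) and the helper name opcode_group are assumptions made for illustration.

```python
# Assumed grouping of opcodes by byte range, loosely after the Yellow Paper's
# instruction-set sections. Upper bounds reflect recent forks and may change.
GROUPS = [
    (0x00, 0x0B, "stop and arithmetic"),
    (0x10, 0x1D, "comparison and bitwise logic"),
    (0x20, 0x20, "keccak256"),
    (0x30, 0x3F, "environmental information"),
    (0x40, 0x4A, "block information"),
    (0x50, 0x5E, "stack, memory, storage and flow"),
    (0x5F, 0x7F, "push"),          # PUSH0 .. PUSH32
    (0x80, 0x8F, "duplication"),   # DUP1 .. DUP16
    (0x90, 0x9F, "exchange"),      # SWAP1 .. SWAP16
    (0xA0, 0xA4, "logging"),       # LOG0 .. LOG4
    (0xF0, 0xFF, "system"),        # CREATE, CALL, RETURN, REVERT, ...
]


def opcode_group(op: int) -> str:
    """Return the (assumed) group name for a one-byte opcode value."""
    if not 0 <= op <= 0xFF:
        raise ValueError("an opcode is a single byte (0x00-0xff)")
    for lo, hi, name in GROUPS:
        if lo <= op <= hi:
            return name
    # Bytes outside every range, such as 0x0c, have no instruction assigned.
    return "unassigned"
```

For example, opcode_group(0x60) returns "push", since 0x60 is PUSH1. The check is by range only, so a few unassigned bytes that fall inside a range (such as 0xf6 inside the system range) are not singled out.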

When handling arguments of dynamic size, such as arrays or strings, which can hold more than 32 bytes (256 bits) of data, the contract ABI encoding (rather than the EVM itself) splits the argument into multiple 32-byte words (medium.com). These words are appended to the input data after all the other arguments. An extra word holding the argument's length (the number of elements for an array, or the number of bytes for a string) is placed before the data words. Instead of including the argument directly, the encoding places the start position of these words, that is, the byte offset of the length word measured from the start of the arguments, at the location where the argument would have been.
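A minimal Python sketch of this layout for a hypothetical function f(uint256, string). Only the arguments are encoded; the 4-byte function selector that precedes them in real calldata is omitted. The function names and argument values are illustrative assumptions.

```python
def abi_word(n: int) -> bytes:
    """Encode a non-negative integer as one 32-byte, big-endian ABI word."""
    return n.to_bytes(32, "big")


def encode_uint_and_string(x: int, s: str) -> bytes:
    """ABI-encode the arguments of a hypothetical f(uint256, string) call."""
    data = s.encode("utf-8")
    # Head: one word per argument. The static uint256 is stored in place;
    # the dynamic string is replaced by the byte offset of its tail, which
    # starts right after the two head words (2 * 32 = 64 bytes).
    head = abi_word(x) + abi_word(2 * 32)
    # Tail: a length word (number of bytes), then the data right-padded
    # with zeros to a multiple of 32 bytes.
    tail = abi_word(len(data)) + data + b"\x00" * (-len(data) % 32)
    return head + tail


encoded = encode_uint_and_string(1, "hello")
for i in range(0, len(encoded), 32):
    print(f"0x{i:02x}: {encoded[i:i + 32].hex()}")
```

For encode_uint_and_string(1, "hello"), the four printed words are the value 1, the offset 0x40 (64 bytes, just past the two head words), the length 5, and finally the bytes of "hello" padded with zeros.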

To illustrate the decompilation process, we will first write a simple program in Solidity, compile it, and then analyze the generated bytecode. Here is the initial contract:

Compiling the contract gives us its bytecode, which we will now examine further. The bytecode starts by pushing the string's size and then the actual string. However, it is not immediately apparent where the string's hexadecimal representation is located in the bytecode. Part of the reason is that the bytecode does not consist of instructions alone: it also ends with a Swarm hash that refers to a metadata file generated by Solidity. The metadata file contains information about the contract, such as the compiler version and the contract's functions.
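To see where a string lands, it helps to walk the bytecode the way the EVM decodes it: one opcode byte at a time, except that PUSH1 to PUSH32 (0x60 to 0x7f) are followed by 1 to 32 bytes of immediate data. Below is a minimal linear-sweep disassembler in Python. The name table is deliberately incomplete (unknown bytes are shown as raw hex), and disassemble is a hypothetical helper, not part of any existing tool.

```python
PUSH1, PUSH32 = 0x60, 0x7F

# A small, deliberately incomplete name table; other bytes print as raw hex.
NAMES = {
    0x00: "STOP", 0x01: "ADD", 0x34: "CALLVALUE", 0x35: "CALLDATALOAD",
    0x36: "CALLDATASIZE", 0x39: "CODECOPY", 0x52: "MSTORE", 0x54: "SLOAD",
    0x55: "SSTORE", 0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST",
    0x5F: "PUSH0", 0x80: "DUP1", 0x90: "SWAP1", 0xF3: "RETURN",
    0xFD: "REVERT", 0xFE: "INVALID",
}


def disassemble(bytecode_hex: str) -> list[tuple[int, str, str]]:
    """Return (offset, mnemonic, immediate-data hex) for each instruction."""
    code = bytes.fromhex(bytecode_hex.removeprefix("0x"))
    out = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if PUSH1 <= op <= PUSH32:
            n = op - PUSH1 + 1  # PUSHn carries n bytes of immediate data
            out.append((pc, f"PUSH{n}", code[pc + 1:pc + 1 + n].hex()))
            pc += 1 + n
        else:
            out.append((pc, NAMES.get(op, f"0x{op:02x}"), ""))
            pc += 1
    return out


for pc, name, arg in disassemble("0x6080604052"):
    print(pc, name, arg)
```

A short string literal, say a hypothetical "hello", shows up as the argument of a PUSH instruction, so searching the output for "hello".encode().hex(), i.e. 68656c6c6f, is one way to locate it. Note that a linear sweep also decodes the metadata trailer at the end of the bytecode as if it were instructions, even though the EVM never executes those bytes.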

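Swarm is a distributed storage platform and content distribution service, or, more simply stated, a decentralized file store. Although the Swarm hash is also included in the runtime bytecode, the EVM never interprets it as opcodes, because execution can never reach it: it is appended after the contract's code, and no jump in that code leads there. The compiler version used here appends the metadata in the following format (newer Solidity releases use a different layout that typically references an IPFS hash instead):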

0xa1 0x65 'b' 'z' 'z' 'r' '0' 0x58 0x20 [32 bytes swarm hash] 0x00 0x29
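A minimal Python sketch of pulling the hash out of this trailer, assuming the bytecode ends with exactly the bzzr0 layout above. The final two bytes (0x00 0x29 = 41) give the length of the metadata that precedes them, and the 32-byte hash follows the 9-byte prefix. extract_swarm_hash is a hypothetical helper; bytecode that uses a newer metadata layout makes it raise ValueError.

```python
# 0xa1 0x65 'b' 'z' 'z' 'r' '0' 0x58 0x20, as in the format above.
SWARM_PREFIX = bytes([0xA1, 0x65]) + b"bzzr0" + bytes([0x58, 0x20])


def extract_swarm_hash(bytecode_hex: str) -> str:
    """Return the 32-byte Swarm hash (hex) from a bzzr0 metadata trailer."""
    code = bytes.fromhex(bytecode_hex.removeprefix("0x"))
    # The last two bytes are the big-endian length of the metadata section.
    meta_len = int.from_bytes(code[-2:], "big")
    meta = code[-2 - meta_len:-2]
    if len(meta) != len(SWARM_PREFIX) + 32 or not meta.startswith(SWARM_PREFIX):
        raise ValueError("no bzzr0 metadata found at the end of the bytecode")
    return meta[len(SWARM_PREFIX):].hex()
```

Because the length is read from the end, the function does not need to know where the executable code stops.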

Therefore, in this case, we can extract the following Swarm hash:

8caa84183abbba17d153df6480248b14519a24973f4d208a35972452ec354da0

Adding a Function to the Contract

Let's now add a function to our contract:

Upon examining the stack after adding this function, we can observe one more interesting thing: a 4-byte value is pushed onto it. This value corresponds to the function signature (more precisely, the function selector), which is derived by hashing the function's name together with its parameter types, for example "transfer(address,uint256)", using keccak256 and then keeping only the first 4 bytes of the result. Surprisingly, the same value also appears in the JSON file generated by the compiler. To further investigate the function signature, we can search for it in the 4byte.directory, which is a database of popular function and event names.
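A minimal Python sketch of computing a selector, and of checking guessed signatures against a known selector offline (a local counterpart to looking it up on 4byte.directory). The signature string is the function name followed by the comma-separated parameter types in parentheses, with no spaces and no parameter names. Note that Python's hashlib.sha3_256 implements the finalized SHA-3 standard, whose padding differs from the original Keccak used by Ethereum, so it produces different digests. To stay dependency-free, the sketch includes a compact, slow, pure-Python Keccak-256; function_selector and find_signature are hypothetical helper names.

```python
MASK64 = (1 << 64) - 1


def _rol(v: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    return ((v << n) | (v >> (64 - n))) & MASK64


def _rc_bit(t: int) -> int:
    """Output bit t of the Keccak round-constant LFSR (x^8+x^6+x^5+x^4+1)."""
    r = 1
    for _ in range(t % 255):
        r <<= 1
        if r & 0x100:
            r ^= 0x171
    return r & 1


# The 24 iota round constants, derived from the LFSR instead of typed by hand.
ROUND_CONSTANTS = [sum(_rc_bit(j + 7 * i) << ((1 << j) - 1) for j in range(7))
                   for i in range(24)]

# rho rotation offsets, produced by the standard walk over lane coordinates.
ROTATIONS = [0] * 25
_x, _y = 1, 0
for _t in range(24):
    ROTATIONS[_x + 5 * _y] = (_t + 1) * (_t + 2) // 2 % 64
    _x, _y = _y, (2 * _x + 3 * _y) % 5


def _keccak_f(a: list[int]) -> list[int]:
    """Keccak-f[1600] permutation; lane (x, y) is stored at a[x + 5 * y]."""
    for rc in ROUND_CONSTANTS:
        # theta: mix each column's parity into the neighbouring columns
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        a = [a[i] ^ d[i % 5] for i in range(25)]
        # rho + pi: rotate each lane and move it to its new position
        b = [0] * 25
        for x in range(5):
            for y in range(5):
                b[y + 5 * ((2 * x + 3 * y) % 5)] = _rol(a[x + 5 * y], ROTATIONS[x + 5 * y])
        # chi: the non-linear step, applied along each row
        a = [b[x + 5 * y] ^ (~b[(x + 1) % 5 + 5 * y] & b[(x + 2) % 5 + 5 * y])
             for y in range(5) for x in range(5)]
        # iota: add the round constant to lane (0, 0)
        a[0] ^= rc
    return a


def keccak256(data: bytes) -> bytes:
    """Keccak-256 as used by Ethereum (original Keccak padding, not SHA3-256)."""
    rate = 136  # bytes absorbed per permutation for a 256-bit output
    padded = bytearray(data) + b"\x01"
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80
    state = [0] * 25
    for off in range(0, len(padded), rate):
        for i in range(rate // 8):
            state[i] ^= int.from_bytes(padded[off + 8 * i:off + 8 * i + 8], "little")
        state = _keccak_f(state)
    return b"".join(lane.to_bytes(8, "little") for lane in state[:4])


def function_selector(signature: str) -> str:
    """First 4 bytes of keccak256 of a canonical signature, as 0x-prefixed hex."""
    return "0x" + keccak256(signature.encode("ascii")).hex()[:8]


def find_signature(selector: str, candidates: list[str]) -> list[str]:
    """Return the candidate signatures whose selector equals `selector`."""
    wanted = selector.lower().removeprefix("0x")
    return [sig for sig in candidates if function_selector(sig)[2:] == wanted]


print(function_selector("transfer(address,uint256)"))  # → 0xa9059cbb
```

find_signature only confirms guesses you already have; a service such as 4byte.directory is useful precisely because it has collected many known signatures in advance.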

Let's search for our signature, 0x258e60b6, in the 4byte directory.

And bingo!

It shows the correct function name along with its parameter types. Isn't that cool?
