🕵️ stelf

Hide binary data inside x86 ELF files by changing the instruction encoding

How does it work?

TL;DR: Stelf hides binary data in the encoding of the instructions themselves. Each patched instruction still decodes to exactly the same operation as before, so the execution flow does not change at all.

Longer explanation

Stelf works by (ab)using the ModR/M byte and the direction bit (d): in instructions that take register or memory operands, the ModR/M byte follows the instruction's opcode.

This byte plays a crucial role in indicating the addressing mode of the instruction and specifying the appropriate source or destination registers to be used.

Take the ADD EAX, EBX (01 d8) instruction for example:

   opcode          ModR/M byte
  00000001         [11] [011] [000]
        ||           |    |     |
        | > w bit    |    |     > RM
        |            |    > REG
        > d bit      > MOD

d bit = 0
w bit = 1
MOD = 11, REG = 011, RM = 000

The w bit selects the operand size (w = 0: 8-bit operands; w = 1: full-size operands, 32-bit here), whereas the d bit determines whether the REG field of the ModR/M byte names the source (d = 0) or the destination (d = 1) register.

In the ModR/M byte, the two most significant bits (bits 7 and 6) form the MOD field, which indicates the addressing mode of the instruction. Bits 5-3 form the REG field, which identifies the source or destination register. Finally, bits 2-0 form the RM field, which specifies either a register or, together with MOD, a memory operand.

Stelf only cares about the case where MOD is '11': register-addressing mode. In this mode, both operands are registers, specified by the REG and RM fields. The table below lists all possible values:

REG/RM value	Reg if data size is 8 bits	Reg if data size is 16 bits	Reg if data size is 32 bits
000	al	ax	eax
001	cl	cx	ecx
010	dl	dx	edx
011	bl	bx	ebx
100	ah	sp	esp
101	ch	bp	ebp
110	dh	si	esi
111	bh	di	edi

A careful reader might ask: "Okay, REG and RM define the registers, but what about the direction bit?" This bit determines whether REG is the source or the destination register, and here comes the crucial point: depending on the opcode's d bit, the registers in the ModR/M byte can be placed in either order while still encoding the same instruction.

Let's take a look:

 opcode        ModR/M
0000 0001   [11] [011] [000] = add eax, ebx (01 d8), REG is source
0000 0011   [11] [000] [011] = add eax, ebx (03 c3), REG is destination

Note that by flipping the direction bit and swapping the registers, the instruction stays the same even though its encoding changes. This is exactly how Stelf works: for each eligible instruction, the d bit stores one bit of the payload, and the REG and RM fields are swapped (or not) so that the instruction keeps its semantics. In other words, each eligible instruction stores a single bit.

~ For 64-bit registers (like RAX-RDX, R8-R15...), Stelf also takes the REX prefix into account, but for simplicity that part is omitted here. ~

Usage

Using Stelf is quite simple: a) first scan the target file to find out how many bytes are available, b) write the data, and c) read it back when needed.

a) Scan how many bytes are available (`-s`):

Use the -s option to scan a target binary:

$ ./stelf -s ~/clang-static/bin/clang-11
Scan summary:
380174 bytes available (3041399 inst patcheables, out of 16166560 (~18 %))

b) Add arbitrary data into the ELF file (`-w`):

Use the -w option to hide the data read from stdin inside the specified target file:

# If the output file is omitted, a file named 'out' is created:
$ ./stelf -w ~/clang-static/bin/clang-11 < my_input_file
Write summary:
Wrote 357336 bits (44667 bytes)

# Specifying the output file
$ ./stelf -w ~/clang-static/bin/clang-11 -o my_out_file < my_input_file
Write summary:
Wrote 357336 bits (44667 bytes)

c) Read the written data (`-r`):

To read the data back, use the -r flag. Its argument is the number of bytes to read; with '0', the entire available area is read (in the example below, all 380174 available bytes, not only the 44667 bytes that were written):

# Read all the data
$ ./stelf -r 0 out > my_read_data
$ wc -c my_read_data
380174 my_read_data

# Read only a given amount
$ ./stelf -r 44667 out > my_read_data
$ wc -c my_read_data
44667 my_read_data

How much data can I store?

How much data fits depends on several factors. Stelf uses nine different instructions: MOV, ADD, SUB, SBB, CMP, AND, OR, XOR, and ADC, all of which must be in reg/reg form (MOD = 11). Additionally, only a single bit is gained per patched instruction.

Assuming an average instruction size of 4 bytes and that roughly 16% of all instructions can be patched, the available space is about 1/200th of the size of the entire .text section.
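The 1/200 figure follows from storing one bit per patched instruction. Writing |.text| for the section size in bytes, and taking the 4-byte average and the 16% share above as assumptions:

```latex
\text{capacity (bytes)} \approx \frac{|\texttt{.text}|}{4} \cdot 0.16 \cdot \frac{1}{8}
  = \frac{0.04\,|\texttt{.text}|}{8}
  = \frac{|\texttt{.text}|}{200}
```

As a sanity check, /bin/bash's 716 kB `.text` gives roughly 3.6 kB, close to the 3496 bytes measured. The scan summary itself is exactly patchable instructions divided by 8: for clang-11, 3041399 / 8 ≈ 380174 bytes.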

The following table includes some examples:

ELF file	ELF size / `.text` size	% of insns patchable	bytes available
/bin/bash	1.2M / 716kB	15%	3496
/usr/bin/lua	175kB / 106kB	20%	764
/usr/bin/docker	57M / 22.25M	6%	40455
Bat 0.21	5.1M / 2.65M	15%	12404
Hyperfine 1.15.0	2.3M / 1.42M	15%	6678
clang-11-static	117M / 63.21M	18%	380174
firefox/libxul.so	149M / 93.33M	15%	445933
libnvoptix.so.510.47.03	168M / 13.69M	19%	83014

Is it really stealthy?

It depends. The data lives inside the instruction encodings themselves, the code path remains unchanged, and nothing stands out to the naked eye in, for example, a hex editor. A very careful observer, on the other hand, might notice that the instructions have been changed: a compiler would normally not encode the same instruction in two different ways within one binary, so a mix of encodings raises suspicion.

That said, I wouldn't store sensitive data without some form of encryption, but Stelf still strikes me as quite useful for creating 'watermarks', e.g. to mark an ELF as 'internal use only' and things like that.

Also, I confess that I was quite surprised by the number of bytes available in each ELF analyzed; I expected much less! Furthermore, the hidden data doesn't consume any extra disk space, since that space was always there anyway =).

Building

Stelf depends on libelf and Intel XED, so the build process might look like this:

# Clone Stelf
git clone https://github.com/Theldus/stelf
cd stelf/

# Install libelf
sudo apt install libelf-dev  # Debian/Ubuntu...

# Install Intel XED
mkdir libxed/ && cd libxed/
git clone https://github.com/intelxed/mbuild.git mbuild
git clone https://github.com/intelxed/xed.git xed
cd xed/
./mfile.py install
export XED_KIT_PATH=$(readlink -f $PWD/kits/xed-install-base-*)

# Build Stelf
cd ../../
make

In case of incompatibilities with other library versions, these are the versions/commits that were used:

xed: 4dc77137f651def2ece4ac0416607b215c18e6e4 External Release v2023.06.07
mbuild: 75cb46e6536758f1a3cdb3d6bd83a4a9fd0338bb External Release v2022.07.28
libelf: v0.181

Contributing

Stelf is always open to the community and willing to accept contributions, whether issues, documentation, testing, new features, bugfixes, typo fixes, etc. Welcome aboard.

License and Authors

Stelf is licensed under the MIT License. Written by Davidson Francis and (hopefully) other contributors.
