How to handle mach-o executables as well as elf executables

Roger-Shepherd opened this issue 3 years ago · comments

This is here both as a request for suggestions on how to address this issue, and to explain some of the other issues arising from my work here (raised separately).

Apple products (Mac, iPhone, iPad, AppleTV) use the mach-o format for object files and executables. The benchmark_size.py script is written to use elf. I am looking at how best to handle them both.

In addition to the fact the file formats are different there are some other issues to address, including:

sections are named differently, e.g. .text v __text
elf has the concept of section groups
elf and mach-o sections do not directly correspond:
.text (code, r/x) : __text
.data (initialised, r/w) : __data
.rodata (initialised data, r) : __cstring, __const (TEXT: initialised constant variables, non-relocatable), __const (DATA: initialised constant variables, relocatable), __literal4, __literal8
.bss (zero-initialised, r/w) : __bss (uninitialised static variables, e.g. static int i;), __common (uninitialised imported symbol definitions, e.g. int i, located in the global scope)

In terms of the script, I'd like to avoid having two completely different scripts for elf and mach-o. There are some issues to address though:

exactly what to measure for size for the mach-o executables
the section naming difference suggests that parameter parsing may need to be different for elf and mach-o
there is a single python library which can (apparently) handle elf and mach-o but to do so would mean changing working code

Roger Shepherd commented 3 years ago

Agreed

Jeremy Bennett · Answer 1 · Sat Mar 20 2021 02:20:19 GMT+0800 (China Standard Time)

We should use categories of section, not section names in ELF.

Supporting mach-o is good. We also ought to handle the Windows format PE (a derivative of COFF).

I presume there are analogous libraries for python, so we can use the same general approach.

Roger Shepherd · Answer 2 · Sun Mar 28 2021 05:03:10 GMT+0800 (China Standard Time)

This is how I propose to add handling of macho format files to benchmark_size.py. I believe this approach:

minimises changes and hence minimises the chance of breaking something that works at the moment
is simple to expand to a further file format

Existing mechanisms

Categories of section

The current script is concerned with 4 categories of sections and, by default, associates each category with the name of a section:

executable code - `.text`
non-zero initialized writeable data - `.data`
read-only data - `.rodata`
zero initialised data - `.bss`

For each category, the user can override the default and explicitly set the names of the sections in that category. This is done by passing one of the following parameters, followed by the name(s) of the section(s):

executable code - `--text`
non-zero initialized writeable data - `--data`
read-only data - `--rodata`
zero initialised data - `--bss`
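A minimal sketch of how these per-category overrides could be parsed with argparse. This is not the actual code in the script; the names `DEFAULT_SECTIONS`, `build_parser` and `parse_categories` are hypothetical and only illustrate the mechanism described above.

```python
import argparse

# Default section name(s) for each category, as listed above for elf.
# (Hypothetical table; the real script may store these differently.)
DEFAULT_SECTIONS = {
    "text": [".text"],
    "data": [".data"],
    "rodata": [".rodata"],
    "bss": [".bss"],
}


def build_parser():
    """Create a parser with one option per category of section."""
    parser = argparse.ArgumentParser(description="Category overrides (sketch)")
    for category, names in DEFAULT_SECTIONS.items():
        parser.add_argument(
            f"--{category}",
            nargs="+",            # one or more section names may follow
            default=list(names),  # fall back to the default association
            metavar="SECTION",
            help=f"section name(s) in the {category} category",
        )
    return parser


def parse_categories(argv):
    """Return a dict mapping each category to its list of section names."""
    args = build_parser().parse_args(argv)
    return {category: getattr(args, category) for category in DEFAULT_SECTIONS}
```

For example, `parse_categories(["--rodata", ".rodata", ".srodata"])` would put two section names in the read-only data category and leave the other three categories at their defaults.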

Metric

The script reports a metric which is the sum of the sizes of a number of the categories of sections; by default this is the executable code (text) category alone. This can be overridden using the `--metric` parameter, which takes a space-separated list of the categories to be included in the metric.
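The metric calculation can be sketched as follows, assuming the section sizes have already been read from the file into a dict. The function name `compute_metric` and the shape of its arguments are assumptions made for illustration, not the script's real interface.

```python
def compute_metric(section_sizes, categories, metric=("text",)):
    """Sum the sizes of every section belonging to the chosen categories.

    section_sizes: dict mapping section name -> size in bytes
    categories:    dict mapping category -> list of section names
    metric:        categories to include; defaults to executable code only
    """
    total = 0
    for category in metric:
        for name in categories[category]:
            # A section that is absent from the file contributes nothing.
            total += section_sizes.get(name, 0)
    return total


# Illustrative values only, not measurements of any real benchmark.
sizes = {".text": 1000, ".data": 16, ".rodata": 200, ".bss": 64}
cats = {"text": [".text"], "data": [".data"],
        "rodata": [".rodata"], "bss": [".bss"]}

default_metric = compute_metric(sizes, cats)
text_and_rodata = compute_metric(sizes, cats, metric=["text", "rodata"])
```

With these made-up sizes the default metric is the size of `.text` alone, and passing `["text", "rodata"]` adds the read-only data on top.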

Handling macho

Specifying the format

An optional parameter `--format` is added which selects the file format to be processed. The options are elf and macho, and the default is elf. This is to ensure that any existing command lines continue to process elf files without modification.

Categories of sections

Whereas elf files normally contain the four sections .text, .data, .rodata and .bss, macho files normally contain the five sections __text, __data, __cstring, __const and __bss. When processing a macho file, the default association between categories and the names of sections is:

executable code - `__text`
non-zero initialized writeable data - `__data`
read-only data - `__cstring` and `__const`
zero initialised data - `__bss`
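One way the format selection and the per-format defaults could fit together is sketched below. The table `DEFAULTS_BY_FORMAT` and the helpers `default_sections` and `parse_format` are hypothetical names; only the `--format` option, its two values and the section names come from the proposal above.

```python
import argparse

# Default category -> section names, keyed by file format.
# Supporting a further format would mean adding one more entry here.
DEFAULTS_BY_FORMAT = {
    "elf": {
        "text": [".text"],
        "data": [".data"],
        "rodata": [".rodata"],
        "bss": [".bss"],
    },
    "macho": {
        "text": ["__text"],
        "data": ["__data"],
        "rodata": ["__cstring", "__const"],
        "bss": ["__bss"],
    },
}


def default_sections(file_format="elf"):
    """Return a fresh copy of the default sections for the given format."""
    table = DEFAULTS_BY_FORMAT[file_format]
    return {category: list(names) for category, names in table.items()}


def parse_format(argv):
    """Parse --format; elf stays the default so old command lines still work."""
    parser = argparse.ArgumentParser(description="Format selection (sketch)")
    parser.add_argument("--format", choices=sorted(DEFAULTS_BY_FORMAT),
                        default="elf", help="file format to process")
    return parser.parse_args(argv).format
```

Keeping the defaults in a single table is only one possible design; it keeps the elf path untouched when no `--format` is given.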

Metric

This is the same as for elf files. The default metric is the executable code (text) category, which can be overridden using the `--metric` parameter.

Paolo Savini · Answer 3 · Fri Apr 09 2021 16:56:55 GMT+0800 (China Standard Time)

I guess this has been solved by PR #132 @Roger-Shepherd ?

Roger Shepherd · Answer 4 · Fri Apr 09 2021 16:58:32 GMT+0800 (China Standard Time)

As far as I am concerned this is solved by PR#132. @jeremybennett might want to consider whether he's happy regarding his comments.

Paolo Savini · Answer 5 · Thu Apr 15 2021 23:18:50 GMT+0800 (China Standard Time)

@jeremybennett 's point is that the next step will be to use the type of section instead of the section name, for both elf and macho, and of course other formats too. This could be done by checking, for each section, whether it is read-only, writable, allocatable, etc., through specific flags. I guess that macho has its own way to do that?
Anyway this is for another task. We can close this issue for now and I'll create a new one (I'll check first if we haven't got one already) for this specific matter.
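For the elf side, a rough sketch of classifying a section by its type and flags rather than by its name might look like this. The constant values are the standard ones from the ELF specification; the function `categorize` and its decision order are assumptions, and the macho equivalent is not covered here.

```python
# Standard ELF section header constants.
SHT_PROGBITS = 1     # section holds data stored in the file
SHT_NOBITS = 8       # section occupies no file space (e.g. .bss)
SHF_WRITE = 0x1      # writable at run time
SHF_ALLOC = 0x2      # occupies memory at run time
SHF_EXECINSTR = 0x4  # contains executable instructions


def categorize(sh_type, sh_flags):
    """Map an ELF section's type and flags to one of the four categories.

    Returns None for sections that are not loaded into memory
    (debug information, symbol tables and so on).
    """
    if not sh_flags & SHF_ALLOC:
        return None
    if sh_flags & SHF_EXECINSTR:
        return "text"
    if sh_type == SHT_NOBITS:
        return "bss"
    if sh_flags & SHF_WRITE:
        return "data"
    return "rodata"
```

A section with `SHF_ALLOC | SHF_EXECINSTR` would then count as executable code whatever it is called, which is the point of moving away from names.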