Expand/improve Bloaty's unit tests

haberman opened this issue 3 years ago

Bloaty's test coverage is not great. Bloaty's existing tests use the output of a C compiler as their input, and then try to assert things about what the sizes should be. But this model is inherently fragile, because C compilers have a lot of latitude in what kind of object files and binaries they produce.

A better way to test Bloaty would be to use assembly as the input. This can give us much more control over the object file, which can help our unit tests be far more precise.

Bloaty is target-independent: a single Bloaty binary can analyze ELF or Mach-O. Ideally our tests can be cross-platform also, so you can test all object file formats from a single platform. Currently I am leaning towards using NASM as the assembler, because it is also target-independent:

# Produce a Mach-O object file
$ nasm -fmacho64 test.asm
# Produce an ELF object file
$ nasm -felf64 test.asm

This is very attractive, and nasm is also small and self-contained. On my machine it compiles in less than 4 seconds of real time:

$ time make -j72
real    0m3.366s
user    0m23.991s
sys     0m8.680s
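
As a rough sketch, the two NASM invocations above could be generated from a single list of formats, so that one assembly source yields one object file per format. The output naming scheme (`test.elf64.o` and so on) is a hypothetical convention, not something Bloaty uses today.

```python
from pathlib import Path

# Object formats taken from the commands above; extend as needed.
FORMATS = ("macho64", "elf64")


def nasm_commands(source, formats=FORMATS):
    """Return one NASM argv list per object format for a single source file."""
    stem = Path(source).stem
    # Hypothetical naming scheme: <stem>.<format>.o
    return [
        ["nasm", "-f", fmt, "-o", f"{stem}.{fmt}.o", source]
        for fmt in formats
    ]


# Print the commands rather than running them, so this works without NASM.
for argv in nasm_commands("test.asm"):
    print(" ".join(argv))
```

Each argv list can be passed to `subprocess.run(argv, check=True)` on a machine that has NASM installed.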

Linking is trickier. I'm not aware of any target-independent linker that can link both ELF and Mach-O from a single linker binary. I did find a port of the Apple linker that seems to be quite portable. The "gold" linker is ELF-native and should be portable, but it's tied up in the bigger and more complicated "binutils" package. I'm not sure what the best solution is here.

Joshua Haberman · Answer 1 · Fri Jan 01 2021 11:45:50 GMT+0800 (China Standard Time)

After some more research, I'm not sure if NASM will cut it. There are some important features it doesn't appear to support.

- Unwind information: .eh_frame contains unwind information, and it appears to be generated somewhat automatically by the assembler from directives named .cfi_*, e.g. .cfi_startproc, .cfi_endproc, etc. NASM appears to contain no support for this.
- DWARF .loc directives, for emitting .debug_line.

Besides this, NASM format is less convenient to use because you cannot get the compiler to generate it for you (the compiler's output is meant for gas), and NASM format isn't as common or well known.

Perhaps it is better to use the toolchain assembler and linker, but check in the output files so that all formats can be tested no matter what platform one is developing on.

The main open question then is whether the assembler's output is deterministic enough for this to work.

Joshua Haberman · Answer 2 · Wed Jan 27 2021 06:35:44 GMT+0800 (China Standard Time)

What I really wish for is a text serialization of an arbitrary ELF or Mach-O binary. Then we could have unit tests that exercise every case precisely. If a bug arises, we could create a minimal case that precisely illustrates the bug.

If we had such a text serialization, we wouldn't need a platform linker to create our test binaries, because the text description would contain a full description of the output binary.

Bjørn Reese · Answer 3 · Sun Feb 14 2021 19:38:50 GMT+0800 (China Standard Time)

Related to this, I am working on a test suite for the short symbols demangler.

Bjørn Reese · Answer 4 · Sun Feb 21 2021 20:12:34 GMT+0800 (China Standard Time)

Now that the demangler has been replaced in #227, my test suite is no longer relevant.

Joshua Haberman · Answer 5 · Mon Feb 22 2021 02:39:48 GMT+0800 (China Standard Time)

Sorry about that! Perhaps it would be something you could contribute to ABSL.

Joshua Haberman · Answer 6 · Sun Apr 04 2021 01:21:28 GMT+0800 (China Standard Time)

One option I've considered is using yaml2obj from the LLVM project as the source format for Bloaty's test cases. This comes a lot closer than C to deterministically creating output binaries.

yaml2obj does not appear to be capable of losslessly round-tripping an arbitrary object file:

$ cat test.c
int main() {}
$ gcc -c -o test.o test.c
$ obj2yaml test.o > test.yaml
$ yaml2obj test.yaml > test2.o
$ diff test.o test2.o
Binary files test.o and test2.o differ
$ readelf -WS test.o
There are 12 section headers, starting at offset 0x250:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000040 00000f 00  AX  0   0  1
  [ 2] .data             PROGBITS        0000000000000000 00004f 000000 00  WA  0   0  1
  [ 3] .bss              NOBITS          0000000000000000 00004f 000000 00  WA  0   0  1
  [ 4] .comment          PROGBITS        0000000000000000 00004f 000027 01  MS  0   0  1
  [ 5] .note.GNU-stack   PROGBITS        0000000000000000 000076 000000 00      0   0  1
  [ 6] .note.gnu.property NOTE            0000000000000000 000078 000020 00   A  0   0  8
  [ 7] .eh_frame         PROGBITS        0000000000000000 000098 000038 00   A  0   0  8
  [ 8] .rela.eh_frame    RELA            0000000000000000 0001d0 000018 18   I  9   7  8
  [ 9] .symtab           SYMTAB          0000000000000000 0000d0 0000f0 18     10   9  8
  [10] .strtab           STRTAB          0000000000000000 0001c0 00000d 00      0   0  1
  [11] .shstrtab         STRTAB          0000000000000000 0001e8 000067 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)
$ readelf -WS test2.o
There are 12 section headers, starting at offset 0x298:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000040 00000f 00  AX  0   0  1
  [ 2] .data             PROGBITS        0000000000000000 00004f 000000 00  WA  0   0  1
  [ 3] .bss              NOBITS          0000000000000000 00004f 000000 00  WA  0   0  1
  [ 4] .comment          PROGBITS        0000000000000000 00004f 000027 01  MS  0   0  1
  [ 5] .note.GNU-stack   PROGBITS        0000000000000000 000076 000000 00      0   0  1
  [ 6] .note.gnu.property NOTE            0000000000000000 000078 000020 00   A  0   0  8
  [ 7] .eh_frame         PROGBITS        0000000000000000 000098 000038 00   A  0   0  8
  [ 8] .rela.eh_frame    RELA            0000000000000000 0000d0 000018 18   I  9   7  8
  [ 9] .symtab           SYMTAB          0000000000000000 0000e8 0000f0 18     10   9  8
  [10] .strtab           STRTAB          0000000000000000 0001d8 000054 00      0   0  1
  [11] .shstrtab         STRTAB          0000000000000000 00022c 000067 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)

Notice the size of .strtab changed. This is enough that it would throw off any precise assertions about the output.
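
To see exactly which sections moved or changed size, the two `readelf -WS` listings can be compared mechanically. The following is a small sketch that assumes the row layout shown above (index, name, type, 16-digit address, offset, size); the function names are made up for illustration.

```python
import re

# One row of `readelf -WS`: [index] name type address offset size ...
# The address must be exactly 16 hex digits, so the unnamed NULL row,
# which has no name column, does not match and is skipped.
ROW = re.compile(
    r"^\s*\[\s*(\d+)\]\s+(\S+)\s+(\S+)\s+([0-9a-f]{16})\s+([0-9a-f]+)\s+([0-9a-f]+)"
)


def parse_sections(readelf_output):
    """Map section name -> (file offset, size) for every row that parses."""
    sections = {}
    for line in readelf_output.splitlines():
        m = ROW.match(line)
        if m:
            sections[m.group(2)] = (int(m.group(5), 16), int(m.group(6), 16))
    return sections


def changed_sections(before, after):
    """Return {name: (old, new)} for sections whose offset or size differs."""
    a, b = parse_sections(before), parse_sections(after)
    return {name: (a[name], b[name]) for name in a if name in b and a[name] != b[name]}
```

Fed the two listings above, this would report .rela.eh_frame, .symtab, .strtab and .shstrtab: only .strtab changes size, and the others differ only in file offset.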

But full round-tripping is not what we need; we just need full reproducibility. If the same YAML will always produce a byte-for-byte identical object file, that would be enough. If yaml2obj is stable over time, this could be a good option.
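
A minimal sketch of a reproducibility check, assuming yaml2obj writes the object file to stdout as in the commands above. It only shows that repeated runs of one yaml2obj build agree; stability across LLVM releases would need a checked-in digest to compare against. The helper names are hypothetical.

```python
import hashlib
import subprocess


def is_reproducible(build, runs=3):
    """Call build() several times; True if every result is byte-identical.

    build is any zero-argument callable that returns the output bytes.
    """
    digests = {hashlib.sha256(build()).hexdigest() for _ in range(runs)}
    return len(digests) == 1


def run_yaml2obj(yaml_path):
    """Run yaml2obj on one file and return the object bytes from stdout."""
    result = subprocess.run(["yaml2obj", yaml_path], check=True, capture_output=True)
    return result.stdout
```

On a machine with LLVM installed, `is_reproducible(lambda: run_yaml2obj("test.yaml"))` performs the check.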

Ideally yaml2obj would give textual representations of all elements of the object file. But it appears that the contents of some sections are just tunneled through:

$ head -80 test.yaml 
--- !ELF
FileHeader:
  Class:           ELFCLASS64
  Data:            ELFDATA2LSB
  Type:            ET_REL
  Machine:         EM_X86_64
Sections:
  - Name:            .text
    Type:            SHT_PROGBITS
    Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
    AddressAlign:    0x0000000000000001
    Content:         F30F1EFA554889E5B8000000005DC3
  - Name:            .data
    Type:            SHT_PROGBITS
    Flags:           [ SHF_WRITE, SHF_ALLOC ]
    AddressAlign:    0x0000000000000001
  - Name:            .bss
    Type:            SHT_NOBITS
    Flags:           [ SHF_WRITE, SHF_ALLOC ]
    AddressAlign:    0x0000000000000001
  - Name:            .debug_info
    Type:            SHT_PROGBITS
    AddressAlign:    0x0000000000000001
    Content:         4F0000000400000000000801000000000C000000000000000000000000000000000F000000000000000000000002000000000102054B00000000000000000000000F00000000000000019C030405696E740000
  - Name:            .rela.debug_info
    Type:            SHT_RELA
    Flags:           [ SHF_INFO_LINK ]
    Link:            .symtab
    AddressAlign:    0x0000000000000008
    Info:            .debug_info
    Relocations:
      - Offset:          0x0000000000000006
        Symbol:          .debug_abbrev
        Type:            R_X86_64_32
      - Offset:          0x000000000000000C
        Symbol:          .debug_str
        Type:            R_X86_64_32
        Addend:          17
      - Offset:          0x0000000000000011
        Symbol:          .debug_str
        Type:            R_X86_64_32
        Addend:          5
      - Offset:          0x0000000000000015
        Symbol:          .debug_str
        Type:            R_X86_64_32
      - Offset:          0x0000000000000019
        Symbol:          .text
        Type:            R_X86_64_64
      - Offset:          0x0000000000000029
        Symbol:          .debug_line
        Type:            R_X86_64_32
      - Offset:          0x000000000000002E
        Symbol:          .debug_str
        Type:            R_X86_64_32
        Addend:          12
      - Offset:          0x0000000000000039
        Symbol:          .text
        Type:            R_X86_64_64
  - Name:            .debug_abbrev
    Type:            SHT_PROGBITS
    AddressAlign:    0x0000000000000001
    Content:         011101250E130B030E1B0E1101120710170000022E003F19030E3A0B3B0B390B491311011207401897421900000324000B0B3E0B0308000000
  - Name:            .debug_aranges
    Type:            SHT_PROGBITS
    AddressAlign:    0x0000000000000001
    Content:         2C00000002000000000008000000000000000000000000000F0000000000000000000000000000000000000000000000
  - Name:            .rela.debug_aranges
    Type:            SHT_RELA
    Flags:           [ SHF_INFO_LINK ]
    Link:            .symtab
    AddressAlign:    0x0000000000000008
    Info:            .debug_aranges
    Relocations:
      - Offset:          0x0000000000000006
        Symbol:          .debug_info
        Type:            R_X86_64_32
      - Offset:          0x0000000000000010
        Symbol:          .text
        Type:            R_X86_64_64
  - Name:            .debug_line

Note that we get structured information for the relocations, but the contents of .text, .debug_info, .debug_aranges, etc. are just tunneled through as Content, instead of showing assembly code and DWARF DIEs, like you would get from objdump.
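
Even though Content is opaque hex, it can still be cross-checked against the section table. The sketch below decodes the .text Content string from the YAML above and counts its bytes; the instruction annotations in the comments are my reading of the bytes, not yaml2obj output.

```python
def content_size(hex_content):
    """Number of bytes encoded by a yaml2obj-style hex Content string."""
    return len(bytes.fromhex(hex_content))


# .text Content from test.yaml above. The bytes appear to decode as:
# endbr64; push rbp; mov rbp, rsp; mov eax, 0; pop rbp; ret
text_content = "F30F1EFA554889E5B8000000005DC3"

print(hex(content_size(text_content)))  # prints 0xf
```

The result matches the 00000f size that readelf reports for .text.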

Mark Jansen · Answer 7 · Sun Apr 04 2021 02:57:06 GMT+0800 (China Standard Time)

Considering you want to make the testcases more robust, I would like to suggest this approach:

- Create a new repository for test files.
- This repository contains all test files, pre-compiled (ready to use).
- It would probably be a good idea to document for each test file how it was created (compiler + source code used).

There are a few reasons for this:

- Git is not very good with big binary files, and having all of them in a separate repo gives more freedom to move them around, rewrite history (to completely remove files that are no longer used), or even switch to git lfs (without having to muck around with the main repo).
- Having all files pre-compiled makes the reproduction of unittests less dependent on what compiler / platform the one running the tests is using.
- Having all files pre-compiled also means that binaries that are harder (or even impossible) to script can be tested, for example a binary obfuscated by a (commercial) packer. (Not sure what the use of that would be, but it was the first example that I could come up with.)

Mark Jansen · Answer 8 · Sun Apr 04 2021 07:41:11 GMT+0800 (China Standard Time)

Another thing to consider is how to organize testcases, one approach would be:

ELF
- x86
- - clang 9.0
- - - test1.bin
- - - test2.so
- - gcc xx
- x64
- - clang 9.0
- - gcc xx
PE
- x86
- - msvc 2015
- - - test1.bin
- - - test2.dll
- - msvc 2019

Another option would be:

ELF
- clang 9.0
- - x86
......

Joshua Haberman · Answer 9 · Sun Apr 04 2021 08:56:18 GMT+0800 (China Standard Time)

> Git is not very good with big binary files

That's true, but I don't think big files make for good test cases either. I think it would be best if test cases were generally very small and focused (<10 KiB, ideally even under 1 KiB). With the possible exception of overflow checking, I feel that pretty much any bug or feature in Bloaty should be testable with a small payload.

I'm not fundamentally opposed to creating a separate repo for test data, but it does involve some administrative overhead (I'd have to figure out what Google process to go through for adding another repo to an existing project, get approvals, etc).

> Having all files pre-compiled makes the reproduction of unittests less dependent on what compiler / platform the one running the tests is using

I definitely agree that we want the tests to be independent of the compiler. That's why I was suggesting yaml2obj, which doesn't use a C compiler at all.

> Having all files pre-compiled also means that binaries that are harder (or even impossible) to script can be tested, for example a binary obfuscated by a (commercial) packer.

I think this is another benefit of yaml2obj: it looks like it is capable of nearly perfectly recreating any object file, regardless of how it was created.

> Another thing to consider is how to organize testcases, one approach would be:

I like your proposed layout but I don't want it to specify the compiler. Whenever compilers differ in any meaningful way, we should be able to test the difference itself, at the object file level, and not make it a compiler-specific test.

I imagine something like:

ELF
- x86
  - simple_obj.o.test
  - simple_so.so.test
  - simple_bin.test
  - ...
- x64
  - simple_obj.o.test
  - simple_so.so.test
  - simple_bin.test
PE
- x86
  - simple_obj.o.test
  - simple_so.dll.test
  - ...
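
With a layout like this, a harness could find its test cases by walking the tree. This is only a sketch: the `<root>/<format>/<arch>/<name>.test` structure is taken from the outline above, and `discover_tests` is a hypothetical helper name.

```python
from pathlib import Path


def discover_tests(root):
    """Yield (object_format, arch, path) for every *.test file under root.

    Assumes the two-level layout <root>/<format>/<arch>/<name>.test.
    """
    for path in sorted(Path(root).glob("*/*/*.test")):
        arch_dir = path.parent
        yield arch_dir.parent.name, arch_dir.name, path
```

Because the format and architecture come from the directory names, adding a new test case is just a matter of dropping a file in the right directory.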

I was imagining that each .test file would look something like this:

--- !ELF                                                                   
FileHeader:                                                                
  Class:           ELFCLASS64                                              
  Data:            ELFDATA2LSB
  Type:            ET_REL
  Machine:         EM_X86_64                                               
Sections:                                                                  
  - Name:            .text                                                 
    Type:            SHT_PROGBITS                                          
    Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
    AddressAlign:    0x1        
    Content:         554889E5B8000000005DC3
  - Name:            .data                                                 
    Type:            SHT_PROGBITS                                          
    Flags:           [ SHF_WRITE, SHF_ALLOC ]
    AddressAlign:    0x1                                                   
  - Name:            .bss                                                  
    Type:            SHT_NOBITS                                            
    Flags:           [ SHF_WRITE, SHF_ALLOC ]
    AddressAlign:    0x1                                                   
  - Name:            .comment                                              
    Type:            SHT_PROGBITS                                          
    Flags:           [ SHF_MERGE, SHF_STRINGS ]
    AddressAlign:    0x1                                                   
    EntSize:         0x1                                                   
    Content:         004743433A202844656269616E2031302E322E312D362B6275696C6431292031302E322E3120323032313031313000
  - Name:            .note.GNU-stack                                                                                                                   
    Type:            SHT_PROGBITS
    AddressAlign:    0x1
  - Name:            .eh_frame
    Type:            SHT_PROGBITS
    Flags:           [ SHF_ALLOC ]
    AddressAlign:    0x8
    Content:         1400000000000000017A5200017810011B0C0708900100001C0000001C000000000000000B00000000410E108602430D06460C0708000000
...

$ bloaty -d segments,sections
FILE MAP:
000-040          64             [ELF Headers]   [ELF Headers]
040-04b          11             Section [AX]    .text
04b-080          53             Section []      .comment
080-0b8          56             Section [A]     .eh_frame
0b8-0f8          64             Section []      .shstrtab
0f8-138          64             [ELF Headers]   [ELF Headers]
138-178          64             [ELF Headers]   .text
178-1b8          64             [ELF Headers]   .comment
1b8-1f8          64             [ELF Headers]   [ELF Headers]
1f8-238          64             [ELF Headers]   .comment
238-278          64             [ELF Headers]   [ELF Headers]
278-2b8          64             [ELF Headers]   .eh_frame
2b8-2f8          64             [ELF Headers]   .shstrtab

VM MAP:
00000000000-10000000000  1099511627776          [-- Nothing mapped --]
10000000000-1000000000b          11             Section [AX]    .text
1000000000b-60000000000  5497558138869          [-- Nothing mapped --]
60000000000-60000000038          56             Section [A]     .eh_frame

The first part is just the obj2yaml output, and the testing harness can run yaml2obj to recreate the object file. The second part is the bloaty command, followed by the expected memory map (in both file and VM space) from that command.

This seems like the right level of abstraction to be testing, because when it comes down to it, Bloaty is an object file -> memory map transformation with some convenient reporting built on top.
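
A minimal sketch of how a harness might split such a .test file into its parts. It assumes the layout shown above: a YAML document ended by a line containing only "...", then a "$ "-prefixed command, then the expected output running to the end of the file. The function name is hypothetical.

```python
def split_test_file(text):
    """Split a .test file into (yaml_text, command, expected_output).

    Raises StopIteration if the "..." terminator or the "$ " command
    line is missing.
    """
    lines = text.splitlines()
    # The YAML document ends at the first line that is just "...".
    end = next(i for i, line in enumerate(lines) if line.strip() == "...")
    # The command is the first "$ " line after the YAML document.
    cmd = next(i for i in range(end + 1, len(lines)) if lines[i].startswith("$ "))
    yaml_text = "\n".join(lines[:end]) + "\n"
    command = lines[cmd][2:].strip()
    expected = "\n".join(lines[cmd + 1:]).strip() + "\n"
    return yaml_text, command, expected
```

The harness would then feed `yaml_text` to yaml2obj, run `command` against the resulting object file, and compare the output with `expected`.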

Does that make sense? The more I write it out the more I like the direction. The main questions/risks I see are:

1. Is the obj2yaml format stable over time? I think they are adding new capabilities which might change the obj2yaml output, but the important question is whether the yaml2obj direction is stable over time.
2. Is it going to be a big pain for Bloaty's tests to have a dependency on LLVM yaml2obj?
3. Can we reasonably reduce the object files we care about to a small enough file that the YAML output won't be too long?
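
One side benefit of the memory-map format in the .test sketch above is that the expected output can be sanity-checked on its own. The following sketch assumes each row is `<hex start>-<hex end>  <decimal size>  <label>` and that rows are meant to tile the range with no gaps, as they do in the example; `check_map` is a hypothetical helper.

```python
import re

# One map row: hex bounds, decimal size, then a free-form label.
ROW = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s+(\d+)\s+(.*)$")


def check_map(rows):
    """Return a list of problems found in one FILE MAP or VM MAP listing."""
    problems = []
    prev_end = None
    for row in rows:
        m = ROW.match(row.strip())
        if not m:
            problems.append(f"unparseable row: {row!r}")
            continue
        start, end, size = int(m.group(1), 16), int(m.group(2), 16), int(m.group(3))
        # The size column must equal the width of the range.
        if end - start != size:
            problems.append(f"{row!r}: size {size} != {end - start}")
        # Each row must begin where the previous one ended.
        if prev_end is not None and start != prev_end:
            problems.append(f"{row!r}: gap or overlap after {prev_end:#x}")
        prev_end = end
    return problems
```

Both maps in the example pass this check, including the large "Nothing mapped" filler ranges in the VM map.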

Mark Jansen · Answer 10 · Mon Apr 05 2021 00:57:29 GMT+0800 (China Standard Time)

> > Git is not very good with big binary files
>
> That's true, but I don't think big files make for good test cases either. I think it would be best if test cases were generally very small and focused (<10 KiB, ideally even under 1 KiB). With the possible exception of overflow checking, I feel that pretty much any bug or feature in Bloaty should be testable with a small payload.

For example, PDB files are quite big (they include a lot of information, including FPO and local variable locations etc).

> I'm not fundamentally opposed to creating a separate repo for test data, but it does involve some administrative overhead (I'd have to figure out what Google process to go through for adding another repo to an existing project, get approvals, etc).
>
> > Having all files pre-compiled makes the reproduction of unittests less dependent on what compiler / platform the one running the tests is using

It was an idea mainly aimed at keeping the repository size for bloaty down.
If there won't be a lot of big binary files, there probably won't be a problem.

> I definitely agree that we want the tests to be independent of the compiler. That's why I was suggesting yaml2obj, which doesn't use a C compiler at all.
>
> > Having all files pre-compiled also means that binaries that are harder (or even impossible) to script can be tested, for example a binary obfuscated by a (commercial) packer.
>
> I think this is another benefit of yaml2obj: it looks like it is capable of nearly perfectly recreating any object file, regardless of how it was created.

Indeed, yaml2obj could work here as well, but then there is the question of distributing yaml2obj as a binary with the test suite for all platforms, or including the source in this repo so it can be built (or requiring the user to provide a version of yaml2obj).

> > Another thing to consider is how to organize testcases, one approach would be:
>
> I like your proposed layout but I don't want it to specify the compiler. Whenever compilers differ in any meaningful way, we should be able to test the difference itself, at the object file level, and not make it a compiler-specific test.
>
> I imagine something like:
>
> * ELF
>   * x86
>     * simple_obj.o.test
>     * simple_so.so.test
>     * simple_bin.test
>     * ...
>   * x64
>     * simple_obj.o.test
>     * simple_so.so.test
>     * simple_bin.test
> * PE
>   * x86
>     * simple_obj.o.test
>     * simple_so.dll.test
>     * ...

That would indeed decrease the indentation level, but I would advise at least including somewhere the specific compiler + revision used to create the binary, in case people want to reproduce something.

> I was imagining that each .test file would look something like this:

> The first part is just the obj2yaml output, and the testing harness can run yaml2obj to recreate the object file. The second part is the bloaty command, followed by the expected memory map (in both file and VM space) from that command.
>
> This seems like the right level of abstraction to be testing, because when it comes down to it, Bloaty is an object file -> memory map transformation with some convenient reporting built on top.

This looks a lot like the tests being done in llvm (I believe with lit?):
https://www.llvm.org/docs/CommandGuide/lit.html
and for example https://github.com/llvm/llvm-project/blob/62ec4ac90738a5f2d209ed28c822223e58aaaeb7/llvm/test/tools/llvm-readobj/COFF/file-headers.test

> Does that make sense? The more I write it out the more I like the direction. The main questions/risks I see are:
>
> 1. Is the `obj2yaml` format stable over time? I think they are adding new capabilities which might change the `obj2yaml` output, but the important question is if the `yaml2obj` direction is stable over time.

I am not sure about this, it might make sense to either include a binary or the sources of obj2yaml to fix it in place.

> 2. Is it going to be a big pain for Bloaty's tests to have a dependency on LLVM `yaml2obj`?

Not if it is in some way made available, otherwise I am not sure.

> 3. Can we reasonably reduce the object files we care about to a small enough file that the YAML output won't be too long?

yaml files can be edited by hand to reduce size, so all information not relevant for the test can be cut out.

Mark Jansen · Answer 11 · Mon Apr 05 2021 04:28:35 GMT+0800 (China Standard Time)

https://github.com/llvm/llvm-project/tree/62ec4ac90738a5f2d209ed28c822223e58aaaeb7/llvm/test/tools/llvm-readobj/COFF
They seem to do both, test files that run yaml2obj (and much more) and include some binaries.

But they seem to include the expected result before the yaml of the binary, which makes sense since this is the part you most likely want to read first:
https://github.com/llvm/llvm-project/blob/main/lldb/test/Shell/ObjectFile/PECOFF/export-dllfunc.yaml

Joshua Haberman · Answer 12 · Wed Jul 28 2021 02:50:38 GMT+0800 (China Standard Time)

I've been working on this some. I'm liking how the unit tests are looking: https://github.com/haberman/bloaty/tree/better-tests/tests/elf/sections

Mark Jansen · Answer 13 · Wed Jul 28 2021 20:32:46 GMT+0800 (China Standard Time)

> I've been working on this some. I'm liking how the unit tests are looking: https://github.com/haberman/bloaty/tree/better-tests/tests/elf/sections

You are relying on the system-provided yaml2obj for your tests?
Should there be a check for a minimum version expected/known to be working?
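
A hedged sketch of what such a check could look like. It assumes that `yaml2obj --version` prints a line containing "LLVM version X.Y.Z", as LLVM command-line tools generally do, and that the tool is on PATH under the plain name `yaml2obj`. The minimum version below is a placeholder, not a known-good value.

```python
import re
import shutil
import subprocess


def parse_llvm_version(version_text):
    """Extract (major, minor, patch) from text containing 'LLVM version X.Y.Z'."""
    m = re.search(r"LLVM version (\d+)\.(\d+)\.(\d+)", version_text)
    return tuple(int(g) for g in m.groups()) if m else None


def yaml2obj_is_usable(minimum=(11, 0, 0)):
    """True if a yaml2obj on PATH reports at least `minimum` (a placeholder)."""
    tool = shutil.which("yaml2obj")
    if tool is None:
        return False
    out = subprocess.run([tool, "--version"], capture_output=True, text=True).stdout
    version = parse_llvm_version(out)
    return version is not None and version >= minimum
```

A test runner could call `yaml2obj_is_usable()` once at startup and skip the YAML-based tests when it returns False.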

Joshua Haberman · Answer 14 · Mon Aug 02 2021 04:20:17 GMT+0800 (China Standard Time)

> You are relying on the system-provided yaml2obj for your tests?
> Should there be a check for a minimum version expected/known to be working?

Good question. There is also the question of how to make this work with CI.

@compnerd Do you have an idea of how yaml2obj (a tool from LLVM) could reasonably be made available to the CI tests? Unfortunately LLVM is a heavy dependency that takes a long time to build.

For Linux we could consider using this tool from a docker image, but that doesn't help for MacOS/Windows.

Joshua Haberman · Answer 15 · Mon Aug 02 2021 07:45:55 GMT+0800 (China Standard Time)

Maybe we should make the yaml2obj tests only run on Linux for CI. Bloaty is cross-platform, but it seems ok if we only run the full battery of tests on Linux.

If we restrict ourselves to Linux-only, then perhaps our CI tests can use yaml2obj from a Docker image of LLVM.

Mark Jansen · Answer 16 · Mon Aug 02 2021 18:40:13 GMT+0800 (China Standard Time)

> > You are relying on the system-provided yaml2obj for your tests?
> > Should there be a check for a minimum version expected/known to be working?
>
> Good question. There is also the question of how to make this work with CI.
>
> @compnerd Do you have an idea of how yaml2obj (a tool from LLVM) could reasonably be made available to the CI tests? Unfortunately LLVM is a heavy dependency that takes a long time to build.
>
> For Linux we could consider using this tool from a docker image, but that doesn't help for MacOS/Windows.

LLVM is available in chocolatey, so maybe that can be used for yaml2obj, like we do in ReactOS: https://github.com/reactos/reactos/blob/master/.github/workflows/build.yml#L90
Otherwise, just downloading a binary (or a zip with the binary in it) also works perfectly fine on github builders:
https://github.com/reactos/reactos/blob/master/.github/workflows/build.yml#L93

Saleem Abdulrasool · Answer 17 · Mon Aug 02 2021 23:37:47 GMT+0800 (China Standard Time)

The problem is that yaml2obj is not "for use" (that is, it is an internal testing tool not meant to be shipped, though I do realize that the reality doesn't match that and many projects do use it for testing explicitly because of its utility), so there is no guaranteed way to get it.

I agree with @learn-more that if the packaging in chocolatey has yaml2obj included, we should opt to use that. For macOS we could try to see if brew has a package (or does it only do source builds?), and we could do that for Linux as well.

Mark Jansen · Answer 18 · Mon Aug 02 2021 23:50:02 GMT+0800 (China Standard Time)

Does not appear to be included:

Directory of C:\Program Files\LLVM\bin

02-Aug-21  17:47    <DIR>          .
02-Aug-21  17:47    <DIR>          ..
23-Oct-18  02:10            21,032 api-ms-win-core-console-l1-1-0.dll
23-Oct-18  02:10            21,024 api-ms-win-core-console-l1-2-0.dll
23-Oct-18  02:10            20,512 api-ms-win-core-datetime-l1-1-0.dll
23-Oct-18  02:10            20,544 api-ms-win-core-debug-l1-1-0.dll
23-Oct-18  02:10            20,520 api-ms-win-core-errorhandling-l1-1-0.dll
23-Oct-18  02:10            24,104 api-ms-win-core-file-l1-1-0.dll
23-Oct-18  02:10            20,520 api-ms-win-core-file-l1-2-0.dll
23-Oct-18  02:10            20,520 api-ms-win-core-file-l2-1-0.dll
23-Oct-18  02:10            20,520 api-ms-win-core-handle-l1-1-0.dll
23-Oct-18  02:10            21,032 api-ms-win-core-heap-l1-1-0.dll
23-Oct-18  02:10            20,520 api-ms-win-core-interlocked-l1-1-0.dll
23-Oct-18  02:10            21,568 api-ms-win-core-libraryloader-l1-1-0.dll
23-Oct-18  02:10            23,080 api-ms-win-core-localization-l1-2-0.dll
23-Oct-18  02:10            21,032 api-ms-win-core-memory-l1-1-0.dll
23-Oct-18  02:10            20,520 api-ms-win-core-namedpipe-l1-1-0.dll
23-Oct-18  02:10            21,544 api-ms-win-core-processenvironment-l1-1-0.dll
23-Oct-18  02:10            22,560 api-ms-win-core-processthreads-l1-1-0.dll
23-Oct-18  02:10            21,056 api-ms-win-core-processthreads-l1-1-1.dll
23-Oct-18  02:10            20,008 api-ms-win-core-profile-l1-1-0.dll
23-Oct-18  02:10            21,056 api-ms-win-core-rtlsupport-l1-1-0.dll
23-Oct-18  02:10            20,520 api-ms-win-core-string-l1-1-0.dll
23-Oct-18  02:10            22,568 api-ms-win-core-synch-l1-1-0.dll
23-Oct-18  02:10            21,032 api-ms-win-core-synch-l1-2-0.dll
23-Oct-18  02:10            21,544 api-ms-win-core-sysinfo-l1-1-0.dll
23-Oct-18  02:10            21,032 api-ms-win-core-timezone-l1-1-0.dll
23-Oct-18  02:10            20,520 api-ms-win-core-util-l1-1-0.dll
23-Oct-18  02:10            21,544 api-ms-win-crt-conio-l1-1-0.dll
23-Oct-18  02:10            24,616 api-ms-win-crt-convert-l1-1-0.dll
23-Oct-18  02:10            21,032 api-ms-win-crt-environment-l1-1-0.dll
23-Oct-18  02:10            22,568 api-ms-win-crt-filesystem-l1-1-0.dll
23-Oct-18  02:10            21,544 api-ms-win-crt-heap-l1-1-0.dll
23-Oct-18  02:10            21,032 api-ms-win-crt-locale-l1-1-0.dll
23-Oct-18  02:10            29,528 api-ms-win-crt-math-l1-1-0.dll
23-Oct-18  02:10            28,736 api-ms-win-crt-multibyte-l1-1-0.dll
23-Oct-18  02:10            73,048 api-ms-win-crt-private-l1-1-0.dll
23-Oct-18  02:10            21,568 api-ms-win-crt-process-l1-1-0.dll
23-Oct-18  02:10            25,128 api-ms-win-crt-runtime-l1-1-0.dll
23-Oct-18  02:10            26,664 api-ms-win-crt-stdio-l1-1-0.dll
23-Oct-18  02:10            26,664 api-ms-win-crt-string-l1-1-0.dll
23-Oct-18  02:10            23,080 api-ms-win-crt-time-l1-1-0.dll
23-Oct-18  02:10            21,032 api-ms-win-crt-utility-l1-1-0.dll
15-Apr-21  11:57        98,132,992 clang++.exe
15-Apr-21  11:34         1,689,600 clang-apply-replacements.exe
15-Apr-21  11:35        22,002,688 clang-change-namespace.exe
15-Apr-21  11:34        82,247,680 clang-check.exe
15-Apr-21  11:57        98,132,992 clang-cl.exe
15-Apr-21  11:57        98,132,992 clang-cpp.exe
15-Apr-21  11:36        21,480,960 clang-doc.exe
15-Apr-21  11:34        21,071,872 clang-extdef-mapping.exe
15-Apr-21  11:34         1,706,496 clang-format.exe
15-Apr-21  11:35        21,667,328 clang-include-fixer.exe
15-Apr-21  11:36        21,992,448 clang-move.exe
15-Apr-21  11:34         3,089,408 clang-offload-bundler.exe
15-Apr-21  11:34         1,912,832 clang-offload-wrapper.exe
15-Apr-21  11:35        22,705,152 clang-query.exe
15-Apr-21  11:34        22,229,504 clang-refactor.exe
15-Apr-21  11:34        21,509,120 clang-rename.exe
15-Apr-21  11:34        21,441,536 clang-reorder-fields.exe
15-Apr-21  11:34        21,176,832 clang-scan-deps.exe
15-Apr-21  11:35        48,737,280 clang-tidy.exe
15-Apr-21  11:34        98,132,992 clang.exe
15-Apr-21  11:36        31,162,880 clangd.exe
25-Jan-21  14:41           309,128 concrt140.dll
15-Apr-21  11:34         7,026,688 diagtool.exe
15-Apr-21  11:35        21,529,088 find-all-symbols.exe
06-Apr-21  18:38            21,461 git-clang-format
06-Apr-21  18:38             9,980 hmaptool
15-Apr-21  11:57        70,535,680 ld.lld.exe
15-Apr-21  11:57        70,535,680 ld64.lld.darwinnew.exe
15-Apr-21  11:57        70,535,680 ld64.lld.exe
15-Apr-21  11:36        78,288,384 libclang.dll
15-Apr-21  11:57           690,176 libiomp5md.dll
15-Apr-21  11:38        99,865,088 liblldb.dll
15-Apr-21  11:26           690,176 libomp.dll
15-Apr-21  11:57        70,535,680 lld-link.exe
15-Apr-21  11:36        70,535,680 lld.exe
15-Apr-21  11:27           136,704 lldb-argdumper.exe
15-Apr-21  11:38        38,737,408 lldb-instr.exe
15-Apr-21  11:38        17,275,904 lldb-server.exe
15-Apr-21  11:38           380,416 lldb-vscode.exe
15-Apr-21  11:38           239,104 lldb.exe
15-Apr-21  11:32        20,884,480 llvm-ar.exe
15-Apr-21  11:38        69,257,216 LLVM-C.dll
15-Apr-21  11:38         3,783,168 llvm-cov.exe
15-Apr-21  11:27           369,664 llvm-cxxfilt.exe
15-Apr-21  11:57        20,884,480 llvm-lib.exe
15-Apr-21  11:38        21,228,544 llvm-nm.exe
15-Apr-21  11:38         3,599,360 llvm-objcopy.exe
15-Apr-21  11:38        20,591,616 llvm-objdump.exe
15-Apr-21  11:31         1,531,392 llvm-profdata.exe
15-Apr-21  11:57        20,884,480 llvm-ranlib.exe
15-Apr-21  11:38           318,464 llvm-rc.exe
15-Apr-21  11:38         3,116,544 llvm-size.exe
15-Apr-21  11:38           262,656 llvm-strings.exe
15-Apr-21  11:57         3,599,360 llvm-strip.exe
15-Apr-21  11:38         4,183,552 llvm-symbolizer.exe
15-Apr-21  11:33        67,101,696 LTO.dll
15-Apr-21  11:34        21,158,400 modularize.exe
25-Jan-21  14:41           585,096 msvcp140.dll
15-Apr-21  11:35        21,083,136 pp-trace.exe
15-Apr-21  11:27           243,200 Remarks.dll
06-Apr-21  18:38            58,072 scan-build
06-Apr-21  18:38                23 scan-build.bat
06-Apr-21  18:38             4,702 scan-view
23-Oct-18  02:10         1,026,088 ucrtbase.dll
25-Jan-21  14:41            94,088 vcruntime140.dll
25-Jan-21  14:41            36,744 vcruntime140_1.dll
15-Apr-21  11:57        70,535,680 wasm-ld.exe
             108 File(s)  1,775,613,862 bytes

Saleem Abdulrasool · Answer 19 · Tue Aug 03 2021 00:30:03 GMT+0800 (China Standard Time)

Okay, well, if we are careful, we should be able to use MinGW then. IIRC, MinGW bundles it in their installation. GHA runners have an installation of MinGW as well, so we should be able to get away with it on Windows. I'm still not sure what to do for Linux and macOS.

Depending on how clever we want to be: the prebuilt LLVM binaries on GH do include the static libraries and headers, so we could just grab the single source file from GH and build it manually.

Mark Jansen · Answer 20 · Tue Aug 03 2021 00:43:00 GMT+0800 (China Standard Time)

I would just suggest grabbing a yaml2obj.exe, zipping it up, and hosting it somewhere, letting the builder unpack it on demand.
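
The "unpack on demand" idea could look roughly like the sketch below. This is only an illustration under stated assumptions, not what Bloaty's build actually does: the helper name `ensure_tool`, the `archive_url` parameter, and the cache directory layout are all hypothetical, and the hosting location is left to the caller.

```python
import os
import urllib.request
import zipfile


def ensure_tool(tool_name, archive_url, cache_dir):
    """Return the path to tool_name, fetching and unpacking it on first use.

    Hypothetical helper: archive_url is assumed to point at a zip file
    that contains tool_name at its top level.
    """
    tool_path = os.path.join(cache_dir, tool_name)
    if os.path.exists(tool_path):
        # Already unpacked by an earlier run; nothing to download.
        return tool_path

    os.makedirs(cache_dir, exist_ok=True)
    archive_path = os.path.join(cache_dir, "archive.zip")
    urllib.request.urlretrieve(archive_url, archive_path)

    with zipfile.ZipFile(archive_path) as archive:
        archive.extract(tool_name, cache_dir)

    # zipfile does not restore permission bits, so mark the tool executable.
    os.chmod(tool_path, 0o755)
    return tool_path
```

A test driver would call `ensure_tool("yaml2obj.exe", url, cache_dir)` once before running the tests; later runs reuse the cached copy and do not touch the network.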

Joshua Haberman · Answer 21 · Tue Aug 03 2021 00:54:44 GMT+0800 (China Standard Time)

I might lean slightly this way too, especially since I think we'll be depending on a very new version of yaml2obj (maybe even a prerelease version).

When I was writing my tests, I needed a prerelease version to get enough support for DWARF.

Saleem Abdulrasool · Answer 22 · Tue Aug 03 2021 02:37:58 GMT+0800 (China Standard Time)

Yeah, the DWARF support is pretty new (IIRC, a recent GSoC project), so it wouldn't be part of anything but LLVM 13. If we host the binaries, that would certainly work. I think we may want to consider using FileCheck as well, with lit as the driver for the tests (lit itself we can acquire via pip install lit).

Saleem Abdulrasool · Answer 23 · Tue Aug 03 2021 04:24:22 GMT+0800 (China Standard Time)

#259 is for your perusal. It converts the testsuite to lit, though I still need to test it with a recent build of obj2yaml. I had a very old build (about a year old?) lying around on my machine that largely validated that things are wired up properly, but I need to make sure that everything works. I've tried to add some docs explaining how someone else can run the tests, which I think was missing with the new tests, as they are not yet integrated into the test targets.

Joshua Haberman · Answer 24 · Tue Aug 10 2021 06:05:59 GMT+0800 (China Standard Time)

Between #259, #260, and others, we now have a fast, hermetic set of tests that work on both CI and locally, and run successfully on all platforms!

With that, I consider this bug fixed, even though there are still lots of tests to write.