zig ar: a drop-in llvm-ar replacement

kubkon opened this issue 3 years ago · comments

Since we are putting a lot of effort into implementing our linker for all supported targets (#8726), we should also put some effort into adding our own implementation of a static archiver to replace llvm-ar. While llvm-ar generally works well on the host platform when targeting the host platform, in cross-compilation settings things can get wonky: the archiver may produce native static archive headers for foreign file formats, possibly tripping up the linker when it tries to use the archive.

This is a great issue for any new contributor as it allows you to create an archiver as a completely standalone program (in its own repo like zld for instance) and then upstream it into Zig once it's ready.

Also, with this issue closed, we will be able to offer zig ar as a subcommand that does not rely on llvm-ar in any way.

This issue does not block 1.0.

Tom Read Cutting · Answer 1 · Fri Sep 24 2021 20:39:56 GMT+0800 (China Standard Time)

I'm interested in giving this a look-in! Where would be a good place to start in understanding the problem?

Jakub Konka · Answer 2 · Sat Sep 25 2021 19:32:10 GMT+0800 (China Standard Time)

If I were tackling this problem, I would most likely create a fresh repo with the sole purpose of building a drop-in replacement for llvm-ar, much like zld is for lld. You can then build it out-of-tree, which means you don't have to worry about passing Zig tests at this stage and you significantly cut down on build times. I would also focus on just one file format in the beginning, say ELF or Mach-O. The idea here is that if you were to pick any large C/C++ codebase (or whatnot) in the wild, you could pass the Zig archiver as a replacement for the default system one or llvm-ar; that is, you'd tweak the CMake/Make invocation like so:

AR=zig-ar CC=... CXX=... cmake ../

or the same but with make. If the build process succeeds, then success!

Afterwards, you might wanna consider dropping the archiver as a direct replacement for llvm-ar in the Zig upstream either by calling directly to a precompiled binary or putting the sources in-tree (the latter is the end-goal actually). The relevant source where this should/could happen is in src/link.zig#L668:

pub fn linkAsArchive(base: *File, comp: *Compilation) !void {
    //...
    const llvm = @import("codegen/llvm/bindings.zig");
    const os_type = @import("target.zig").osToLLVM(base.options.target.os.tag);
    const bad = llvm.WriteArchive(full_out_path_z, object_files.items.ptr, object_files.items.len, os_type);
    if (bad) return error.UnableToWriteArchive;
    //...
}

Tom Read Cutting · Answer 3 · Sat Sep 25 2021 19:33:33 GMT+0800 (China Standard Time)

Will give that a crack!

Vladimir Vissoultchev · Answer 4 · Sat Sep 25 2021 19:46:47 GMT+0800 (China Standard Time)

JFYI, https://github.com/TinyCC/tinycc/blob/mob/tcctools.c has an extremely hacky impl for ELF support on Windows.

Tom Read Cutting · Answer 5 · Sun Sep 26 2021 00:31:16 GMT+0800 (China Standard Time)

Yeah - I've made a little bit of progress with this. Just still getting comfortable with zig so it's going to be a slightly idiosyncratic start. But it feels like a doable project.

https://github.com/moosichu/zar/

I just have a little program that can parse a very simple archive file and then prints out all the "filenames" of the files it contains.
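For readers unfamiliar with the format, here is a minimal sketch of that kind of listing step. It is written in Python for brevity and is not the zar code; `list_members` is a hypothetical name. It assumes the commonly documented ar layout: an 8-byte `!<arch>\n` magic, then per member a 60-byte header (16-byte name, 10-byte decimal size at offset 48, a two-byte end marker) followed by the data, padded to an even length.

```python
AR_MAGIC = b"!<arch>\n"   # 8-byte global header at the start of every archive
HEADER_SIZE = 60          # fixed size of each member header


def list_members(data):
    """Return the raw member names found in a simple ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("missing !<arch> magic")
    names = []
    pos = len(AR_MAGIC)
    while pos + HEADER_SIZE <= len(data):
        header = data[pos:pos + HEADER_SIZE]
        # Every member header ends with a backtick and a newline.
        if header[58:60] != b"`\n":
            raise ValueError("bad member header at offset %d" % pos)
        name = header[0:16].decode("ascii").rstrip(" ")
        size = int(header[48:58].decode("ascii").strip())
        names.append(name)
        # Member data is padded with one byte when its size is odd.
        pos += HEADER_SIZE + size + (size % 2)
    return names
```

The sketch returns the name fields as stored, so GNU-style trailing slashes and long-name references are not resolved, and symbol tables show up as ordinary members.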

I'm just building things up slowly step-by-step, with a focus on reading archives generated by llvm-ar to begin with.

The goal will be to make it a drop-in replacement, but will figure-out the order in which I do things as I go for now (still going to be very experimental early on).

I think what will probably end up happening is that I will experiment with parsing increasingly interesting archive files. And then I will loop back around and implement the command-line interface for the program. And then just incrementally work on each piece of functionality testing against the results of llvm-ar.

Will then probably make some kind of framework for testing those as well I think.

Jacob G-W · Answer 6 · Sun Sep 26 2021 00:33:05 GMT+0800 (China Standard Time)

You might want to look at https://github.com/SuperAuguste/zarc . It can parse ars, but it cannot create them. So maybe just using that and then adding features to create ars would be good?

Jakub Konka · Answer 7 · Sun Sep 26 2021 01:19:31 GMT+0800 (China Standard Time)

Yeah, the ultimate goal of zar should be generating static archives. Adding parsing logic is a good first step to figuring out how it works though. One thing to pay particular attention to is the differences in generated ar structure between linux and macos - I believe there is a difference in at least the header format but maybe more. Also, I've been reached out to by multiple people expressing interest in helping out so @moosichu are you fine taking charge on this one and potentially collaborating with others? If so, I'll send them your way (to your fresh repo, etc.).
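As a rough illustration of where the flavours diverge, the sketch below classifies the 16-byte name field of a member header. It is Python, not zar code, and `classify_name` and its return labels are hypothetical. It relies on the commonly documented conventions: GNU archives terminate short names with `/`, keep long names in a `//` member referenced as `/<offset>`, and name the symbol table `/`; BSD archives (as used on macOS) write `#1/<length>` and store the real name at the start of the member data. Other differences, such as the symbol table layout, are not covered here.

```python
def classify_name(field, long_names=b""):
    """Classify a 16-byte ar name field.

    Returns (flavour, value). For "bsd-long" the value is the length of the
    name stored at the start of the member data; otherwise it is the name.
    long_names is the content of the GNU "//" member, if one was seen.
    """
    text = field.decode("ascii").rstrip(" ")
    if text.startswith("#1/"):
        return ("bsd-long", int(text[3:]))
    if text == "/":
        return ("gnu-symtab", "")
    if text == "//":
        return ("gnu-strtab", "")
    if text.startswith("/") and text[1:].isdigit():
        # GNU long name: a byte offset into the "//" table, where each
        # entry is terminated by "/\n".
        offset = int(text[1:])
        end = long_names.index(b"/\n", offset)
        return ("gnu-long", long_names[offset:end].decode("ascii"))
    if text.endswith("/"):
        return ("gnu-short", text[:-1])
    # No slash at all: the plain, space-padded form.
    return ("common", text)
```

For example, `classify_name(b"foo.o/".ljust(16))` yields `("gnu-short", "foo.o")`, while `classify_name(b"#1/20".ljust(16))` yields `("bsd-long", 20)`.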

Tom Read Cutting · Answer 8 · Sun Sep 26 2021 01:30:57 GMT+0800 (China Standard Time)

Yes - very happy to collaborate, and I’ve started reading up and putting sources together on the differences in the formats for different platforms! I’ve linked to some of them in the repo - but will flesh that out properly tomorrow for others hoping to contribute as well.

Ayush · Answer 9 · Sun Sep 26 2021 12:07:08 GMT+0800 (China Standard Time)

I have also been doing my own independent attempt at this issue, here https://github.com/iddev5/zig-ar

My ar can create basic files so far, and it is compatible with llvm-ar and ranlib too.

If it works out, I can try to merge it with zar as discussed above...

Jakub Konka · Answer 10 · Sun Sep 26 2021 15:31:59 GMT+0800 (China Standard Time)

Great progress! Since you have two repos it might make sense to split the focus a little. For example, @iddev5 could focus on linux and @moosichu on macos, etc., and afterwards merge both together as zar or otherwise. How's that for a plan?

Ayush · Answer 11 · Sun Sep 26 2021 15:37:05 GMT+0800 (China Standard Time)

This sounds great! Just to let you know that I am okay with either plan.
On my repo, I have already got reading and writing common-style archives done (without symbol table and string table, ofc).
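To make "common-style without symbol table and string table" concrete, here is a minimal sketch of the writing side. It is Python, not the zig-ar code, and `write_archive` is a hypothetical name. It assumes "common-style" means short names stored space-padded in the 16-byte field, and it fills the timestamp, uid and gid with zeros and the mode with 644 purely as placeholders.

```python
def write_archive(members):
    """Serialise (name, payload) pairs into a plain ar archive.

    No symbol table or string table is emitted, so names are limited
    to 16 characters and may not contain spaces.
    """
    out = bytearray(b"!<arch>\n")
    for name, payload in members:
        if len(name) > 16 or " " in name:
            raise ValueError("name needs a string table: %r" % name)
        # name(16) mtime(12) uid(6) gid(6) mode(8, octal) size(10)
        header = "%-16s%-12d%-6d%-6d%-8o%-10d" % (
            name, 0, 0, 0, 0o644, len(payload))
        out += header.encode("ascii") + b"`\n"
        out += payload
        if len(payload) % 2:
            out += b"\n"  # pad member data to an even length
    return bytes(out)
```

Calling `write_archive([("a.o", b"abc"), ("b.o", b"hi")])` produces a 134-byte archive: the 8-byte magic, two 60-byte headers, and 4 plus 2 bytes of (padded) data.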

ceckertz · Answer 12 · Sun Sep 26 2021 20:57:43 GMT+0800 (China Standard Time)

Hey, I've been in contact with @moosichu and @kubkon and wanted to join in. I'm on Linux, but happy to help out where I can!

Tom Read Cutting · Answer 13 · Sun Sep 26 2021 21:04:13 GMT+0800 (China Standard Time)

Sounds good - I think my repo needs to be fleshed out a bit more before people can start making meaningful contributions (without stepping on each other's toes). I have made some good progress on argument processing though - and will keep chipping away at the problem today.

https://github.com/moosichu/zar/

But with the amount of interest expressed - I will try and fast-track to that point ASAP (hopefully this evening), including having issues that can be tackled by individuals (which I will be open to accepting PRs for). Especially as I will be working during the week, so won't have anywhere near as much time to work on this then.

Jakub Konka · Answer 14 · Sun Sep 26 2021 21:06:39 GMT+0800 (China Standard Time)

Thanks for taking charge at organising this @moosichu, it's very much appreciated! If you need any assistance from me, please do let me know!

Tom Read Cutting · Answer 15 · Mon Sep 27 2021 16:32:54 GMT+0800 (China Standard Time)

The work of @iddev5 has been merged into the https://github.com/moosichu/zar/ repo. Thank you, @iddev5!

I need to properly read through the changes (and some cleaning-up needs to be done to make each of our works consistent with each other). But it's a good step of progress for sure.

In terms of what I have done - I have the "print" and "display contents" ("p" and "t") operations working for both BSD & GNU-style files (although without support for symbol tables at this point).

Having looked at the problem - due to the slightly subtle ways the parsing of the different kinds of archives can overlap in functionality, it seems slightly better to structure the code around that & then slowly expand the functionality of which operations can be done on those files vs. completing everything for one kind of file and then adding another.

There's a couple of issues on the repo - mainly around cleaning up the merge & working on testing (something I haven't looked into at all). I've jotted my thoughts on how the latter could work if anyone is interested in that.

Progress has been fairly good so far overall I think! Lots still to do - and my time is going to be a bit more limited for the coming couple of weeks. But I will make sure to at least check the status of things every morning even if I can't work directly on the problem.

I did consider opening up the repo to others with commit access - but I think I might hold off on that as we can each probably work more quickly in our own repos (problems should be orthogonal enough), and it might be better if the project stabilises a bit and things are more coherent first, so that we are all on the same page before that happens. So I think we can see how things go with a PR-based model for now? If that doesn't work well I'm very open to reconsidering though.

My next focus for tomorrow morning (unless @iddev5 gets there first!) will be to look through the code that has been merged-in and to unify it into the rest of the code base a bit more concretely. But I won't be able to get on that until then, so if stuff is done on that in the meantime I will make sure to take that into consideration. Hopefully my comments (both in TODOs in the code and my write-up on the issue there) help.

Otherwise @iddev5 feel free to just focus on expanding the functionality of what you already have (and if you create any PRs I will happily merge them). I can then sort out the unification side of things in the short term until things have settled.

Aakash Sen Sharma · Answer 16 · Sun Sep 10 2023 16:08:04 GMT+0800 (China Standard Time)

It seems there hasn't been much progress on the zig archiver from the looks of the repo and the last comment made on this thread.

I'd like to take over this task with some possible mentorship as I've never written an archiver.

Is that feasible for @andrewrk / @kubkon or any other individual knowledgeable in archivers?

Debjit Mandal · Answer 17 · Mon Sep 11 2023 01:17:07 GMT+0800 (China Standard Time)

I am also interested in this task. I'd love to collaborate with some mentorship.

Tom Read Cutting · Answer 18 · Tue Sep 12 2023 21:49:01 GMT+0800 (China Standard Time)

I've been actively working on it locally. Don't worry it's still going! Just slowly as I've not had a huge amount of free time recently.

Tom Read Cutting · Answer 19 · Wed Sep 13 2023 01:36:59 GMT+0800 (China Standard Time)

However! If you are keen/interested in joining the effort that would be more than welcome :) do let me know if you are interested and I will spend a couple of weeks getting it back into a contributor-friendly shape.