add an assembler to the toolchain

andrewrk opened this issue a year ago

Prerequisite for #16270.

Builds of Zig that do not link against LLVM and Clang still need to be able to compile assembly files.

The existing commands already work, and they already support compiling assembly files: zig build-obj, zig build-exe, zig build-lib. The logic needs to be modified to use Zig's own assembler rather than invoking Clang as a subprocess.

For the x86 family specifically, let us jump on the Intel syntax train and embrace it as the better syntax. However, we also want to be able to compile the multitude of existing files from the wild without any changes, so the assembler will need to support AT&T syntax as well.
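To illustrate how the two x86 syntaxes differ, here is a minimal sketch in Python that translates a tiny subset of AT&T syntax into Intel syntax. It is not part of the proposal: the function names and the mnemonic table are hypothetical, and only register and immediate operands are handled. The point is only that AT&T puts the source operand first, prefixes registers with `%` and immediates with `$`, and encodes operand size as a mnemonic suffix.

```python
# Hypothetical sketch: translate a tiny subset of AT&T syntax to Intel syntax.
# Only register and immediate operands are handled; anything else is rejected.

# AT&T encodes the operand size as a mnemonic suffix (l = 32-bit, q = 64-bit).
# In Intel syntax the size is implied by the register operands here.
ATT_TO_INTEL_MNEMONIC = {
    "movl": "mov", "movq": "mov",
    "addl": "add", "addq": "add",
    "subl": "sub", "subq": "sub",
}


def convert_operand(op: str) -> str:
    """Strip the AT&T sigil from a register (%eax) or immediate ($1)."""
    op = op.strip()
    if op.startswith("%") or op.startswith("$"):
        return op[1:]
    raise ValueError(f"unsupported operand: {op!r}")


def att_to_intel(line: str) -> str:
    """Convert one two-operand AT&T instruction to Intel syntax."""
    mnemonic, _, rest = line.strip().partition(" ")
    if mnemonic not in ATT_TO_INTEL_MNEMONIC:
        raise ValueError(f"unsupported mnemonic: {mnemonic!r}")
    operands = [convert_operand(o) for o in rest.split(",")]
    if len(operands) != 2:
        raise ValueError("expected exactly two operands")
    src, dst = operands  # AT&T order is source, destination
    # Intel order is destination, source.
    return f"{ATT_TO_INTEL_MNEMONIC[mnemonic]} {dst}, {src}"


print(att_to_intel("movl $1, %eax"))    # mov eax, 1
print(att_to_intel("addq %rbx, %rax"))  # add rax, rbx
```

A real assembler would parse both syntaxes into one shared instruction representation rather than rewriting text, and would also have to handle memory operands, which is where the two syntaxes differ most.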

I suggest we start by borrowing LLVM's CPU instruction data via another tool in the tools/ directory. At some point the backends should start using this data as well instead of using an ad-hoc parser, but that will be a follow-up issue.

In order to close this issue, Zig must use its own assembler for all input files, never calling the clang binary for assembly.

Alex Rønne Petersen · Answer 1 · Fri Dec 13 2024 12:30:15 GMT+0800 (China Standard Time)

Should this include a C preprocessor? A lot of assembly files in the wild (.S) are written with the assumption that they'll be run through one.

Andrew Kelley · Answer 2 · Sat Dec 14 2024 05:19:26 GMT+0800 (China Standard Time)

Yes I think so. Aro implements a C preprocessor.

Jackson Huff · Answer 3 · Tue Dec 24 2024 10:01:36 GMT+0800 (China Standard Time)

Is a RISC-V assembler in the scope of this issue?

David Rubin · Answer 4 · Tue Dec 24 2024 10:02:41 GMT+0800 (China Standard Time)

> Is a RISC-V assembler in the scope of this issue?

Yes, all targets that Zig supports are in the scope of this issue.

Jackson Huff · Answer 5 · Tue Dec 24 2024 10:10:40 GMT+0800 (China Standard Time)

I'm already writing a RISC-V assembler to make my own project independent of GCC/LLVM because there are absolutely no others out there, so I'd love to help with the same here. However, it's in C++ and I don't know any Zig, so porting might be the best strategy. Here's a direct link to it: https://github.com/Slackadays/Chata/blob/main/libchata/src/assembler.cpp

Alex Rønne Petersen · Answer 6 · Tue Dec 24 2024 18:23:57 GMT+0800 (China Standard Time)

> Yes I think so. Aro implements a C preprocessor.

But this would have implications for whether the assembler is in-tree or in a separate repo like ziglang/translate-c, right? What's the thinking there?

Jackson Huff · Answer 7 · Wed Dec 25 2024 01:19:24 GMT+0800 (China Standard Time)

> But this would have implications for whether the assembler is in-tree or in a separate repo like ziglang/translate-c, right? What's the thinking there?

Why would this matter? The preprocessor could easily be its own thing since it doesn't actually need to know any C, just the C preprocessor language. Then, the assembler could choose to use it or not depending on the input file, and all's good.
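The "use it or not depending on the input file" idea can be sketched as follows. This is a minimal Python illustration, not anything from the Zig codebase: the function names and stage labels are hypothetical. It relies on the GCC and Clang convention that a `.S` file (capital S) is run through the C preprocessor while a `.s` file is assembled as-is.

```python
from pathlib import Path


def needs_preprocessing(path: str) -> bool:
    """Return True if the file should go through the C preprocessor first.

    Assumes the GCC/Clang convention: ".S" is preprocessed, ".s" is not.
    The comparison is deliberately case-sensitive.
    """
    return Path(path).suffix == ".S"


def assembly_pipeline(path: str) -> list[str]:
    """Return the (hypothetical) stage names to run for one input file."""
    stages = []
    if needs_preprocessing(path):
        stages.append("preprocess")
    stages.append("assemble")
    return stages


print(assembly_pipeline("crt0.S"))  # ['preprocess', 'assemble']
print(assembly_pipeline("crt0.s"))  # ['assemble']
```

Because the decision depends only on the input file, the assembler itself does not need to know whether the preprocessor is linked in as a library or run as a separate binary.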

Alex Rønne Petersen · Answer 8 · Wed Dec 25 2024 01:44:52 GMT+0800 (China Standard Time)

It matters because Aro (and its preprocessor) is not going to keep being an in-tree dependency.

Jackson Huff · Answer 9 · Wed Dec 25 2024 02:39:06 GMT+0800 (China Standard Time)

So let's assume Aro is no longer an in-tree dependency. Then it is a separate repo, which doesn't change anything, because Aro's preprocessor can be its own binary or library, say zigcpp (for "Zig C PreProcessor"). At that point, whether the preprocessor is a binary or a library is merely an implementation detail, because it doesn't change the end result. But since the preprocessor isn't something users typically run on their own, it might be simpler to just have it as a separate library.

Andrew Kelley · Answer 10 · Wed Dec 25 2024 06:54:15 GMT+0800 (China Standard Time)

I've just pushed the sans-aro branch. I hope that helps to provide guidance to this discussion.

Alex Rønne Petersen · Answer 11 · Wed Dec 25 2024 07:10:55 GMT+0800 (China Standard Time)

Thanks, that's helpful. Seems like a reasonable direction.

Andrew Kelley · Answer 12 · Wed Dec 25 2024 07:13:36 GMT+0800 (China Standard Time)

Assemblers can start as independent processes (lib/compiler/foo.zig) and then we can determine how to integrate them into new inline assembly (#10761).

They should parse into MIR and use the common MIR lowering code because that will be the method of integration with the compiler.

Instruction data (i.e. arch/x86_64/encodings.zig) should take advantage of ZON as soon as possible (#20271) since it will provide a faster and more memory efficient representation than a large zig source file with the same data.

Alex Rønne Petersen · Answer 13 · Thu Dec 26 2024 13:20:36 GMT+0800 (China Standard Time)

> They should parse into MIR and use the common MIR lowering code because that will be the method of integration with the compiler.

Hmm, I don't know if I agree that MIR is at the right level of abstraction for this - at least as it is today.

For inline assembly, it's probably fine, since we likely don't want to allow a lot of the nonsense that you can get away with in GCC-style inline assembly. I imagine that for #10761, for the most part, we will want to limit inline assembly to just machine code and data embedded directly in between instructions.

But for a full assembler, you're kind of in crazy land. You can be emitting machine code in a function and then do .pushsection into some completely unrelated section, emit whatever into it, do .popsection, and go right back to emitting machine code where you were previously. And of course, you can manipulate symbol state like ELF visibility at any point. (You might enjoy reading this page.)

As I understand it, MIR currently has a function view, but a full assembler really needs a whole-object view, and it doesn't seem to me like MIR is the right tool for the job.