add an assembler to the toolchain
andrewrk opened this issue
Prerequisite for #16270.
Builds of Zig that do not link against LLVM and Clang still need to be able to compile assembly files.
The existing commands already work, and they already support compiling assembly files: zig build-obj, zig build-exe, zig build-lib. The logic needs to be modified to use Zig's own assembler rather than invoking Clang as a subprocess.
For the x86 family specifically, let us jump on the Intel syntax train, embracing that as the better syntax. However, we also want to be able to compile the multitude of existing files from the wild without any changes, so it will need to support AT&T syntax as well.
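To make the difference between the two syntaxes concrete, here is a minimal sketch in Python, not Zig and not taken from this issue. The att_to_intel helper and its tiny mnemonic table are hypothetical and cover only register-to-register forms; a real assembler would consult full instruction data instead. It shows the two most visible differences: AT&T puts the source operand first and prefixes registers with %, while Intel puts the destination first and carries no size suffix on the mnemonic.

```python
import re

# Hypothetical table for illustration only: AT&T mnemonic (with size suffix)
# mapped to the Intel mnemonic. A real assembler would derive this from its
# instruction data, because a suffix cannot be stripped blindly ("sub" ends in "b").
ATT_TO_INTEL_MNEMONIC = {
    "movq": "mov",
    "movl": "mov",
    "addq": "add",
    "subq": "sub",
}

# Matches only "op %src, %dst" (two register operands).
_REG_REG = re.compile(r"^\s*(\w+)\s+%(\w+)\s*,\s*%(\w+)\s*$")


def att_to_intel(line: str) -> str:
    """Rewrite a register-to-register AT&T instruction in Intel syntax."""
    m = _REG_REG.match(line)
    if m is None:
        raise ValueError(f"unsupported form: {line!r}")
    mnemonic, src, dst = m.groups()
    if mnemonic not in ATT_TO_INTEL_MNEMONIC:
        raise ValueError(f"unknown mnemonic: {mnemonic!r}")
    # AT&T order is "op src, dst"; Intel order is "op dst, src".
    return f"{ATT_TO_INTEL_MNEMONIC[mnemonic]} {dst}, {src}"
```

For example, att_to_intel("movq %rax, %rbx") yields "mov rbx, rax". Memory operands, immediates, and prefixes are deliberately left out of the sketch.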
I suggest we start by borrowing LLVM's CPU instruction data via another tool in the tools/ directory. At some point the backends should start using this data as well instead of using an ad-hoc parser, but that will be a follow-up issue.
In order to close this issue, Zig must use its own assembler for all input files, never calling the clang binary for assembly.
Should this include a C preprocessor? A lot of assembly files in the wild (.S) are written with the assumption that they'll be run through one.
Yes I think so. Aro implements a C preprocessor.
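As a small illustration of the convention behind this question: GCC and Clang run files with an uppercase .S suffix through the C preprocessor before assembling, while lowercase .s files go to the assembler as-is. The sketch below is in Python and the needs_preprocessing helper is hypothetical; it is not how Zig decides this, only the suffix rule itself.

```python
from pathlib import PurePath


def needs_preprocessing(path: str) -> bool:
    """Return True if an assembly file should be preprocessed first.

    Assumes the GCC/Clang suffix convention: ".S" (uppercase) is
    preprocessed, ".s" (lowercase) is assembled directly.
    """
    # The comparison is intentionally case-sensitive.
    return PurePath(path).suffix == ".S"
```

With this rule, needs_preprocessing("start.S") is true and needs_preprocessing("start.s") is false.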
Is a RISC-V assembler in the scope of this issue?
> Is a RISC-V assembler in the scope of this issue?
Yes, all targets that Zig supports are in the scope of this issue.
I'm already writing a RISC-V assembler to make my own project independent of GCC/LLVM because there are absolutely no others out there, so I'd love to help with the same here. However, it's in C++ and I don't know any Zig, so porting might be the best strategy. Here's a direct link to it: https://github.com/Slackadays/Chata/blob/main/libchata/src/assembler.cpp
> Yes I think so. Aro implements a C preprocessor.
But this would have implications for whether the assembler is in-tree or in a separate repo like ziglang/translate-c, right? What's the thinking there?
> But this would have implications for whether the assembler is in-tree or in a separate repo like ziglang/translate-c, right? What's the thinking there?
Why would this matter? The preprocessor could easily be its own thing since it doesn't actually need to know any C, just the C preprocessor language. Then, the assembler could choose to use it or not depending on the input file, and all's good.
It matters because Aro (and its preprocessor) is not going to keep being an in-tree dependency.
So let's assume Aro is no longer an in-tree dependency. Then it is now a separate repo, which doesn't change anything because Aro's preprocessor can be its own binary or library, say zigcpp for Zig C PreProcessor. At this point, whether the preprocessor is a binary or library is merely an implementation detail because it doesn't change the end result. But since the preprocessor isn't something users typically run on their own, it might be simpler to just have it as a separate library.
I've just pushed the sans-aro branch. I hope that helps to provide guidance to this discussion.
Thanks, that's helpful. Seems like a reasonable direction.
Assemblers can start as independent processes (lib/compiler/foo.zig) and then we can determine how to integrate them into new inline assembly (#10761).
They should parse into MIR and use the common MIR lowering code because that will be the method of integration with the compiler.
Instruction data (i.e. arch/x86_64/encodings.zig) should take advantage of ZON as soon as possible (#20271) since it will provide a faster and more memory efficient representation than a large zig source file with the same data.
> They should parse into MIR and use the common MIR lowering code because that will be the method of integration with the compiler.
Hmm, I don't know if I agree that MIR is at the right level of abstraction for this - at least as it is today.
For inline assembly, it's probably fine, since we likely don't want to allow a lot of the nonsense that you can get away with in GCC-style inline assembly. I imagine that for #10761, for the most part, we will want to limit inline assembly to just machine code and data embedded directly in between instructions.
But for a full assembler, you're kind of in crazy land. You can be emitting machine code in a function and then do .pushsection into some completely unrelated section, emit whatever into it, do .popsection, and go right back to emitting machine code where you were previously. And of course, you can manipulate symbol state like ELF visibility at any point. (You might enjoy reading this page.)
As I understand it, MIR currently has a function view, but a full assembler really needs a whole-object view, and it doesn't seem to me like MIR is the right tool for the job.
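To illustrate the whole-object point, here is a minimal sketch in Python of the state an assembler has to carry to honor .pushsection and .popsection. The ObjectState class and its method names are hypothetical, and section flags, symbols, and relocations are omitted. What it shows is that output goes to whichever section is current, and that the current section can change in the middle of a function.

```python
class ObjectState:
    """Hypothetical whole-object assembler state (illustration only)."""

    def __init__(self) -> None:
        # Every section accumulates its own bytes independently.
        self.sections: dict[str, bytearray] = {".text": bytearray()}
        self.current = ".text"
        self._stack: list[str] = []

    def pushsection(self, name: str) -> None:
        # Remember where we were, then switch to (and create) the new section.
        self._stack.append(self.current)
        self.sections.setdefault(name, bytearray())
        self.current = name

    def popsection(self) -> None:
        # Return to the section that was active before the matching push.
        self.current = self._stack.pop()

    def emit(self, data: bytes) -> None:
        self.sections[self.current] += data


obj = ObjectState()
obj.emit(b"\x90")            # x86 nop, lands in .text
obj.pushsection(".rodata")
obj.emit(b"hi\x00")          # data emitted mid-function, lands in .rodata
obj.popsection()
obj.emit(b"\xc3")            # x86 ret, back in .text right after the nop
```

After this runs, .text holds the nop followed by the ret and .rodata holds the string, even though the three emits were interleaved in the input. A per-function view has no natural place to put the bytes that went to .rodata.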