64-bit port

Question


ghaerr opened this issue 5 years ago · comments

Rob,

I forked swieros and ported eu.c (user mode emulator) to 64-bit over the weekend. It works well, and I built an alternative boot that uses the 64-bit user-mode emulator to run a previously compiled 32-bit bin/c (C compiler) to compile bin/c.c, then has the new bin/c compile etc/os.c and etc/mkfs.c, and then runs the emulator on etc/mkfs to build the file system. It all works great. The next step is to port the protected mode emulator em.c to 64-bit, and then everything should be working.

After taking a hard look at the C compiler, I think it is far easier to leave it as 32-bit, along with the v6 OS of course, since both seem pretty dependent on sizeof(int) == sizeof(int *).

The mechanism I used to port eu.c is very basic: none of the "pointers" or 32-bit pointer code were changed. Instead, all pointers are treated as 32-bit offsets into memory returned by a rewritten sbrk() in linux/libc.h, which allocates 64MB once up front using malloc and now supports break shrinkage as required by the C compiler. A single real pointer to the base of this memory is stored in an "unsigned char *mem" global and then added to any 32-bit "pointer/offset" to produce a real pointer for each emulated instruction. Thus the execution cost is only a single 64-bit pointer add for each instruction executed. The same applies to 32-bit builds: previously the sbrk() allocation was added once before start, whereas now it's added on each instruction execution, so it is slower in 32-bit mode. But it has the advantage of changing very little, which promotes reliability.
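A minimal sketch of the base-plus-offset idea described above, not the actual eu.c code. The helper names (arena_init, host_ptr, load32, store32) and the bounds check are hypothetical, added only for illustration; the "mem" global and the one-time 64MB malloc follow the description.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ARENA_SIZE (64u * 1024 * 1024)  /* 64MB, allocated once */

static unsigned char *mem;              /* the single real host pointer */

/* Allocate the arena on first use; returns 0 on success, -1 on failure. */
static int arena_init(void)
{
    if (!mem)
        mem = malloc(ARENA_SIZE);
    return mem ? 0 : -1;
}

/* Turn a 32-bit emulated "pointer" (an offset) into a real host pointer.
   This is the one pointer add paid per memory access. */
static void *host_ptr(uint32_t off, uint32_t len)
{
    assert(mem && len <= ARENA_SIZE && off <= ARENA_SIZE - len);
    return mem + off;
}

/* Load a 32-bit value from an emulated address. */
static int32_t load32(uint32_t off)
{
    int32_t v;
    memcpy(&v, host_ptr(off, sizeof v), sizeof v);
    return v;
}

/* Store a 32-bit value at an emulated address. */
static void store32(uint32_t off, int32_t v)
{
    memcpy(host_ptr(off, sizeof v), &v, sizeof v);
}
```

Because emulated code only ever sees the 32-bit offsets, the same binary image behaves identically on 32-bit and 64-bit hosts; only host_ptr knows where the arena really lives.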

The same mechanism can be used for the protected mode emulator, which I plan to port shortly. This approach allowed the C compiler to remain entirely unchanged, and the sbrk() improvement only affects host-compiled programs, since lib/libc.h was not changed either.

I have tested the c compiler recursively compiling the c compiler compiling a program, as well as recursively running the user mode emulator running the user mode emulator running a program :)

I didn't create a pull request since I'm not finished, and wanted your input as to what you think of this approach.

Robert Swierczek · Answer 1 · Tue Mar 12 2019 11:58:09 GMT+0800 (China Standard Time)

Greg, very interesting! I also managed to port mkfs.c and emsafe.c to 64-bit. I did it the more traditional way, by making some of the variables longs or unsigned longs, which seemed to work well. I needed a few small changes to linux/libc.c as well. I've included them here so you can take a look. It is interesting how we both took different approaches. I am going to attack c.c tomorrow and hopefully have something in a few days.

Gregory Haerr · Answer 2 · Tue Mar 12 2019 12:04:43 GMT+0800 (China Standard Time)

Rob, how do I get a copy of your new emsafe.c? There was no attachment to your message. Email it to me and I can test on OSX.

Robert Swierczek · Answer 3 · Tue Mar 12 2019 12:14:27 GMT+0800 (China Standard Time)

I sent it along to your email... let me know if you got it

Gregory Haerr · Answer 4 · Tue Mar 12 2019 12:15:50 GMT+0800 (China Standard Time)

Got it. Very interesting indeed!

I like your solution; just keeping a 64-bit pointer in a long is definitely far easier than adding a pointer with every instruction. In looking at the differences between eu.c and em.c, I can see now why I took the approach of adding a base memory pointer. The user mode emulator doesn't compute a page table address, and so sometimes, for instance in the LX instruction, it just uses register "a" as the address from which to load a value. In the LL case, sp could be made a long and things would work the way you've done it.

I'd be interested to hear how you think the user mode emulator should be ported in the LX case, using your method!?

Gregory Haerr · Answer 5 · Tue Mar 12 2019 12:21:43 GMT+0800 (China Standard Time)

On another point, when you run the C compiler under the user emulator (and I'm pretty sure with the protected mode emulator too), and choose to run the compiled program output (no -o option), the C compiler uses an sbrk() to release memory before adding the BSS. When compiling the compiler compiling itself, this caused an exception in your linux/libc.h::sbrk() because a negative increment is not supported, which is one of the reasons I rewrote it to emulate a UNIX sbrk. FYI and comments?

Robert Swierczek · Answer 6 · Tue Mar 12 2019 12:30:30 GMT+0800 (China Standard Time)

Aha, I see now... yes, the user emulator eu.c does seem to need the base+offset in order to work, since things are passed back and forth between the 32-bit and 64-bit realms. Also, yes, I seem to remember that my sbrk() implementation for the linux version was a giant hack. I'm actually not super happy with the naming of the include files either, and have an experimental branch where the names are more or less the POSIX names (unistd.h, stdio.h, etc.)

Gregory Haerr · Answer 7 · Tue Mar 12 2019 12:31:03 GMT+0800 (China Standard Time)

One last point - the user mode emulator port required the argv[] array to be rebuilt and pushed onto the stack as 32-bit values, to implement the cool recursive execution feature with arguments you've built into it. The C compiler will also have to have the argv stack frame rewritten when running the compiled program output case (no -o option above). So there are a few differences between the way that em.c and eu.c/c.c run compiled programs. That is, c.c and eu.c run user mode programs, while em.c only runs bare metal machine images (Right? or will the em.c work since the CPU defaults to non-protected mode?)

Gregory Haerr · Answer 8 · Tue Mar 12 2019 12:36:07 GMT+0800 (China Standard Time)

With regard to your hacked sbrk implementation, check out mine: it correctly emulates sbrk, which requires that all linear addresses in between sbrk calls are in fact contiguous. However, it doesn't handle the case where you need more memory, and I don't actually like allocating all 64MB up front. I was thinking of a method where it called realloc and then reset a magic "base pointer", which would also be used in the user mode emulator. That would require an extern uchar *mem in eu.c, which I'm not particularly crazy about though.
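A rough sketch of an sbrk emulation over a single fixed arena that accepts negative increments, under stated assumptions: this is not the rewritten linux/libc.h, the name emu_sbrk and the UINT32_MAX failure value are hypothetical, and starting the break at a small nonzero offset (so that offset 0 can stand for a null pointer) is a choice made only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ARENA_SIZE (64u * 1024 * 1024)  /* whole arena, allocated up front */
#define BRK_START  8u                   /* assumed: keep offset 0 as "NULL" */

static unsigned char *mem;              /* base of the arena */
static uint32_t brk_off = BRK_START;    /* current break, as an offset */

/* Move the break by incr bytes (may be negative) and return the previous
   break as a 32-bit offset, or UINT32_MAX on failure. Addresses handed out
   by successive calls are contiguous because they share one arena. */
static uint32_t emu_sbrk(int32_t incr)
{
    uint32_t old = brk_off;

    if (!mem && !(mem = malloc(ARENA_SIZE)))
        return UINT32_MAX;
    if (incr < 0) {
        /* shrinking: never release below the start of the heap */
        if ((int64_t)old + incr < (int64_t)BRK_START)
            return UINT32_MAX;
    } else if ((uint32_t)incr > ARENA_SIZE - old) {
        /* growing: the arena is fixed, so this sketch cannot extend it */
        return UINT32_MAX;
    }
    brk_off = (uint32_t)((int64_t)old + incr);
    return old;
}
```

The realloc idea mentioned above would replace the fixed-size failure branch, at the cost of having to refresh the base pointer wherever it is cached.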

Robert Swierczek · Answer 9 · Tue Mar 12 2019 12:44:02 GMT+0800 (China Standard Time)

Hmm... I think the C compiler won't need to rebox argv, since the recursive thing only works inside the OS.
(Unlike c4, I jump straight into main versus running another emulator.)

Gregory Haerr · Answer 10 · Tue Mar 12 2019 12:49:36 GMT+0800 (China Standard Time)

No, both c.c and eu.c get their initial argc/argv from main(), then increment argv past their own options, and then pass them on: in c.c's case to the compiled code (which means it passes a 64-bit pointer from main), and in eu.c's case to the cpu() function, which is called with main's 64-bit argv.

I have tested running the recursive stuff from outside the os, and it now works with eu.c. That is, you can run ./xeu root/bin/c -Iroot/lib c.c -lroot/lib root/usr/hello.c. This is because eu.c now rewrites the argv stack frame before calling it.
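To illustrate what "rewriting the argv stack frame" can look like, here is a small sketch, assuming a downward-growing emulated stack addressed by 32-bit offsets. The function rebox_argv, its parameters, the 64-argument limit and the 8-byte alignment are all assumptions made for this example; the real layout used by eu.c may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ARGS 64  /* arbitrary limit for this sketch */

/* Copy the host argv strings onto the emulated stack, then push a
   0-terminated array of 32-bit offsets pointing at them.
   mem is the arena base and sp the current stack offset. Returns the new
   stack offset; *argv_off receives the offset of the 32-bit argv array. */
static uint32_t rebox_argv(unsigned char *mem, uint32_t sp,
                           int argc, char **argv, uint32_t *argv_off)
{
    uint32_t offs[MAX_ARGS + 1];
    uint32_t n;
    int i;

    assert(argc >= 0 && argc <= MAX_ARGS);
    for (i = 0; i < argc; i++) {
        n = (uint32_t)strlen(argv[i]) + 1;  /* include the terminator */
        assert(n <= sp);
        sp -= n;
        memcpy(mem + sp, argv[i], n);
        offs[i] = sp;                       /* 32-bit "pointer" to string */
    }
    offs[argc] = 0;                         /* emulated NULL terminator */

    n = (uint32_t)(argc + 1) * (uint32_t)sizeof offs[0];
    assert(n <= sp);
    sp = (sp - n) & ~7u;                    /* keep the stack 8-byte aligned */
    memcpy(mem + sp, offs, n);
    *argv_off = sp;
    return sp;
}
```

The host's 64-bit char * values never reach the emulated program; it only sees offsets that fit in its 32-bit ints.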

Gregory Haerr · Answer 11 · Tue Mar 12 2019 12:55:41 GMT+0800 (China Standard Time)

OK, I understand you now. Yes, there is no need to rebox argv for c.c when running in the OS or through the user mode emulator. However, you WILL need to rebox if you want to run the C compiler without the -o option and jump directly into main(). This case was apparently untested, and is also the case where the hacked sbrk() fails with the C compiler's memory release before the main() jump...

Robert Swierczek · Answer 12 · Tue Mar 12 2019 12:56:06 GMT+0800 (China Standard Time)

OK, I think we are in agreement!

Robert Swierczek · Answer 13 · Tue Mar 12 2019 13:06:02 GMT+0800 (China Standard Time)

So, my plan for porting the compiler to 64 bit is to revisit the entire expression node stack/array. Instead of a simple array of uints, I plan on having an array of union { int i; long l; node *n; ... }. I think that should work nicely, and clean up the code. I have to work through the code a bit since the node creation/lookup is very simplistic and hackey (since I initially implemented the compiler to self host before I had a lot of stuff working.)
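As a sketch of why a union-based node array would compile 64-bit clean: each cell is wide enough for an int, a long or a pointer, so no integer-to-pointer casts are needed. The node kinds (Num, Add), the pool and the helper names below are hypothetical and are not taken from c.c.

```c
#include <assert.h>
#include <stddef.h>

/* One cell of the expression array: big enough for any of its members. */
typedef union node node;
union node {
    int   i;
    long  l;
    node *n;
};

enum { Num = 1, Add };           /* hypothetical node kinds */
enum { POOL_MAX = 1024 };

static node   pool[POOL_MAX];
static size_t pool_top;

/* Reserve n consecutive cells from the pool. */
static node *node_alloc(size_t n)
{
    node *p;
    assert(n <= POOL_MAX - pool_top);
    p = &pool[pool_top];
    pool_top += n;
    return p;
}

/* Num node: [kind, value] */
static node *mk_num(int v)
{
    node *e = node_alloc(2);
    e[0].i = Num;
    e[1].i = v;
    return e;
}

/* Add node: [kind, left, right], children stored as real pointers */
static node *mk_add(node *l, node *r)
{
    node *e = node_alloc(3);
    e[0].i = Add;
    e[1].n = l;
    e[2].n = r;
    return e;
}

/* Evaluate a tree built from the two node kinds above. */
static int eval(const node *e)
{
    return e[0].i == Num ? e[1].i : eval(e[1].n) + eval(e[2].n);
}
```

With an array of plain uints, the e[1].n and e[2].n fields would each need a cast that truncates on a 64-bit host; the union removes that class of bug at the cost of wider cells.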

Gregory Haerr · Answer 14 · Tue Mar 12 2019 13:09:35 GMT+0800 (China Standard Time)

I see. So that would eliminate the (uint *) casts that are thrown around and thus compile 64-bit clean?

I have studied the compiler, but it's pretty complicated. Is the expression node stack similar to your AST implementation in C5?

Robert Swierczek · Answer 15 · Tue Mar 12 2019 13:21:16 GMT+0800 (China Standard Time)

Yes for both.

Also, I've gone back and forth on what I want the opcode design to be. I was initially going for "fast yet simple", then I worked on a minimal stripped-down version (more like c4), but now I am thinking about having single byte-codes versus the 32-bit opcode/operand format. This would optimize for small binaries (which would also be fast due to cache friendliness). Most of the 256 available opcodes would be LDIs (load immediate) and LLs (load local int), which I think are the most commonly executed instructions. I may ditch the a, b, and c registers completely and instead have a small evaluation stack (I'm not sure).

Gregory Haerr · Answer 16 · Tue Mar 12 2019 13:28:59 GMT+0800 (China Standard Time)

I have several ideas on the instruction set, and like the idea of a byte code stack machine, rather than a three register machine. I think it simplifies things immensely, and after looking at the caching issues with the protected mode interpreter, agree that there's more to overall speed than just the instruction set when you've also got to worry about translation buffers.

Looking deep into the OS trap() function, there may be some complications with instruction restart if instruction lengths aren't identical, or if it becomes too hard to calculate how far to back up the PC.

I have a cool interpreter/compiler that I'm going to dig up, I would like to pass that instruction set over to you for your comments. It's a byte code stack machine (actually runs typeless more like javascript, but that's another story).

Gregory Haerr · Answer 17 · Tue Mar 12 2019 13:33:41 GMT+0800 (China Standard Time)

Another item I've been thinking way too much about is: why does the system ALWAYS use 8-byte pushes for calling functions, and waste such memory when they're only needed for doubles? Initially I thought the reason was for _cdecl varargs, but that can work with a variable size argument list. Is it because of the need for different push instructions? It seems strange that the local variables are allocated size-matching, but the calling stack frame is not. The stack doesn't have to be 8-byte aligned, and that could be easily changed in the OS too, right?

In general, a new implementation should use only the size required for the variable when pushing, in order to save memory (I guess this isn't really a big memory deal, it's just the thought of it).

Robert Swierczek · Answer 18 · Tue Mar 12 2019 13:38:37 GMT+0800 (China Standard Time)

Awesome. Good ideas... all around.

Also, I don't think I got your sbrk() code or other attachments.. try again directly to my gmail addr.

Gregory Haerr · Answer 19 · Tue Mar 12 2019 13:42:38 GMT+0800 (China Standard Time)

OK, so I've tested your new em.c. The good news is that it works on OSX: I bootstrapped the system using my user mode emulator and a pre-compiled bin/c to compile the compiler and the OS, and it's running!

The bad news is that my user mode emulator requires my libc.h with the rewritten sbrk(), and your em.c won't work with that but works with the host ./xem compile. So I renamed your libc.h for now and after compiling ./xem I can interpret the OS and got the $ prompt :)

Robert Swierczek · Answer 20 · Tue Mar 12 2019 13:46:29 GMT+0800 (China Standard Time)

Lots of magic has to occur to get to that $. I was amazed when it worked for me so easily yesterday.

Gregory Haerr · Answer 21 · Fri Mar 15 2019 04:57:22 GMT+0800 (China Standard Time)

Hi Rob, a couple comments on the 64-bit port progress:

I finished fixing the argv reboxing bug in the user mode emulator eu.c, and things now work completely on both 32-bit and 64-bit, including using it to boot swieros with only it and a pre-existing C compiler binary, as well as recursion from within the OS, like "eu eu c.c echo.c 1 2 3". I can send you a pull request if you're interested. It's all committed in the forked repository on my page.

Looking at your 64-bit em.c port, it's very nice how you've used a ulong in only a few places to make everything work on 64-bit. I'd suggest considering a slightly different type name than ulong, perhaps "uintp" or "uintptr", that would explicitly express the need for a type that "holds an integer or pointer". It would also solve the problem of a win64 port, or of having to add "long" or "ulong" for other reasons later.

Here are the sizes for int/long/ptr for various systems, along with a suggestion for uintptr:

OS            int  long  ptr  uintptr
Linux-32      32   32    32   long
Linux-64/Mac  32   64    64   long
Win64         32   32    64   unsigned __int64

I'm not exactly sure how this might work without #ifdef in root/lib/u.h, but even if the "typedef long ulong" were changed to "typedef long uintptr", that would allow em.c and c.c to be ported to win64 another time without interfering with other "long" handling, and make it obvious when this special case is required. I use the same mechanism in Microwindows for the win32 implementation on 64-bit, using Microsoft's UINT_PTR typedef from windows.h.
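One way the table could translate into code, as a sketch only: the type name uintp is the hypothetical name floated above, the #ifdef placement is an assumption, and unsigned variants of the table's types are used here since the value is treated as an address. Host-side code with access to <stdint.h> could use the standard uintptr_t instead.

```c
#include <assert.h>

/* Pick an unsigned integer type wide enough to hold a pointer. */
#if defined(_WIN64)
typedef unsigned long long uintp;   /* Win64: long is only 32 bits */
#else
typedef unsigned long uintp;        /* Linux-32, Linux-64, Mac */
#endif

/* Compile-time check: the array size is negative (a compile error)
   if uintp is narrower than a pointer. */
typedef char uintp_holds_pointer[sizeof(uintp) >= sizeof(void *) ? 1 : -1];

/* Round-trip helpers that make the "integer or pointer" intent explicit. */
static uintp ptr_to_uintp(void *p) { return (uintp)p; }
static void *uintp_to_ptr(uintp v) { return (void *)v; }

/* Runtime sanity check: a pointer survives the trip through uintp. */
static void uintp_selfcheck(void)
{
    int x = 0;
    assert(uintp_to_ptr(ptr_to_uintp(&x)) == (void *)&x);
}
```

Keeping the pointer-sized case under its own name leaves ordinary "long" arithmetic untouched when a new platform is added.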

Robert Swierczek · Answer 22 · Fri Mar 15 2019 16:36:49 GMT+0800 (China Standard Time)

Greg,

I think for the time being the simplest approach will be to add the following lines to the bottom of mingw/libc.h:

#ifdef _WIN64
typedef unsigned long long xulong;
#define ulong xulong
#endif

Hopefully that should work for the few "portable" boot-strapping programs in swieros.