Would this compile on the new Mac M1 machines ? Any idea how much work it would be?

klapauciusisgreat opened this issue 3 years ago · comments

This is really more a question than an issue, but maybe the best place to ask.

I have access to an M1 MBP, but haven't really played much with its assembler. I know the Intel Macs don't use gas, so I had to do a bunch of things to make jonesforth work on 64-bit x86 on the Mac.

Any sage advice on whether it would be a good/bad idea to port would be much appreciated.

narenratan · Answer 1 · Tue Oct 11 2022 15:25:56 GMT+0800 (China Standard Time)

Thanks very much for your question and sorry for my long delay in getting back to you. It would be very cool to run this on one of the M1 Macs! I must admit I have very little experience with macs (or with assembly really) so I don't know how much work it would be. If I do find myself with a Mac I'll have an experiment though!

Klapaucius · Answer 2 · Thu Oct 20 2022 09:33:57 GMT+0800 (China Standard Time)

I did the experiment and got it to work; I will share the port shortly. However, Apple has seemingly disabled the ability to have data segments that are both writable and executable (security reasons?). So advanced creation / running of new assembly via the runtime assembler is not working yet, and I do not have a good workaround...

Klapaucius · Answer 3 · Wed Oct 26 2022 00:57:59 GMT+0800 (China Standard Time)

So, on M1, there is the problem that macOS does not allow pages with both the PROT_WRITE and PROT_EXEC bits set, so the assembler written in Forth itself, and self-modifying machine code, won't work. It's possible to run this code in a Linux VM on the same hardware though, after mmapping memory with the RWX protection bits set.

On which hardware did you run the code 'as is' ?
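
Whether a given OS grants the writable-and-executable pages discussed above can be probed without any assembly. The following is a minimal Python sketch, not part of the port: the helper name `can_map` is made up for illustration, and it only requests an anonymous private mapping and reports whether the request was granted. It relies on the Unix-only constants in Python's `mmap` module.

```python
import mmap


def can_map(prot, size=mmap.PAGESIZE):
    """Return True if the OS grants an anonymous private mapping with `prot`."""
    try:
        region = mmap.mmap(
            -1,  # no backing file: anonymous memory
            size,
            flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
            prot=prot,
        )
    except OSError:
        # The kernel refused the requested protection bits.
        return False
    region.close()
    return True


RW = mmap.PROT_READ | mmap.PROT_WRITE
RWX = RW | mmap.PROT_EXEC

if __name__ == "__main__":
    print("RW  mapping allowed:", can_map(RW))
    print("RWX mapping allowed:", can_map(RWX))
```

Going by the reports in this thread, the RWX request would be expected to succeed in a Linux guest and to be refused by macOS on the M1; the sketch only makes that difference easy to observe.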

narenratan · Answer 4 · Wed Oct 26 2022 02:11:58 GMT+0800 (China Standard Time)

That is very cool that you got it to run on an M1 Mac!

I was running it on a Raspberry Pi (4 I think), as far as I can remember running Archlinux ARM for AArch64. For a while I was also using a NanoPi Fire3 running a BSD (possibly NetBSD) but that may have been before adding the Forth assembler. At some point I'll try and find the Pis and try running this again.

Klapaucius · Answer 5 · Wed Oct 26 2022 05:00:49 GMT+0800 (China Standard Time)

I'd like to send you a pull request to add the option of getting memory with mmap instead of using the BSS (which did not work on UTM using both Arch Linux as well as Debian 11).

Maybe when you initially ran it, Linux did not have the execute bit disabled for the .bss data, or maybe it never is on the rpi.

Would you be OK with a code change such as

_start:
#define USER_DEFS_SIZE 0x10000 // 64 KiB
#ifdef USE_LINUX_MMAP
        // reserve user space via mmap, with permissions that allow self-modifying code, for distributions that don't make .bss RWX
        mov x0, 0     // start address
        mov x1, USER_DEFS_SIZE  // length
        mov x2, 7      // rwx
        mov x3, 0x22 // flags - MAP_ANONYMOUS|MAP_PRIVATE (0x20|0x02)
        mov x4, -1    // file descriptor
        mov x5, 0     // offset
        mov x8,222  // mmap
        svc 0
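        cmn x0, 4095  // a raw Linux syscall returns -errno (-4095..-1) on failure, not -1
        b.cc 1f          // carry clear: x0 is a valid address, mmap worked!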
        mov x0, 1     // exit with error 1
        mov x8, 93
        svc 0
1:
        mov H,x0; adr L,lSYS                    // Initialize Here and Latest pointers
        add R,x0, USER_DEFS_SIZE;       // Initialize Return stack pointer
#else
	adr H,Dstart; adr R,Rtop;	// Initialize Here and Return stack pointer
#endif

// rest of initialization goes here

#ifndef USE_LINUX_MMAP
	.bss
	.align 3; Dstart: .space USER_DEFS_SIZE; Rtop:		// Space for data area (pointed to by H) and return stack
#endif

I hate to make the assembly more complicated, but I don't know a better way to do this either - you could use mprotect on the BSS segment, but that's no easier. I dunno if there is a portable way to modify the elf executable to mark the .bss executable.
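
For reference, the magic numbers in the proposed `_start` code can be cross-checked from Python. This sketch is an illustration under stated assumptions rather than part of the patch: `raw_syscall_failed` and `describe` are hypothetical helper names, the flag values are the Linux ones exposed by Python's `mmap` module, and the error convention shown is that of raw Linux system calls, which return `-errno` (a value between -4095 and -1) rather than the `-1` reported by the C library wrapper.

```python
import errno
import mmap

# Protection bits passed in x2: read (1) | write (2) | execute (4).
PROT_RWX = mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC

# Flags passed in x3: a private mapping that is not backed by a file.
MAP_FLAGS = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS


def raw_syscall_failed(ret):
    """A raw Linux syscall signals failure with -errno, i.e. -4095..-1."""
    return -4095 <= ret <= -1


def describe(ret):
    """Turn the value left in x0 after `svc 0` into a readable string."""
    if raw_syscall_failed(ret):
        return "error: " + errno.errorcode.get(-ret, "unknown")
    return "mapped at 0x%x" % ret


if __name__ == "__main__":
    print("prot  =", PROT_RWX)
    print("flags =", hex(MAP_FLAGS))
    print(describe(-errno.ENOMEM))
    print(describe(0x7F0000000000))
```

The syscall numbers themselves (222 for mmap, 93 for exit) come from the generic Linux syscall table used on AArch64 and are not available from the `mmap` module.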

narenratan · Answer 6 · Wed Oct 26 2022 06:05:35 GMT+0800 (China Standard Time)

Sure that would be great! I also don't mind about preserving the old use of bss - so if you'd like to just keep the USE_LINUX_MMAP behaviour above and drop the #ifdefs that's good with me :)

Klapaucius · Answer 7 · Tue Nov 01 2022 06:52:43 GMT+0800 (China Standard Time)

Old me thought: OK. I'll try to get hold of a new rpi first (I only have an old ARM7 one connected right now that doesn't know the arm64 opcodes, so I want to test a bit more). If mmap works everywhere, I'll do as you suggest.

New me: Yes, as I thought, the original code has issues with write/execute permissions when testing on a new AArch64 rpi. However, mapping works, so I'll prep that changelist.

Now, I tried to sit down and actually fully understand jonesforth.f. It's pretty compact and not the easiest to understand. But it's very clever, especially the ◁ and ▷ words. Took me a while to understand ;) I'm still chewing on the combinators and continuations.

May I ask whether this code was written for a class at Oxford? Did you do it all by yourself? I'm asking because I'd thought myself an OK programmer, but just understanding the code took me a long time ;)

Anyway, I do have another question about the stack-of-stacks. If I understand it correctly, the code will

1) on SPUSH write the first 16 words on the stack (i.e. starting with D0) to SS, then decrease SS by 16 words.
2) on SPOP write the latest pushed stack back to the beginning of the real stack and restore D to the value before SPUSH was called.

But there are a few problems:

a) The stack of stacks is allocated on the stack itself, 16 words away from the top of the stack. The moment you have more than 16 values on the data stack, it will clobber the SS area. Maybe SS should be defined as
1000 CELLS ALLOT VAR SS
or something.

b) The SPOP is always copying to SS0 and upwards, so cannot write to anything further down the stack. Should it maybe copy from (SS-10, SS] -> (min( D0, D0+DEPTH-10), D0+DEPTH)?

c) BTW: while debugging, I noticed I cannot enter D as 0xd; it took me a while to realize that there is a conflict with the D register ;). You can always enter it as 0D though.

Either way, I wonder if there is a specific reason you didn't follow the original jonesforth for exceptions, which had a different implementation for the stacks. It's possible I'm completely missing something obvious, in which case I would be very grateful if you could set me straight before I go on a wild goose chase.

Many thanks, and have a great week

K

P.S. On the RPI (it's the Rpi 400, with integrated keyboard), I sometimes run into an 'illegal instruction' when I run

$ cat jonesforth.f examples/defining_words.f | ./a.out
⍋ JONESFORTH ARM64 ⍋
Illegal instruction
$

It's weird. Sometimes, everything works AOK, sometimes I get this. Once I get this, the program will behave like this for a while (even if I recompile). Eventually, it will work again. I wonder if this is a HW issue, and one core is misbehaving. If you have noticed something similar, please let me know.

K

narenratan · Answer 8 · Wed Nov 02 2022 05:00:17 GMT+0800 (China Standard Time)

> Old me thought:
>
> OK. I'll try to get hold of a new rpi first (I only have an old ARM7 one
> connected right now that doesn't know the arm64 opcodes, so I want to test
> a bit more).
>
> If mmap works everywhere, I'll do as you suggest.
>
> New me: Yes, as I thought, the original code has issues with write/execute
> permissions when testing on new AARCH64 rpi. However, mapping works, so
> I'll prep that changelist.

Thanks so much for looking into this! Sounds like mmap is definitely the way to go. Not sure how using .bss ever worked.

> Now, I tried to sit down and actually fully understand jonesforth.f. It's
> pretty compact and not the easiest to understand. But it's very clever,
> especially the ◁ and ▷ words. Took me a while to understand ;) I'm still
> chewing on the combinators and continuations.

I must apologize for the density - I don't usually code this way I promise! This was a bit of an experiment in being as short and symbolic as possible. At the time I had most of it in my head and it was nice to work on, but coming back to it I must admit it is pretty opaque to me! For ◁ and ▷ I took the approach from R. G. Loeliger's book Threaded Interpretive Languages. It's available on archive.org; the section on <BUILDS and DOES> is here <https://archive.org/details/R.G.LoeligerThreadedInterpretiveLanguagesTheirDesignAndImplementationByteBooks1981/page/n83/mode/2up>. The combinators examples implement a few of the combinators from Brent Kerby's page here <http://tunes.org/~iepos/joy.html>.

> May I ask whether this code was written for a class at Oxford? Did you do
> it all by yourself?
> I'm asking because I'd thought myself an OK programmer, but just
> understanding the code took me a long time ;)

I would love to have taken a class on this sort of thing! Sadly I was a physics student, mainly just firing lasers at things :p Indeed I wrote it myself, although it's heavily based on original Jonesforth plus the Loeliger book. Looking back, the style makes things unnecessarily obscure, which certainly reflects only on the author and not at all on the reader :)

> Anyway, I do have another question about the stack-of-stacks. If I
> understand it correctly, the code will
>
> 1) on SPUSH write the first 16 words on the stack (i.e. starting with D0) to
> SS, then decrease SS by 16 words.
>
> 2) on SPOP write the latest pushed stack back to the beginning of the
> real stack and restore D to the value before SPUSH was called.
>
> But there are a few problems:
>
> a) the stack of stacks is allocated on the stack itself, 16 words away from
> the top of the stack.
> The moment you have more than 16 values on the data stack, it will clobber
> the SS area.
>
> Maybe SS should be defined as
> 1000 CELLS ALLOT VAR SS
> or something.

Yep this would certainly be better I think. I picked 16 as a 'largish' depth, which does seem unwise.

> b) the SPOP is always copying to SS0 and upwards, so cannot write to
> anything further down the stack
>
> should it maybe copy from (SS-10, SS] -> (min( D0, D0+DEPTH-10), D0+DEPTH) ?

I think you're right. If I remember the stack of stacks stuff was buggy (sorry about that!). Incidentally I'll try and add some proper tests at some point; at the time I was just testing things manually in the interpreter.

> c) BTW: debugging, I noticed I cannot enter D as 0xd, took me a while to
> realize that there is conflict with D register ;). You can always enter it
> as 0D though.

Haha I also ran into this and also resorted to 0D. I liked having D for the register too much although the name clash is certainly a pain.
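
The `D` versus `0D` behaviour follows from the order in which a Forth outer interpreter usually treats a token: dictionary lookup first, number parsing only if no word matches. The sketch below is a toy Python model of that rule, assuming this port keeps the lookup-then-number order of the original jonesforth; `interpret` and the one-word dictionary are illustrative names only.

```python
def interpret(token, dictionary, base=16):
    """Classify a token: a dictionary hit wins, a number is only the fallback."""
    if token in dictionary:
        return ("word", token)
    try:
        return ("number", int(token, base))
    except ValueError:
        return ("error", token)


if __name__ == "__main__":
    words = {"D"}  # stands in for the word that names the D register
    print(interpret("D", words))    # found in the dictionary, so never parsed as hex
    print(interpret("0D", words))   # no such word, so it falls through to the number parser
    print(interpret("D", set()))    # without the clash, D would parse as a hex digit
```

A leading zero works as an escape because `0D` is not a defined word, so the token reaches the number parser.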

> Either way, I wonder if there is a specific reason you didn't follow the
> original jonesforth for exceptions,
> which had a different implementation for the stacks. It's possible I
> completely miss something obvious,
> in which case I would be very grateful if you can set me straight before I
> go on a wild goose chase.

The stack of stacks was a slightly zany attempt to preserve what was on the stack when an exception is caught. The original jonesforth exceptions just preserve the depth I think (original comments here <https://github.com/narenratan/jonesforth_arm64_apl/blob/master/original_jonesforth/jonesforth.f#L1211-L1222>). Maybe preserving what was on the stack doesn't matter so much, in which case we could drop the stack of stacks altogether.
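
The clobbering concern from point a) can be made concrete with a toy memory model. The Python sketch below is not a transcription of jonesforth.f: the memory size, the downward-growing data stack, the addresses, and the function name `saved_frame_after` are all assumptions chosen to match the description in this thread (a 16-cell frame saved just below a region that starts 16 cells from D0).

```python
SAVE = 16   # cells copied by SPUSH, per the description in the thread
MEM = 128   # toy memory size in cells (assumption)
D0 = MEM    # the empty data stack starts here and grows downward (assumption)


def saved_frame_after(ss, final_depth):
    """Push 11 22 33, save a frame below `ss`, grow the data stack to
    `final_depth` cells, and return what the saved frame now contains."""
    mem = [0] * MEM
    d = D0
    for value in (11, 22, 33):
        d -= 1
        mem[d] = value

    # Toy SPUSH: copy the SAVE cells just below D0 into [ss - SAVE, ss).
    mem[ss - SAVE:ss] = mem[D0 - SAVE:D0]

    # Keep using the data stack after the save.
    while D0 - d < final_depth:
        d -= 1
        mem[d] = -1

    return mem[ss - SAVE:ss]


intact = [0] * 13 + [33, 22, 11]
on_stack_ss = D0 - SAVE   # SS placed 16 cells below D0, in the data stack's path
separate_ss = 32          # SS in its own region, standing in for an ALLOT

if __name__ == "__main__":
    print(saved_frame_after(on_stack_ss, 16) == intact)   # depth 16: frame survives
    print(saved_frame_after(on_stack_ss, 17) == intact)   # depth 17: frame overwritten
    print(saved_frame_after(separate_ss, 17) == intact)   # separate region: survives
```

In this model the seventeenth push lands in the saved frame, whereas a separately allocated region is left alone. That is the motivation for the `1000 CELLS ALLOT VAR SS` suggestion.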

> Many thanks, and have a great week
>
> K

Thanks very much for your interest in this!

> P.S. On the RPI, It's the Rpi 400 (with integrated keyboard), I sometimes
> run into an 'illegal instruction' when I run
>
> $ cat jonesforth.f examples/defining_words.f | ./a.out
> ⍋ JONESFORTH ARM64 ⍋
> Illegal instruction
> $
>
> It's weird. Sometimes, everything works AOK, sometimes I get this. Once I
> get this, the program will behave like this for a while (even if I
> recompile). Eventually, it will work again. I wonder if this is a HW issue,
> and one core is misbehaving.
>
> If you have noticed something similar, please let me know.

Ah I had seen this intermittent 'Illegal instruction'. Sadly I never got to the bottom of it :( The defining words code for ◁ and ▷ does use the forth assembler so I wondered if that was buggy (alignment maybe?). It was tricky to investigate since it happened apparently at random.

Klapaucius · Answer 9 · Thu Nov 03 2022 03:40:03 GMT+0800 (China Standard Time)

Thanks! That response makes me feel sane again. I'll see if I can make things work better, but it will take some time. I want to add some tests as well. Also, you say that you're not good at assembler, but the number parsing/printing and the Forth assembler are pretty ingenious.

The illegal instruction is indeed weird; I'll experiment a bit. It seems rpi specific, as I hadn't noticed it on the VM running Linux on the M1 Mac. I don't think it is the assembler, as the problem is intermittent and sort of sticky for a while. Could be a power issue on the rpi, too: those often see undervoltage (even though I use the original power supply).

Last question I had was how you typed the symbols. I'm doing more debugging and it would be helpful not to have to cut and paste :) Based on your job description, I assume you have a proper APL keyboard integration :) ?

Anyway, many thanks! K

narenratan · Answer 10 · Fri Nov 04 2022 06:09:12 GMT+0800 (China Standard Time)

No problem at all :) It's really great to have someone else look at this so deeply!

On typing APL symbols: there's a good general page on the APL wiki here and a Linux specific page here. On Linux using X11

$ setxkbmap -layout us,apl -option grp:switch

gets a US keyboard layout with right alt to get APL symbols.

The only issue is that I rashly used some non-APL symbols like ⟦ and ⟧. I entered those by modifying the APL keymap and setting the new one with xkbcomp. Embarrassingly I can't remember the details I'm afraid! I'll try and find how I did it.

Incidentally I also set up a little 6x4 keypad which was a hex numpad + APL symbols, which was handled in the modified X11 keymap. I started setting up a 128 key version with lots of maths symbols and relegendable keycaps - got it working but I never finished making the cardboard labels for all the keys!