narenratan / jonesforth_arm64_apl

JonesForth ARM64 with APL symbols

Would this compile on the new Mac M1 machines ? Any idea how much work it would be?

klapauciusisgreat opened this issue · comments

This is really more a question than an issue, but maybe the best place to ask.

I have access to an M1 MBP, but haven't really played much with its assembler. I know the Intel Macs don't use gas, so I had to do a bunch of things to make jonesforth work on 64-bit x86 on the Mac.

Any sage advice on whether it would be a good/bad idea to port would be much appreciated.

Thanks very much for your question and sorry for my long delay in getting back to you. It would be very cool to run this on one of the M1 Macs! I must admit I have very little experience with Macs (or with assembly really) so I don't know how much work it would be. If I do find myself with a Mac I'll have an experiment though!

So, on M1, there is the problem that macOS does not allow pages with both the PROT_WRITE and PROT_EXEC bits set, so the assembler written in FORTH itself, and self-modifying machine code, won't work. It's possible to run this code in a Linux VM on the same hardware though, after mmapping memory with RWX protection bits set.

On which hardware did you run the code 'as is' ?

That is very cool that you got it to run on an M1 Mac!

I was running it on a Raspberry Pi (4 I think), as far as I can remember running Archlinux ARM for AArch64. For a while I was also using a NanoPi Fire3 running a BSD (possibly NetBSD) but that may have been before adding the Forth assembler. At some point I'll try and find the Pis and try running this again.

I'd like to send you a pull request to add the option of getting memory with mmap instead of using the BSS (which did not work on UTM with either Arch Linux or Debian 11).

Maybe when you initially ran it, Linux did not have the execute bit disabled for the .bss data, or maybe it never is on the rpi.

Would you be OK with a code change such as

_start:
#define USER_DEFS_SIZE 0x10000 // 64K
#ifdef USE_LINUX_MMAP
        // reserve user space via mmap, with permissions that allow self-modifying code, for distributions that don't make .bss RWX
        mov x0, 0     // start address
        mov x1, USER_DEFS_SIZE  // length
        mov x2, 7      // rwx
        mov x3, 0x22 // flags - MAP_ANONYMOUS|MAP_PRIVATE (0x20|0x02)
        mov x4, -1    // file descriptor
        mov x5, 0     // offset
        mov x8,222  // mmap
        svc 0
        cmn x0, 4096     // a raw syscall returns -errno (-4095..-1) on failure, not -1
        b.ls 1f          // mmap worked!
        mov x0, 1     // exit with error 1
        mov x8, 93
        svc 0
1:
        mov H,x0; adr L,lSYS                    // Initialize Here and Latest pointers
        add R,x0, USER_DEFS_SIZE;       // Initialize Return stack pointer
#else
	adr H,Dstart; adr R,Rtop;	// Initialize Here and Return stack pointer
#endif

// rest of initialization goes here

#ifndef USE_LINUX_MMAP
	.bss
	.align 3; Dstart: .space USER_DEFS_SIZE; Rtop:		// Space for data area (pointed to by H) and return stack
#endif
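
For readers more at home in C, the mmap branch above corresponds roughly to the sketch below. It assumes Linux, where MAP_PRIVATE|MAP_ANONYMOUS is 0x22 and the read, write and execute protection bits add up to 7. The names `alloc_user_area` and `struct user_area` are hypothetical and do not appear in the repository.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std=c11 */
#endif
#include <stddef.h>
#include <sys/mman.h>

#define USER_DEFS_SIZE 0x10000  /* 64 KiB, same value as in the assembly */

/* Hypothetical struct: the two pointers the USE_LINUX_MMAP branch sets up. */
struct user_area {
    unsigned char *here;        /* H: start of the data area */
    unsigned char *rstack_top;  /* R: return stack starts at the far end */
};

/* Returns 0 on success, -1 if the kernel refuses the RWX mapping. */
int alloc_user_area(struct user_area *ua)
{
    void *p = mmap(NULL, USER_DEFS_SIZE,
                   PROT_READ | PROT_WRITE | PROT_EXEC,  /* 7 */
                   MAP_PRIVATE | MAP_ANONYMOUS,         /* 0x22 on Linux */
                   -1, 0);
    if (p == MAP_FAILED)
        return -1;
    ua->here = p;
    ua->rstack_top = (unsigned char *)p + USER_DEFS_SIZE;
    return 0;
}
```

Note that the C library wrapper reports failure as MAP_FAILED with errno set, whereas the raw `svc 0` call returns a negative errno value directly in x0.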

I hate to make the assembly more complicated, but I don't know a better way to do this either - you could use mprotect on the BSS segment, but that's no easier. I don't know if there is a portable way to modify the ELF executable to mark the .bss executable.
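
The mprotect alternative mentioned here could look something like the following C sketch. It is only an illustration under assumptions: `make_rwx`, `page_floor` and the `user_area` array are hypothetical names, with the array standing in for the Dstart..Rtop region in .bss. Since mprotect needs a page-aligned start address, the range is widened down to the page boundary, which also changes the permissions of anything else sharing those pages.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round an address down to the start of its page (page_size is a power of two). */
uintptr_t page_floor(uintptr_t addr, uintptr_t page_size)
{
    return addr & ~(page_size - 1);
}

/* Hypothetical helper: make [buf, buf+len) readable, writable and executable.
   Returns 0 on success, -1 on failure (errno is set by mprotect). */
int make_rwx(void *buf, size_t len)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = page_floor((uintptr_t)buf, page);
    uintptr_t end = (uintptr_t)buf + len;
    return mprotect((void *)start, end - start,
                    PROT_READ | PROT_WRITE | PROT_EXEC);
}

/* Stand-in for the data area that lives in .bss in the original code. */
static unsigned char user_area[0x10000];
```

A caller would run `make_rwx(user_area, sizeof user_area)` once at startup, before any generated code is written into the area.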

Sure that would be great! I also don't mind about preserving the old use of bss - so if you'd like to just keep the USE_LINUX_MMAP behaviour above and drop the #ifdefs that's good with me :)

Old me thought:

OK. I'll try to get hold of a new rpi first (I only have an old ARM7 one
connected right now that doesn't know the arm64 opcodes, so I want to test
a bit more).

If mmap works everywhere, I'll do as you suggest.

New me: Yes, as I thought, the original code has issues with write/execute
permissions when testing on new AARCH64 rpi. However, mapping works, so
I'll prep that changelist.

Thanks so much for looking into this! Sounds like mmap is definitely the way to go. Not sure how using .bss ever worked.

Now, I tried to sit down and actually fully understand jonesforth.f. It's
pretty compact and not the easiest to understand. But it's very clever,
especially the ◁ and ▷ words. Took me a while to understand ;) I'm still
chewing on the combinators and continuations.

I must apologize for the density - I don't usually code this way I promise! This was a bit of an experiment in being as short and symbolic as possible. At the time I had most of it in my head and it was nice to work on, but coming back to it I must admit it is pretty opaque to me! For ◁ and ▷ I took the approach from R. G. Loeliger's book Threaded Interpretive Languages. It's available on archive.org; the section on <BUILDS and DOES> is here. The combinator examples implement a few of the combinators from Brent Kerby's page here.

May I ask whether this code was written for a class at Oxford ? Did you do
it all by yourself ?
I'm asking because I'd thought myself to be an OK programmer, but just
understanding the code took me a long time ;)

I would love to have taken a class on this sort of thing! Sadly I was a physics student, mainly just firing lasers at things :p Indeed I wrote it myself although it's heavily based on the original Jonesforth plus the Loeliger book. Looking back, the style makes things unnecessarily obscure, which certainly reflects only on the author and not at all on the reader :)

Anyway, I do have another question about the stack-of-stacks. If I
understand it correctly, the code will

  1. on SPUSH write the first 16 words on the stack (ie starting with D0) to
    SS, then decrease SS by 16 words.
  2. SPOP will write the latest pushed stack back to the beginning of the
    real stack and restore D to the value before SPUSH was called.

But there are a few problems:

a) the stack of stacks is allocated on the stack itself, 16 words away from
the top of the stack.
The moment you have more than 16 values on the data stack, it will clobber
the SS area.

Maybe SS should be defined as
1000 CELLS ALLOT VAR SS
or something.

Yep this would certainly be better I think. I picked 16 as a 'largish' depth, which does seem unwise.

b) the SPOP is always copying to SS0 and upwards, so cannot write to
anything further down the stack

should it maybe copy from (SS-10, SS] -> (min( D0, D0+DEPTH-10), D0+DEPTH) ?

I think you're right. If I remember correctly, the stack of stacks stuff was buggy (sorry about that!). Incidentally I'll try and add some proper tests at some point; at the time I was just testing things manually in the interpreter.
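
To make points (a) and (b) concrete, here is a small C model of a stack of stacks that avoids both problems. It is a sketch under assumptions, not the code in jonesforth.f: the save area is a separate array (in the spirit of the `1000 CELLS ALLOT` suggestion), and each snapshot records the actual depth rather than a fixed 16 cells. The array sizes and all names are illustrative.

```c
#include <string.h>

#define STACK_CELLS 64    /* capacity of the data stack (illustrative) */
#define SAVE_CELLS 1000   /* separate save area, as in "1000 CELLS ALLOT" */

static long dstack[STACK_CELLS];  /* data stack, dstack[0] is the bottom */
static int depth;                 /* number of cells in use */

static long save[SAVE_CELLS];     /* stack of stacks, kept apart from dstack */
static int save_top;              /* next free cell in save[] */

/* Snapshot the whole data stack: its cells first, then its depth on top.
   Returns 0 on success, -1 if the save area is full. */
int spush(void)
{
    if (save_top + depth + 1 > SAVE_CELLS)
        return -1;
    memcpy(&save[save_top], dstack, (size_t)depth * sizeof(long));
    save_top += depth;
    save[save_top++] = depth;
    return 0;
}

/* Restore the most recent snapshot, including the saved depth.
   Returns 0 on success, -1 if there is nothing to restore. */
int spop(void)
{
    if (save_top == 0)
        return -1;
    int saved_depth = (int)save[--save_top];
    save_top -= saved_depth;
    memcpy(dstack, &save[save_top], (size_t)saved_depth * sizeof(long));
    depth = saved_depth;
    return 0;
}
```

Because the depth travels with each snapshot, a stack holding more than 16 values is saved and restored in full, and nothing on the live data stack can overwrite the save area.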

c) BTW: while debugging, I noticed I cannot enter D as 0xd; it took me a while to
realize that there is a conflict with the D register ;). You can always enter it
as 0D though.

Haha I also ran into this and also resorted to 0D. I liked having D for the register too much although the name clash is certainly a pain.
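
The clash is what you would expect if the interpreter looks a token up in the dictionary before trying to parse it as a number, which is how Forth outer interpreters typically behave. The C sketch below models that order under this assumption; the three-entry dictionary and the names `classify` and `token_kind` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical mini-dictionary: D stands in for the word naming the D register. */
static const char *const dictionary[] = { "D", "H", "R" };

enum token_kind { TOKEN_WORD, TOKEN_NUMBER, TOKEN_UNKNOWN };

/* Classify a token: dictionary lookup first, then number parsing
   (base 16 here) only as a fallback. *value is set for numbers only. */
enum token_kind classify(const char *token, long *value)
{
    for (size_t i = 0; i < sizeof dictionary / sizeof dictionary[0]; i++)
        if (strcmp(token, dictionary[i]) == 0)
            return TOKEN_WORD;

    char *end;
    long n = strtol(token, &end, 16);
    if (end != token && *end == '\0') {
        *value = n;
        return TOKEN_NUMBER;
    }
    return TOKEN_UNKNOWN;
}
```

With this ordering, `D` resolves to the word and never reaches the number parser, while `0D` matches no word and is parsed as the number 13.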

Either way, I wonder if there is a specific reason you didn't follow the
original jonesforth for exceptions,
which had a different implementation for the stacks. It's possible I
completely miss something obvious,
in which case I would be very grateful if you can set me straight before I
go on a wild goose chase.

The stack of stacks was a slightly zany attempt to preserve what was on the stack when an exception is caught. The original jonesforth exceptions just preserve the depth I think (original comments here). Maybe preserving what was on the stack doesn't matter so much, in which case we could drop the stack of stacks altogether.
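
For contrast, a depth-only scheme of the kind attributed to the original jonesforth above can be modelled in a few lines of C with setjmp/longjmp. This is an assumption-laden sketch rather than a transcription of either implementation: it supports a single, non-nested handler, and `catch_word`, `throw_error` and `failing_word` are hypothetical names.

```c
#include <setjmp.h>

static long dstack[64];   /* data stack, dstack[0] is the bottom */
static int depth;         /* number of cells in use */
static jmp_buf handler;   /* one non-nested handler, for brevity */
static int thrown_code;   /* code passed from throw_error to catch_word */

/* THROW: record the code and unwind to the active handler. */
void throw_error(int code)
{
    thrown_code = code;
    longjmp(handler, 1);
}

/* CATCH: run word(); if it throws, restore only the saved depth.
   Returns 0 if word() finished normally, else the thrown code. */
int catch_word(void (*word)(void))
{
    int saved_depth = depth;  /* unchanged after setjmp, so still valid later */
    if (setjmp(handler) == 0) {
        word();
        return 0;
    }
    depth = saved_depth;      /* cell contents stay as word() left them */
    return thrown_code;
}

/* Example word that scribbles on the stack and then throws. */
void failing_word(void)
{
    dstack[0] = 99;
    depth = 5;
    throw_error(7);
}
```

After `catch_word(failing_word)` the depth is back to its value before the call, but `dstack[0]` still holds 99. That loss of contents is what the stack of stacks was meant to prevent.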

Many thanks, and have a great week

K

Thanks very much for your interest in this!

P.S. On the RPi (it's the RPi 400, with integrated keyboard), I sometimes
run into an 'illegal instruction' when I run

$ cat jonesforth.f examples/defining_words.f | ./a.out
⍋ JONESFORTH ARM64 ⍋
Illegal instruction
$

It's weird. Sometimes, everything works AOK, sometimes I get this. Once I
get this, the program will behave like this for a while (even if I
recompile). Eventually, it will work again. I wonder if this is a HW issue,
and one core is misbehaving.

If you have noticed something similar, please let me know.

Ah I had seen this intermittent 'Illegal instruction'. Sadly I never got to the bottom of it :( The defining words code for ◁ and ▷ does use the forth assembler so I wondered if that was buggy (alignment maybe?). It was tricky to investigate since it happened apparently at random.
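
One general property of ARM that may be relevant here, although nothing in this thread confirms it as the cause: instruction and data caches are not kept coherent automatically, so freshly written machine code has to be synchronised before it is executed. In C the usual tool is the GCC/Clang builtin `__builtin___clear_cache`; the wrapper `install_code` below is a hypothetical helper, and in the Forth assembler the equivalent would be explicit cache maintenance instructions.

```c
#include <string.h>

/* Hypothetical helper: copy generated machine code into an executable buffer
   and make it visible to instruction fetch before anything jumps to it. */
void install_code(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    /* GCC/Clang builtin: on AArch64 it cleans the data cache and invalidates
       the instruction cache for the range; on x86 it does nothing. */
    __builtin___clear_cache((char *)dst, (char *)dst + len);
}
```

A missing synchronisation step tends to show up as sporadic, hard-to-reproduce faults, since the outcome depends on what happens to be in the caches at the time.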

No problem at all :) It's really great to have someone else look at this so deeply!

On typing APL symbols: there's a good general page on the APL wiki here and a Linux-specific page here. On Linux using X11,

$ setxkbmap -layout us,apl -option grp:switch

gets a US keyboard layout with right alt to get APL symbols.

The only issue is that I rashly used some non-APL symbols like ⟦ and ⟧. I entered those by modifying the APL keymap and setting the new one with xkbcomp. Embarrassingly I can't remember the details I'm afraid! I'll try and find how I did it.

Incidentally I also set up a little 6x4 keypad which was a hex numpad + APL symbols, which was handled in the modified X11 keymap. I started setting up a 128 key version with lots of maths symbols and relegendable keycaps - got it working but I never finished making the cardboard labels for all the keys!