Crash with AmigaOS 3.2 / 68060 / Fastmem (I-cache)


terriblefire opened this issue 2 years ago

I'm very impressed with this project. Really marvellous and nicely laid out code.

I am, however, seeing a crash when starting this on OS 3.2. I'm not sure whether it's OS 3.2, MMULib or my accelerator card that is causing the issue. The crash happens at random while transferring and running the snippets.

OS3.2 has romwack.

My hardware setup is a full 68060 with MMULib and 128MB of SDRAM.

Interestingly, I can manually create a script and run AllocMem over and over with no issues. I'm happy to help dig into the whys, but some hints would be useful.

My end goal is to simply have a cross development environment with a serial cable.

Roc Vallès i Domènech · Answer 1 · Thu Feb 24 2022 14:46:33 GMT+0800 (China Standard Time)

> My end goal is to simply have a cross development environment with a serial cable.

This does seem familiar. That's why I started writing pyamigadebug, in the first week of 2021. Before that, I had been using https://github.com/rvalles/amiga_uartrecv for a simple workflow where startup-sequence gets a binary from the serial port into ram, then runs it.

> I am, however, seeing a crash when starting this on OS 3.2. I'm not sure whether it's OS 3.2, MMULib or my accelerator card that is causing the issue. The crash happens at random while transferring and running the snippets.

I have done some testing with 3.2 (just AmigaXfer, to date), so far without issue, on my beefiest machine: an A1200 with an old Blizzard 1230 MkIV (030+882 @ 50 / 128MB).

I do not own any 060 accelboard (I hear TF accelboards are cool 😺) and had never heard of anyone testing with one until now.

I understand you're not using AmigaXfer (have you had any issues with it?) but actually writing your own scripts using pyamigadebug. This brings to mind something I found before the first public release:

RomWack does internally call execlib's Forbid(), but not Disable(). This would be fine if it was fast enough at handling the interrupts to not lose incoming characters from the serial port. But once in a while, it would actually lose them. This was confirmed by looking at jittery timing when echoing characters (with a logic analyzer). Sometimes they'd be outright lost (skipped in the echo).

This didn't happen often, but when it did, it left me out of sync. I could have rewritten the whole romwack module to ACK each character, but that would of course also make it half-duplex (and thus half speed, because of the echo).

So I tried a few alternatives. I found that adding extra stop bits (1.5 or 2) predictably decreased the frequency of occurrence, but it didn't fix the issue.

I ended up, for amigaXfer purposes, just calling Disable() immediately after getting a debugger object, which predictably solved the issue (no interrupts, no jitter). Just remember to Enable() at some point before you exit the debugger :)
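As a rough illustration of that Disable()/Enable() pairing, here is a minimal Python sketch. `FakeExecLibrary` and `interrupts_disabled` are hypothetical names invented for this example and are not pyamigadebug's actual API; the stub only counts nesting, so the pattern can be run without an Amiga attached.

```python
from contextlib import contextmanager


class FakeExecLibrary:
    """Hypothetical stand-in for a remote exec.library proxy; it only counts nesting."""

    def __init__(self):
        self.disable_depth = 0

    def Disable(self):
        self.disable_depth += 1

    def Enable(self):
        self.disable_depth -= 1


@contextmanager
def interrupts_disabled(execlib):
    """Keep interrupts off for the duration of the block, then restore them."""
    execlib.Disable()
    try:
        yield execlib
    finally:
        # Runs even if the transfer raises, so interrupts are not left disabled.
        execlib.Enable()


execlib = FakeExecLibrary()
with interrupts_disabled(execlib):
    depth_inside = execlib.disable_depth  # interrupts are "off" here
depth_after = execlib.disable_depth  # restored on exit
```

Wrapping the pair in a context manager is one way to make the "remember to Enable()" step hard to forget, since the `finally` clause runs on both normal exit and exceptions.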

At some point, I mean to write a small server as a snippet so that I don't have to ever use romwack/sad again after initialization, thus allowing me to set a fast SERPER and leave it there until reboot (or further, if I make it reset-resident).

I hope this issue is what you've been experiencing. Please try Disable-ing and see if it's reliable then.

Terriblefire · Answer 2 · Thu Feb 24 2022 17:04:30 GMT+0800 (China Standard Time)

My issue was with simply running amigaXfer. I had tried to dig in a bit deeper to figure out what's going on so that I had more to report. It seems that just running code this way in this setup doesn't work, except for exec calls.

It seems pretty repeatable...

port /dev/cu.usbserial-1440, baud 0, debugger 0, paranoid False, debug True, dangerfast False, resetfirst False, crashentry False
Serial device opened.
Syncing with debugger. Please have Amiga enter debugger now. Refer to README for help.
In RomWack debugger.
Exec v47.7. Base at 0x800089c.
RomWack calling 0x8000824.
Disable.
Saving non-scratch registers.
EClockFreq: 709379
clkpal: True, SERPER: 372, baud: 9600
AmigaSnippets initialization start.
RomWack calling 0x80007d6.
romwack writemem 134571072 96
snip getaddrfile() @ 0x8056440 path: asm/memrecv.o
RomWack calling 0x80007d6.
romwack writemem 134571168 90
snip getaddrfile() @ 0x80564a0 path: asm/memsend.o
RomWack calling 0x80007d6.
snip writemem: addr: 0x8342218 size: 0x436
RomWack calling 0x8056440.

I get a crash dialog on the Amiga:

Workbench
Program Failed (error #8000000B).
Wait for disk activity to finish.
Suspend | Reboot

One thing to mention is that on an 060 the entire interrupt loop runs out of cache. So it will be much faster than on an 030.

Roc Vallès i Domènech · Answer 3 · Thu Feb 24 2022 17:19:47 GMT+0800 (China Standard Time)

Hmm.

I am quite uninspired as to what could be going on, but to try and narrow it down I would immediately test:

- Tick Paranoid (basically CRCs every snip before using it, which would catch it if it has been corrupted).
- Edit the snippets class to force snips to be allocated in chip RAM (if that's reliable, it would narrow down the possible causes) @ https://github.com/rvalles/pyamigadebug/blob/master/AmigaSnippets.py#L130
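To illustrate what a Paranoid-style check amounts to, the sketch below writes a block into a simulated memory, reads it back and compares checksums. CRC32 from `zlib` and the helper names are assumptions chosen for illustration; the thread does not say which checksum the project actually uses.

```python
import zlib


def write_block(memory, addr, payload):
    """Copy payload into the simulated target memory at addr."""
    memory[addr:addr + len(payload)] = payload


def block_matches(memory, addr, payload):
    """Read the block back and compare its CRC32 with the payload's CRC32."""
    readback = bytes(memory[addr:addr + len(payload)])
    return zlib.crc32(readback) == zlib.crc32(payload)


memory = bytearray(64)      # stand-in for a region of Amiga RAM
snip = b"\x4e\x71\x4e\x75"  # example bytes only
write_block(memory, 16, snip)
ok_after_upload = block_matches(memory, 16, snip)

memory[17] ^= 0x04          # simulate a single corrupted bit
ok_after_corruption = block_matches(memory, 16, snip)
```

A check of this kind only validates the bytes as they are read back from memory.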

> It seems pretty repeatable...

Do you mean it happens every time?

Going by the dump above, it's definitely calling memrecv (see asm/memrecv.S) when this happens, I believe to upload the CRC routine.

> One thing to mention is that on an 060 the entire interrupt loop runs out of cache. So it will be much faster than on an 030.

Back then I was using an A500 with a 010. I imagine leaving interrupts enabled is probably OK on the 030, too.

Terriblefire · Answer 4 · Thu Feb 24 2022 22:13:13 GMT+0800 (China Standard Time)

> Hmm.
>
> I am quite uninspired as to what could be going on, but would immediately test, to try and narrow it down:
>
> tick Paranoid (basically CRCs every snip before using it, which would catch if it has corrupted)

I have done that. This didn't help.

> Edit snippets class to force snips be allocated in chipram (if that's reliable, it would narrow down possible causes) @ https://github.com/rvalles/pyamigadebug/blob/master/AmigaSnippets.py#L130

Yes. This fixed it.

> It seems pretty repeatable...
>
> Do you mean it happens every time?

Yes. It's the same code every time. memsend fails. It seems to be related to fastmem/chipmem.

I wonder if some ClearCacheU is needed after upload?

EDIT: Thinking about this. Yes the caches need to be cleared after upload. Especially if you're uploading code.

Roc Vallès i Domènech · Answer 5 · Thu Feb 24 2022 22:36:04 GMT+0800 (China Standard Time)

> I wonder if some ClearCacheU is needed after upload?
> EDIT: Thinking about this. Yes the caches need to be cleared after upload. Especially if you're uploading code.

Suspected as much (which is why I wanted you to test in chip RAM).

I had never looked into 68k and cache control, as it's not an issue with the machines I have.

http://amigadev.elowar.com/read/ADCD_2.1/Includes_and_Autodocs_2._guide/node0339.html

It does indeed list "self-modifying code", meaning that writing memory with the CPU is probably not enough to ensure old code isn't still cached.
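The failure mode being discussed can be pictured with a deliberately simplified toy model. This is not a description of how the 68060 caches actually behave; `ToyCpu` and its methods are hypothetical names, and `clear_cache` merely plays the role that ClearCacheU has in this thread.

```python
class ToyCpu:
    """Toy model: data writes go to memory, instruction fetches go through a cache."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.icache = {}  # address -> cached byte

    def write(self, addr, data):
        # A data write updates memory only; this model's cache does not notice it.
        self.memory[addr:addr + len(data)] = data

    def fetch(self, addr):
        # Instruction fetch: served from the cache once the address has been cached.
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def clear_cache(self):
        # Stand-in for the cache flush discussed above.
        self.icache.clear()


cpu = ToyCpu(16)
cpu.write(0, b"\x11")
first = cpu.fetch(0)   # old code gets cached
cpu.write(0, b"\x22")  # new code uploaded to the same address
stale = cpu.fetch(0)   # still the old byte from the cache
cpu.clear_cache()
fresh = cpu.fetch(0)   # the new byte, now that the cache was flushed
```

In this model the memory contents are correct the whole time; only the fetch path returns the outdated byte until the cache is cleared.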

Now I wonder what the cleanest way to handle this from my code would be.

Terriblefire · Answer 6 · Thu Feb 24 2022 22:36:57 GMT+0800 (China Standard Time)

I shall have a patch for you shortly.

Roc Vallès i Domènech · Answer 7 · Thu Feb 24 2022 22:40:17 GMT+0800 (China Standard Time)

> I shall have a patch for you shortly.

Great. Take care that the function doesn't exist in kickstart below 37.

This could be handled by doing it on each memwrite or, probably better, by doing it in AmigaSnippets after upload, which leaves memwrite as a low-level method.
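A minimal sketch of the second option, assuming a version gate at 37 as described above. `FakeTarget`, `clear_cache_u` and `upload_snippet` are hypothetical names for illustration and do not reflect the real AmigaSnippets or ExecLibrary code; the stub only records which operations were requested.

```python
CLEARCACHEU_MIN_VERSION = 37  # assumption taken from the comment above


class FakeTarget:
    """Hypothetical stand-in for the remote Amiga; records what was called."""

    def __init__(self, exec_version):
        self.exec_version = exec_version
        self.memory = {}
        self.calls = []

    def writemem(self, addr, data):
        # Low-level write: no cache handling here.
        self.memory[addr] = bytes(data)
        self.calls.append("writemem")

    def clear_cache_u(self):
        self.calls.append("ClearCacheU")


def upload_snippet(target, addr, code):
    """Write the code first, then flush caches only when exec is new enough."""
    target.writemem(addr, code)
    if target.exec_version >= CLEARCACHEU_MIN_VERSION:
        target.clear_cache_u()


new_os = FakeTarget(exec_version=47)
upload_snippet(new_os, 0x8056440, b"\x4e\x75")

old_os = FakeTarget(exec_version=34)
upload_snippet(old_os, 0x1000, b"\x4e\x75")
```

Keeping the flush in the snippet-upload layer means plain memory writes stay cheap, while every upload of executable code goes through the same gate.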

Terriblefire · Answer 8 · Thu Feb 24 2022 22:46:21 GMT+0800 (China Standard Time)

> I shall have a patch for you shortly.
>
> Great. Take care that the function doesn't exist in kickstart below 37.
>
> This could be handled by doing it on each memwrite, or by (probably better, leaving memwrite as a low-level method) AmigaSnippets after upload.

Ah, I thought it would work, but calling ClearCacheU seems to leave things in a bad state. But yes, this is what I thought of doing, along with an exec version check in the ExecLibrary class.

Roc Vallès i Domènech · Answer 9 · Thu Feb 24 2022 22:49:24 GMT+0800 (China Standard Time)

> Ah, I thought it would work, but calling ClearCacheU seems to leave things in a bad state. But yes, this is what I thought of doing, along with an exec version check in the ExecLibrary class.

I wonder what is going on with this bad state; going by its documentation, ClearCacheU seems harmless.

Terriblefire · Answer 10 · Thu Feb 24 2022 23:03:51 GMT+0800 (China Standard Time)

Got it... I forgot to call amiga.sync().

Roc Vallès i Domènech · Answer 11 · Thu Feb 24 2022 23:05:58 GMT+0800 (China Standard Time)

Ah yeah, that's a form of self-inflicted pain I am somewhat familiar with.

If I recall correctly, it is needed when the result is not required; otherwise it is done for you.

Terriblefire · Answer 12 · Thu Feb 24 2022 23:13:15 GMT+0800 (China Standard Time)

fastmem.patch.txt

This fixes it for snippets. But I suspect there are more cases, and some sort of check should be done in ClearCacheU for v36+.

Roc Vallès i Domènech · Answer 13 · Thu Feb 24 2022 23:19:52 GMT+0800 (China Standard Time)

> fastmem.patch.txt
>
> This fixes it for snippets. But I suspect there are more cases, and some sort of check should be done in ClearCacheU for v36+.

Looks good to me.

I will do some testing on my A1200 and merge when I am home.

I will figure out whether the i-cache is enabled when booting from HDD, as I tend to work from floppies too much; if it isn't, I will force it on and see whether I can crash the 68030 (I am hopeful).

> I suspect there are more cases

The only uploading of code that I consciously do is through AmigaSnippets, whereas the asm snippets themselves are written with care to be reentrant. It is hopefully all good now.

Thank you for the effort put in debugging this.

Terriblefire · Answer 14 · Thu Feb 24 2022 23:25:14 GMT+0800 (China Standard Time)

No worries. I have a plan to add a GDBServer to this so I can author and debug code from VSCode. It might just work as an extra button on the menu.

Roc Vallès i Domènech · Answer 15 · Thu Feb 24 2022 23:29:45 GMT+0800 (China Standard Time)

Today I learned that I can edit other people's comments on the GitHub repos I own, and that it is far too easy to do so. I should probably try and be careful to quote comments rather than edit them. Ugh.

> No worries. I have a plan to add a GDBServer to this so I can author and debug code from VSCode. It might just work as an extra button on the menu.

A GDB Server was part of my roadmap... I kinda wanted to do this before I got derailed into AmigaXfer. But I won't work on this immediately, so feel free to do it yourself if you really want to.

You've likely noticed the recent scsi.device work, and the generic NBD server. I have wired them together and tested exposing the NBD server to Linux, which sees the partitions and can mount them (fairly cool), even if it's very slow. I am now wondering how to make this usable to end-users (i.e. on the GUI), or whether to do so (most users won't even know what to do with an NBD server).

Terriblefire · Answer 16 · Thu Feb 24 2022 23:36:18 GMT+0800 (China Standard Time)

> No worries. I have a plan to add a GDBServer to this so I can author and debug code from VSCode. It might just work as an extra button on the menu.
>
> A GDB Server was part of my roadmap... I kinda wanted to do this before I got derailed into AmigaXfer. But I won't work on this immediately, so feel free to do it yourself if you really want to.

A GDB RSP setup is so easy to do. I've even got hardware implementations in Verilog. So I'll see if I can remember the minimum subset.
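For readers unfamiliar with GDB's Remote Serial Protocol, the basic packet framing can be sketched as follows. This shows only the `$payload#checksum` envelope with its modulo-256 checksum; escaping, run-length encoding and acknowledgements are left out, and the helper names are made up for this example. Which commands the "minimum subset" mentioned above would include is not stated in the thread.

```python
def rsp_checksum(payload: bytes) -> int:
    """Sum of all payload bytes, modulo 256."""
    return sum(payload) % 256


def rsp_frame(payload: bytes) -> bytes:
    """Wrap a payload as $<payload>#<two lowercase hex digits>."""
    return b"$" + payload + b"#" + b"%02x" % rsp_checksum(payload)


def rsp_unframe(packet: bytes) -> bytes:
    """Validate the envelope and checksum, then return the payload."""
    if len(packet) < 4 or not packet.startswith(b"$") or packet[-3:-2] != b"#":
        raise ValueError("malformed packet")
    payload = packet[1:-3]
    if int(packet[-2:], 16) != rsp_checksum(payload):
        raise ValueError("checksum mismatch")
    return payload


halt_reason_query = rsp_frame(b"?")  # b"$?#3f"
read_registers = rsp_frame(b"g")     # b"$g#67"
reply = rsp_unframe(b"$OK#9a")       # b"OK"
```

A stub on the target side would read bytes from the serial port until it sees `#` plus two hex digits, validate the packet in this way, and then dispatch on the first character of the payload.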

I had actually started on a replacement for SAD as a GDB stub. I'll need to see what state that is in at some point.

> You've likely noticed the recent scsi.device work, and the generic NBD server.

No, I've not seen these, and I am intrigued.

Roc Vallès i Domènech · Answer 17 · Thu Feb 24 2022 23:37:46 GMT+0800 (China Standard Time)

> No, I've not seen these, and I am intrigued.

Hold on for a gist.

Roc Vallès i Domènech · Answer 18 · Thu Feb 24 2022 23:40:52 GMT+0800 (China Standard Time)

@terriblefire

https://gist.github.com/rvalles/5b783a157df3af694ca72a67fd8a00f7

Roc Vallès i Domènech · Answer 19 · Fri Feb 25 2022 16:00:21 GMT+0800 (China Standard Time)

Couldn't reproduce the crash on my 68030, not even with:

5.Workbench:> cpu
System: 68030 68882 68030-MMU no FastROM (INST: Cache Burst) (DATA: Cache Burst)

Must only happen with 060 (possibly 040 too). I'll clean up and merge now.

Roc Vallès i Domènech · Answer 20 · Fri Feb 25 2022 16:16:54 GMT+0800 (China Standard Time)

Pushed fix. Closing.

Terriblefire · Answer 21 · Fri Feb 25 2022 16:35:29 GMT+0800 (China Standard Time)

Thank you! :)

Terriblefire · Answer 22 · Fri Feb 25 2022 17:33:57 GMT+0800 (China Standard Time)

Verified it's fixed with the latest commit.

Roc Vallès i Domènech · Answer 23 · Sun Aug 21 2022 10:02:56 GMT+0800 (China Standard Time)

Fix deployed in release 1.1.2.