zeta0134 / LuaGB

A gameboy emulator written in pure Lua. Work in progress.

In Pokemon Red, LuaGB has constant lag spikes while the game is waiting for a key press

meepen opened this issue

I haven't tested this in any other games yet, but it will probably happen in all of the Pokemon games.

Fun! This is reproducible: in my testing it occurs anywhere Pokemon Red shows a blinking triangle cursor. I tried Silver / Gold just to check, and I get the inverse (frankly, expected) behavior: the game slows down a bit during text drawing, but speeds up at the blinking cursor when it has nothing to do.

I suspect this is less of a bug and more of a performance problem, but it'll still be fun to profile later.

I also noticed that while walking, the game would freeze at some point between one step and the next.

Yeah, I just loaded Pokemon Gold and was spamming A and B to skip a conversation; then the screen went dark and the frame rate dropped from 60 to 1-4 FPS (I just tried it right now).

This is caused by I/O port 0x3E.
Also, here are some opcode performance stats:
[image: opcode performance stats]

Oops, I did something wrong before. The actual port is 0x46 (DMA).
[image: corrected opcode performance stats]

Those are pretty neat performance charts; how did you generate them? I have a question, though: are we looking at the average time for a single run of each opcode, or at the total time spent across several calls?

Both of the opcodes you pointed out should be a bit heavy, mostly because of the memory locations involved. On real hardware, writing to the DMA register starts an OAM transfer for sprites, but LuaGB performs the whole transfer in one go and doesn't emulate the memory access restrictions that apply while it runs. As a result, that one opcode seems to take much longer than it should, because the emulator is cheating a bit.
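For contrast with the instant copy described above, here is a minimal sketch of what a cycle-stepped OAM DMA could look like. It assumes one byte is copied per machine cycle (160 cycles for the 0xA0-byte transfer) and that the CPU can reach only High RAM while the transfer is active; the names `dma`, `start_dma`, `step_dma` and `cpu_can_access` are hypothetical and not part of LuaGB.

```lua
-- Hypothetical sketch of cycle-stepped OAM DMA; not LuaGB's code.
-- Assumption: one byte is copied per machine cycle (160 cycles total),
-- and while the transfer runs the CPU may only touch High RAM.

local dma = { active = false, source = 0, progress = 0 }

-- Called on a write to the DMA register (0xFF46).
local function start_dma(byte)
  dma.active = true
  dma.source = byte * 0x100   -- source block 0xXX00-0xXX9F
  dma.progress = 0
end

-- Advance the transfer by `cycles` machine cycles, one byte per cycle.
local function step_dma(mem, cycles)
  while dma.active and cycles > 0 do
    mem[0xFE00 + dma.progress] = mem[dma.source + dma.progress]
    dma.progress = dma.progress + 1
    cycles = cycles - 1
    if dma.progress == 0xA0 then
      dma.active = false        -- all 160 bytes copied into OAM
    end
  end
end

-- During a transfer, only High RAM (0xFF80-0xFFFE) is reachable.
local function cpu_can_access(address)
  return (not dma.active) or (address >= 0xFF80 and address <= 0xFFFE)
end
```

Stepping the transfer like this spreads its cost over many instructions instead of charging all of it to the single write to 0xFF46, which is part of why the instant version stands out in per-opcode timings.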

Address 0xFF3E is in the WAV pattern table for the third sound channel. I need to check my source when I get home, but I'm nearly certain that writes to this table fast-forward audio generation for accurate timing. This can make the first write unusually slow, because the rest of the audio subsystem has to catch up before control returns to the game. I'm not super happy with this design, but it probably explains any odd discrepancies where one audio write seems unusually expensive in tests. Which write takes the hit will likely vary by game, depending on which register that game's music / sound engine writes to first at the start of its vblank routine.
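The catch-up behavior described above can be sketched roughly as follows. This is a hypothetical illustration, not LuaGB's audio code: it assumes samples are generated lazily in fixed-size cycle steps (`CYCLES_PER_SAMPLE` is a made-up constant close to 4194304 Hz / 44100 Hz), and every name in it is invented for the example.

```lua
-- Hypothetical sketch of "catch-up" audio generation; not LuaGB's code.
-- Assumption: samples are generated lazily, and a write to an audio
-- register first renders everything up to the current cycle so the
-- change lands at the right point in the output stream.

local CYCLES_PER_SAMPLE = 95  -- hypothetical: about 4194304 Hz / 44100 Hz

local audio = {
  last_cycle = 0,   -- cycle count up to which samples have been generated
  samples = {},     -- generated output (hypothetical buffer)
  wave_table = {},  -- 0xFF30-0xFF3F wave pattern RAM, indexed 0-15
}
for i = 0, 15 do audio.wave_table[i] = 0 end

-- Hypothetical stand-in for the real sample generator.
local function generate_sample()
  return audio.wave_table[0]
end

-- Render every sample owed between last_cycle and current_cycle.
function audio.catch_up(current_cycle)
  while audio.last_cycle + CYCLES_PER_SAMPLE <= current_cycle do
    audio.samples[#audio.samples + 1] = generate_sample()
    audio.last_cycle = audio.last_cycle + CYCLES_PER_SAMPLE
  end
end

-- A wave RAM write pays for all pending generation first, which is why
-- the first audio write in a frame can look unusually expensive.
function audio.write_wave(address, value, current_cycle)
  audio.catch_up(current_cycle)
  audio.wave_table[address - 0xFF30] = value
end
```

With a full frame (70224 cycles) of pending audio, the first wave table write renders every owed sample before storing its value; a second write at the same cycle count finds nothing left to generate and returns almost immediately.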

The numbers on the chart are as follows: [opcode + extra info] <name> - <average time for a single call of the opcode> (<percentage compared to all other opcodes run>)
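The chart's columns map onto a simple per-opcode accumulator. The sketch below is an assumption about how such numbers could be produced, not meepen's profiler: it uses `os.clock` as a portable stand-in for the Windows timing functions his panel calls through FFI, ignores the "extra info" field, and reads the percentage as each opcode's share of total measured time. `record`, `profile` and `report` are hypothetical names.

```lua
-- Hypothetical per-opcode profiler sketch, not meepen's actual panel.
-- os.clock is a portable stand-in; its resolution is usually much
-- coarser than a native high-resolution timer, so timings are noisy.

local stats = {}  -- opcode -> { calls = n, total = seconds }

local function record(opcode, elapsed)
  local s = stats[opcode]
  if not s then
    s = { calls = 0, total = 0 }
    stats[opcode] = s
  end
  s.calls = s.calls + 1
  s.total = s.total + elapsed
end

-- Time one opcode handler call and record it.
local function profile(opcode, handler, ...)
  local start = os.clock()
  handler(...)
  record(opcode, os.clock() - start)
end

-- Build report lines in the chart's format:
-- [opcode] <name> - <average time per call> (<share of total time>)
local function report(names)
  local grand_total = 0
  for _, s in pairs(stats) do grand_total = grand_total + s.total end
  local lines = {}
  for opcode, s in pairs(stats) do
    local share = grand_total > 0 and (100 * s.total / grand_total) or 0
    lines[#lines + 1] = string.format("[0x%02X] %s - %.3e s (%.1f%%)",
      opcode, names[opcode] or "?", s.total / s.calls, share)
  end
  table.sort(lines)  -- fixed-width hex keeps lines in opcode order
  return lines
end
```

Wrapping each opcode dispatch in `profile(opcode, handler, ...)` and calling `report(names)` once per frame yields one line per opcode in the chart's format.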

I wrote a new benchmarking panel locally, since I love analyzing performance for stuff like this and then optimizing it. I can put it up on my GitHub fork in a bit.

It seems most of the time is spent looking up memory pages over and over inside the DMA write logic function. I've optimized it locally as follows:

  io.write_logic[ports.DMA] = function(byte)
    -- DMA Transfer. Copies data from 0x0000 + 0x100 * byte, into OAM data
    local destmap = memory.get_map(0xfe)
    local sourcemap = memory.get_map(byte)
    local source = 0x0000 + 0x100 * byte
    local destination = 0xFE00
    while destination <= 0xFE9F do
      destmap[destination] = sourcemap[source]
      destination = destination + 1
      source = source + 1
    end
    -- TODO: Implement memory access cooldown; real hardware requires
    -- programs to call DMA transfer from High RAM and then wait there
    -- for several clocks while it finishes.
  end

You can disregard the WAV result: my profiler was reading the "extra information" field after it ran the opcode, so it got bad data.

After my changes:
[image: opcode performance stats after the changes]

https://github.com/meepen/LuaGB/blob/f5b4ee52f7bdd465afd8dbea295b438c069d1d06/gameboy/z80/init.lua#L326

Here's how the data is collected. It's sloppy, can be imprecise, and uses FFI with Windows functions, so it isn't exactly portable in this state, but it's usable enough for me for now.

https://github.com/meepen/LuaGB/blob/f5b4ee52f7bdd465afd8dbea295b438c069d1d06/love/panels/profiler.lua
And here is the actual panel.

Just looking at those changes, I'm struck by how inefficient the memory mapping tables I've constructed might be. I should revisit that design if possible; as clean as the resulting code is from a Lua design perspective, I suspect the metatable lookups are not doing overall performance any favors. Good catch!
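To see why metatable lookups can hurt, here is a small hypothetical comparison (not LuaGB's memory module): the same 64 KiB of data read either through an `__index` metamethod or straight from a plain table. Absolute numbers will differ between PUC Lua and LuaJIT, so treat the printed timings as a rough indication only.

```lua
-- Hypothetical comparison, not LuaGB's real memory module.

-- Backing storage for 64 KiB of address space.
local raw = {}
for address = 0, 0xFFFF do raw[address] = address % 256 end

-- Metatable-backed view: the table itself is empty, so every read
-- falls through to the __index function (one Lua call per read).
local via_metatable = setmetatable({}, {
  __index = function(_, address)
    return raw[address]
  end,
})

-- Time `count` sequential reads through `map`; also return a checksum.
local function time_reads(map, count)
  local start = os.clock()
  local sum = 0
  for i = 1, count do
    sum = sum + map[i % 0x10000]
  end
  return os.clock() - start, sum
end

local slow = time_reads(via_metatable, 1000000)
local fast = time_reads(raw, 1000000)
print(string.format("metatable: %.3fs, plain table: %.3fs", slow, fast))
```

Both paths return identical data; the difference is purely the extra function call per read, which adds up quickly in an emulator that touches memory on nearly every instruction.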

Testing confirmed. I modified your code slightly, since it looks like you've added some helper functions that I don't have, but it still worked like a charm. This is the first time Oracle of Ages has made it through the intro cutscene without any slowdown on my system, so this is helping more than just Pokemon. :D

That double-table index in memory.read_byte() is highly suspect to me now; I want to see if I can clean up the block mapping code and get something more efficient going on there. Alas, IRL pulls me back to work for today.

  io.write_logic[ports.DMA] = function(byte)
    -- DMA Transfer. Copies data from 0x0000 + 0x100 * byte, into OAM data
    local destmap = memory.block_map[0xfe00]
    local sourcemap = memory.block_map[byte * 0x100]
    local source = 0x0000 + 0x100 * byte
    local destination = 0xFE00
    while destination <= 0xFE9F do
      destmap[destination] = sourcemap[source]
      destination = destination + 1
      source = source + 1
    end
    -- TODO: Implement memory access cooldown; real hardware requires
    -- programs to call DMA transfer from High RAM and then wait there
    -- for several clocks while it finishes.
  end

9350a4a

I've locally optimized the MMU in your system so that all of memory is one gigantic table. So far the results look good; I just need to find the last bug.
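A minimal sketch of the "one gigantic table" idea, under stated assumptions: this is not the code from meepen's commit, the names `mem`, `write_handlers`, `read_byte` and `write_byte` are hypothetical, and it ignores ROM/RAM bank switching and read-side I/O registers, both of which a real MMU still has to route somewhere. It shows how the DMA handler collapses to a single copy loop once every address lives in one table.

```lua
-- Hypothetical flat-MMU sketch: every address lives in one table, and
-- only registered addresses (e.g. I/O ports) route writes to handlers.
-- Bank switching and I/O read handlers are deliberately left out.

local mem = {}
for address = 0, 0xFFFF do mem[address] = 0 end

local write_handlers = {}  -- address -> function(value)

local function read_byte(address)
  return mem[address]  -- single table index, no metatables
end

local function write_byte(address, value)
  local handler = write_handlers[address]
  if handler then
    handler(value)
  else
    mem[address] = value
  end
end

-- OAM DMA (0xFF46): copy 0xA0 bytes from 0xXX00-0xXX9F to 0xFE00-0xFE9F.
write_handlers[0xFF46] = function(byte)
  mem[0xFF46] = byte
  local source = byte * 0x100
  for offset = 0, 0x9F do
    mem[0xFE00 + offset] = mem[source + offset]
  end
end
```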

[image]

meepen@c524777
I've made my memory changes here if you'd like to test or pull them in.

The original problem from this issue still persists, though.

I've uploaded a new release with some rather major LuaJIT performance improvements that merit a second pass on this issue. I'm not able to reproduce the lag problems in Pokemon Red anymore on the current build. Can you give it a whirl and see if the issue persists? I think there is still some variation (we can't change how the game is programmed) but it doesn't seem to cause enough of a slowdown to drop frames anymore.