mmitch / gbsplay

gameboy sound player

Home Page:https://mmitch.github.io/gbsplay/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

auto-detect song length (detect emulation state looping)

RiedleroD opened this issue · comments

yeah well the title says it all.

What do you consider a "loop"?

  • A repetition of all subsongs inside a GBS file? (that would be possible)
  • A repetition within a single subsong? (that would be rather impossible to detect because a song in a GBS file is not a bunch of notes but rather a single program that interacts with the Gameboy sound hardware - we don't know what the programm does internally, which is also the reason for not knowing how long a subsong is)

(edited title: removed feature request and added the enhancement label)

I'm considering a loop in this context a repitition of notes in the same subsong.

I see the problem. I thought there might be a way with which you could detect which sections have already played and maybe seek forward to look if a new section comes along. I think this would be a rather hacky approach, but still better than what's given right now. For example: in TRAX, it almost ends prematurely because there's a 2 second gap in the 7th track. I'm not quite sure why it doesn't end, the gap must be slightly smaller than 2 seconds, but still. It'd be nice to have some sort of seeking algorithm that tries to detect loops.

For loop detection we are halfway into halting problem territory ;-)

I think the main problems are that the sound player is a completely freeform program and that we only see IO operations, but don't know anything about notes or any other higher structure (see my comments in issue #66).

Theoretically, we could detect loops:

  • the whole emulator is in a known state and under our control (clock, interrupts, registers, memory, cpu state, …)
  • there is no input from the outside (apart from pausing the emulation or starting with the next track)
  • if we encounter a complete state that we have already seen before, we must be in a loop, because from here on everything will happen exactly like the last time when we were in this state

The problem is that we'd have to record all the states that the emulator steps through: That means the complete RAM state, register state, CPU state, hardware state (like current scanline) etc. would have to be recorded for every single step the emulator calculates. (If for example we only record every 100th step and the repetition happens after 99 steps, we will be 1 step off and won't find the repetition).

Let's say we skip the video RAM, registers and extra stuff and just record the 8kB internal S-RAM. The CPU runs at 4.19 MHz, that would sum up to 33.5 GB of data per second. Let's say I calculated this wrong and/or there are some easy optimizations available, divide it by 1000 and make that 32 MB per second: That is still a lot and we would not only have to record every step, we'd have to compare the current step to all other already recorded steps 4.194.304 times per second. (I immediately think of compression and hashtables here, but the former needs more computation power and the latter more memory.)

So from my current perspective I'd say: We can't do that.

But I'd love to hear other approaches to this problem or that' I'm off by a factor of 1.000.000 and it would indeed be possible.

Wouldn't it be possible to store hashes of the individual states? That would reduce the RAM usage greatly while still detecting loops pretty accurately. Of course it still wouldn't run nearly at realtime speed, but for my purposes (just dumping the wavs) it wouldn't matter.

Calculations, using XXH128 and just the S-RAM:
8kB→128B would amass to 536MB/s (jeez ok that's still massive)

I'm not 100% familiar with how the gameboy hardware works, but it can only loop with a jump instruction, if we go by assembly rules. So maybe the sRAM has to be recorded only when a jump instruction happens? I think this method might miss the first iteration, but please let me hear your thoughts.

edit: another thought would be to only record after jump instructions that jump back.

Another optimisation would be to only store the hashes of the beginning of the track (should be configurable, but the first second or so should suffice), and only check the hashes of the following music against these. This would also fail to catch a lot of tracks with intros, but there aren't many of those around & since it's user-configurable, it shouldn't be a problem.

Execution flow of the emulated playback code and progression of the music aren't normally tied together; the playback code operates sort of like a VM that reads a data stream containing the music data (notes, durations, effects...) and performs the necessary operations on the audio HW to produce sounds. As such it will perform a huge number of jumps that don't have anything to do with the position in the tune itself.

However the general idea of taking only specific snapshots sounds promising.
In fact I think it might be possible to work with a lot less data:
The playback code is most commonly called periodically (let's assume: once per frame) and is designed to use only a fraction of CPU time compared to the entire game. It has an internal state that is saved in SRAM (delay counters, playback pointers, possibly a Ch.3 waveform etc.); each time it is called it picks up from there and decides what to do. Once it has finished its operations for the current frame it leaves the SRAM in a defined state that it can use to resume operation next time, and returns to its caller.

In my opinion it should be sufficient to take a snapshot of the SRAM every time the playback code returns from execution which would commonly be once every 1/60th or 1/64th second; assuming the latter this results in 512kB/s of data to aggregate, or 1kB/s in hashes. Of course as the tune continues to play the amount of data to compare would grow substantially; after an hour of playback 3.5MB worth of hashes would need to be compared 64 times per second. [EDIT: numbers corrected for a hash size of 16, not 128 bytes]

Also, there exists code that never returns and instead maintains its own control loop; detection would not be possible strictly in the way I laid out. But it isn't a common case as far as I can tell.

I think that there is a loop finder in vgmplay tools, and I think it:

  • runs for several minutes, records the generated audio
  • analyzes the audio to find similarities and loops, afterwards

I think it should be simpler and also faster

Well storing the audio alone would use up more RAM than the solution proposed by @mrehkopf (if everything is kept in RAM), and analyzing audio for loops ain't exactly easy either. Let alone the fact that you have to detect a loop of at least 4 iterations before you can be sure that there won't come anything new in the track. And there are plenty of long tracks that would be absolutely steamrolled by this solution if we just render the first 5 minutes and cut out any loops in post-processing. Besides the fact that it would be a nightmare to make it work with realtime playing, but that's …up to the user I guess.

Also, thanks for catching my 128b!=128B mistake, that brings the numbers down a whole lot.

commented

There's also the issue that the state looping period may be longer than the song looping period (e.g. say the player code keeps a running time in memory which keeps increasing).
Similarly there may be issues with some marginal state differences across loops of the music (e.g. timer phase offset).
In the end post-processing the audio is probably the better approach here, as @cyberic99 mentioned.

coming back to this discussion after having fixed some stuff in nosefart and had to understand their internals a bit. They try to infer when a track loops by keeping track of where the last place in memory is that has been accessed. This works fairly well in my experience and is also quite easy to implement (I think, not knowing gbsplay's internals)

imo having an imperfect implementation would at least be better than having nothing at all, especially since people can just… not use it… when they don't want to… you know? anyway, just my two cents, I don't really have time to implement this yet.

Ohhh, Nosefart is still a thing? Cool 😎

Do you have any more details in the nosefart implementation?
I did a quick, experimental, most basic hack like this:

diff --git a/gbcpu.c b/gbcpu.c
index 6652863..11af41c 100644
--- a/gbcpu.c
+++ b/gbcpu.c
@@ -42,6 +42,26 @@ struct opinfo {
 #endif
 };
 
+static uint32_t highest_rom_load_address = 0;
+
+static void remember_rom_load(uint32_t addr) {
+       // FIXME: bank mapping for 0x4000 - 0x7FFF is ignored
+
+       if (addr > 0x7FFF)
+               return; // that's not ROM access
+
+       if (addr > highest_rom_load_address) {
+               highest_rom_load_address = addr;
+               printf("highest load address = 0x%04x\n", addr);
+               return;
+       }
+
+       if (addr == highest_rom_load_address) {
+               printf("RE-ACCESS @ highest  = 0x%04x\n", addr);
+               return;
+       }
+}
+
 static uint32_t none_get(void *priv, uint32_t addr)
 {
        UNUSED(priv);
@@ -67,6 +87,7 @@ static inline uint32_t mem_get(struct gbcpu* const gbcpu, uint
 {
        struct get_entry *e = &gbcpu->getlookup[(addr >> 8) & 0xff];
        gbcpu->cycles += 4;
+       remember_rom_load(addr);
        return e->get(e->priv, addr);
 }

Be aware: This makes the tests fail because of the extra output on stdout, but it compiles and runs.
Run gbsplay with -qqq to only see the output from the new method.

When I run this and watch this on different tunes, the best I saw until now was repeated access marking the start of a beat. While that is surely some kind of repetition, it is not what we want ;-)

So while it does something, it does not do anything useful yet.

How does nosefart distinguish "local" from "global" repetitions?
Is there a timeout involved? Something like "check for read access to the highest accessed memory location, but only if it has not been accessed for the last n seconds"?

iirc it just has some amount of minimum cycles without new address reads before it decides it found the loop point