luciusDXL / TheForceEngine

Modern "Jedi Engine" replacement supporting Dark Forces, mods, and in the future Outlaws.

Home Page:https://TheForceEngine.github.io

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

More CPU activity (even with OpenGL renderer) than dosbox

MIvanchev opened this issue · comments

Hi there, I wanted to replay the DF campaign and figured out the best way to go ahead would be TheForceEngine (great project btw). However on my Thinkpad with iGPU it runs (in every configuration) slower than DF in DosBox. Even with the OpenGL renderer I experience 50% CPU activity and the fan doesn't stop blowing where as dosbox never turns the fan on. Any advice on this? I'm playing on Linux.

Did some research, the CPU usage spikes immediately after the program is launched, on the 2D menu. top shows 3 processes, 1 with very high CPU activity and 2 with ~5%.

It was told somewhere in this repo that VSync enable would also schedule CPU-idling. But that did not work for my WIN32 environment. So a few months ago I hacked up this bit below , that works OK for the time being. Should be cross-platform.
Result is a total of 30% use of 4-core CPU, spread evenly across the four cores.

file: TFE_System/system.cpp

	void sleep(u32 sleepDeltaMS)
	{
		if (sleepDeltaMS==0) SDL_Delay(1);
		else SDL_Delay(sleepDeltaMS); 
	}

Yes, you appear to be correct, the main loop of TFE seems to do busy waits so the CPU is maxed out all the time... This seems to be a common SDL issue. Maybe a long term solution would involve SDL_WaitEventTimeout.

So you mean that it busy waits on your system even with the frame limiter enabled. The idea is to avoid wasting too much time but to give other threads a chance to run. But Windows has always been a bit iffy here - so I will try out your idea and see how well it works (with the minor change of specifying the real value in frameLimiter_end() rather than changing the value in the function).

So it turns out using the 1ms sleep reduced the accuracy of the frame limiter too much, but with some minor modifications I was able to restore it. The change has been submitted and will be in the next release.

@luciusDXL You also need to address the midi thread, it maxes out a thread as well and doesn't even do sleep(0) just loops continuously.

Thanks luciusDXL. I will try it when I have some time to spare.

I was slightly involved with a similar adjustment in another game port, some months ago.
There it was implemented like this:

https://github.com/LostArtefacts/TR1X/pull/1002/files

@luciusDXL You also need to address the midi thread, it maxes out a thread as well and doesn't even do sleep(0) just loops continuously.

I saw your PR and do plan on going through that in the near future. But this was a fairly simple change, so I wanted to at least improve the frame limiter issue before the next release (which should be any day now).

Thanks luciusDXL. I will try it when I have some time to spare.

I was slightly involved with a similar adjustment in another game port, some months ago. There it was implemented like this:

https://github.com/LostArtefacts/TR1X/pull/1002/files

I had considered that approach, but wasn't sure about the quality of the timing, which is why I had the sleep(0) originally. But its probably better than I'm thinking, so I will give it a shot after the next release is out. Thanks for the link and feedback. :)

Anyway, the plan is to keep this issue open until improvements are made to the midi thread, as mentioned above.

I just tried the recent commit. Since it was very simple to implement.
273dd41
That "DBL_EPSILON + 0.001;" reminds me of the +0.1 I did here.
LostArtefacts/TR1X#985

With the recent commit I get on average 45% total CPU usage among 4 cores. Regular drops from 60 FPS to 59. Sometimes 58.

My earlier hackfix: 30% total CPU usage among 4 cores. Less often it drops from 60 FPS to 59. Never drops to 58.
Maybe because my hackfix catches some sleep(0) calls from other places, which it converts to SDL_Delay(1)?

Two other tests, WIN32:

pull request by MIvanchev frameLimiter.cpp + midiPlayer.cpp
#405
On average 27% total CPU usage among 4 cores. Regular drops from 60 FPS to 59. A few times 58.

Pull request by MIvanchev, but only midiPlayer.cpp / Old framelimiter.cpp / My hackfix for sleep() in system.cpp.
On average 7% total CPU usage among 4 cores. Less often it drops from 60 FPS to 59. Never drops to 58.

Edit: That last combi seemed like a keeper. But there was a hanging note in the midi music after 5 minutes of play.

Nice testing @Gerwin2k. The main thread is actually a bit of a debate terrain. My personal conviction is that the limiting shouldn't block events (i.e. sleep / SDL_Delay vs SDL_WaitEventTiemout) but there are numerous counterexamples and I think it's subjective. Anyhow, in my case it's not a significant issue because vsync already ensures a 60 FPS limit. The midi thread is another story and it's easier to fix. I'm surprised you're getting a hanging note because there's detection logic for that.

The core issue here is that SDL_Delay() exposes different behavior on Linux versus Windows. On Linux SDL_Delay() implements a busy wait loop. On Windows SDL_Delay() yields compute time to other threads, either via Sleep() or WaitForSingleObjectEx() (on WinRT, which TFE currently does not target). These are completely different things. SDL2 docs say that SDL_Delay() “Wait[s] a specified number of milliseconds before returning.”, whatever “wait” is supposed to mean. This is one of many reasons why imho SDL2 is mediocre at best. @luciusDXL I am guessing that you want the later, you want to yield compute time to other threads than keep one thread spinning. Spinning does indeed very much impact other threads, even on SMP and hyper threading machines.

Anyhow, imho almost every time you start adding any kinds of hard coded delays, sleeps etc, you are probably doing conceptually something wrong in your synchronization logic. Better use scheduled callbacks and synchronization objects.