Game testing, Functional test

Question


opened this issue 7 years ago · comments

Currently, the functional testing of a game is done manually. The goal of this issue is to take a game project from source code to playing it, in an automated fashion. It seems like a simple thing, but it can tell you if all is good or things have changed, perhaps for the worse.

How to do this?

The entire process needs to be driven by a cross-platform script of some sort, so it can be run by the user at will. The current plan is to create an AtomicEditor plugin, where you can select a list of projects to build and run.

Then, enhance AtomicTool to do the heavy lifting. It can already do most of the pieces, but it needs a single command that just does it, without arguments, and for all languages. Use either the AtomicTool player or a platform-native game build to test.

In order to play the game, add an event record/playback feature into the Input module, like the demo mode in Quake, to make the game do something.
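
A minimal sketch of what such record/playback could look like, written in Python for brevity rather than in one of Atomic's own languages. `InputEvent`, `InputRecorder` and `InputPlayer` are hypothetical names, not part of the Atomic Input module, and indexing events by frame number is an assumption made for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class InputEvent:
    frame: int  # frame index at which the event was observed
    kind: str   # e.g. "key_down" or "key_up"
    code: str   # e.g. "W" or "SPACE"


class InputRecorder:
    """Collects input events tagged with the frame they occurred on."""

    def __init__(self):
        self.events = []

    def record(self, frame, kind, code):
        self.events.append(InputEvent(frame, kind, code))

    def dumps(self):
        # Serialize to JSON so a session can be stored next to the project.
        return json.dumps([asdict(e) for e in self.events])


class InputPlayer:
    """Replays a recorded session, returning the events due on each frame."""

    def __init__(self, serialized):
        self.events = [InputEvent(**d) for d in json.loads(serialized)]
        self.cursor = 0

    def due(self, frame):
        out = []
        while self.cursor < len(self.events) and self.events[self.cursor].frame <= frame:
            out.append(self.events[self.cursor])
            self.cursor += 1
        return out


recorder = InputRecorder()
recorder.record(0, "key_down", "W")
recorder.record(3, "key_up", "W")
recorder.record(3, "key_down", "SPACE")

player = InputPlayer(recorder.dumps())
replayed = [(f, e.kind, e.code) for f in range(5) for e in player.due(f)]
print(replayed)
```

In a real engine the `due` call would sit in the per-frame update and feed the events back into the normal input path.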

For results reporting, use existing logs and logging features to tell what happened.

Matt Benic · Answer 1 · Thu Oct 12 2017 19:59:53 GMT+0800 (China Standard Time)

This needs breaking up into smaller tasks. It is arguably a project of its own, at least the game part. The scripted/recorded input part might be something that belongs in the engine.

Alan · Answer 2 · Thu Oct 12 2017 22:20:30 GMT+0800 (China Standard Time)

The problem with this is that gameplay isn't deterministic when you use a varying delta, which is the case for almost all games, nor are physics engines deterministic, so we can't just record and replay inputs. We'd need to trigger inputs according to the game state with a bit of tolerance, and even then it's not guaranteed not to fail. What we could do is simpler tests: for example, does the player move in a given direction when input is emulated, does it jump, does it lose health when it touches an enemy, etc.; but I'm afraid these tests are largely useless. What games use for 'replay' these days is just position recording, which we clearly cannot use here.
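
To illustrate the varying-delta point, here is a small sketch in Python, not tied to Atomic or any physics engine. It integrates the same jump over the same one second of game time with two different frame-time traces; `peak_height` and the constants are made up for the example, and explicit Euler integration is an assumption about how a simple game might step its physics.

```python
def peak_height(deltas, jump_speed=5.0, gravity=-9.8):
    """Integrate a jump with explicit Euler steps and return the highest point."""
    y, vy, peak = 0.0, jump_speed, 0.0
    for dt in deltas:
        y += vy * dt        # move with the current velocity
        vy += gravity * dt  # then apply gravity
        peak = max(peak, y)
    return peak


# Two frame-time traces covering the same one second of game time.
fixed = [1 / 60] * 60
uneven = [1 / 30, 1 / 120, 1 / 120] * 20

peak_fixed = peak_height(fixed)
peak_uneven = peak_height(uneven)
print(f"fixed: {peak_fixed:.4f}  uneven: {peak_uneven:.4f}")
```

The input is identical in both runs, yet the peak differs, which is why a recorded input stream alone does not pin down the outcome.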

Deleted user · Answer 3 · Fri Oct 13 2017 00:20:16 GMT+0800 (China Standard Time)

Certainly, @Alan-FGR, that would be a really nice problem to have, and it would need more work to handle your use case; this issue is meant to provide a framework to allow testing. But if the community has no interest in completing anything, or trying to maintain it, and is happy to have its users find its bugs, you are correct, this Issue does not need to be completed.

Alan · Answer 4 · Fri Oct 13 2017 00:23:04 GMT+0800 (China Standard Time)

@JimMarlowe What? Sorry, but you totally misunderstood me... I'm just saying that this isn't simple by any means, and the only easy way to implement something like that wouldn't be very useful, so we have to come up with a proper solution, which is no easy feat, because there's no way to hack our way through this one... this certainly would be useful, though it's not high priority imho.

Alan · Answer 5 · Fri Oct 13 2017 07:04:42 GMT+0800 (China Standard Time)

@JimMarlowe please clarify: are you talking about something that will allow us to know whether for example all enemies are killable and the level is finishable, or just something that will tell us if the level can crash the game or the player can move out of the bounds, etc?

Matt Benic · Answer 6 · Fri Oct 13 2017 15:50:49 GMT+0800 (China Standard Time)

A general state testing harness (type thing.. not sure I know what the correct term for this would be) is something my team wants to build at some point for our own testing purposes. So it's possible that part would be something we can implement in core atomic. This is another example (like the network cache) of something that wouldn't necessarily serve the average indie using the engine, but would be more useful to bigger teams. It wouldn't hurt anyone though ;)

Alan · Answer 7 · Fri Oct 13 2017 20:57:08 GMT+0800 (China Standard Time)

Meaningful automated gameplay testing is extremely difficult to achieve. I'm not saying it shouldn't be done, but this is potentially an independent product and certainly a much bigger project than Atomic itself.

Matt Benic · Answer 8 · Fri Oct 13 2017 21:02:38 GMT+0800 (China Standard Time)

We wouldn't be looking at full gameplay testing, but likely at testing for the firing of specific events externally. That would likely be able to achieve something very similar to full testing though, or be a basis to build on if someone wanted to extend it to that level of testing. Either way, it's something we need at some point, so we will likely work on it.
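
One way to picture "testing for the firing of specific events" is a probe that subscribes to named events and lets a test assert on what it saw. The sketch below is Python and entirely hypothetical: `EventBus`, `EventProbe` and the event names are stand-ins, not Atomic's event system.

```python
from collections import defaultdict


class EventBus:
    """Tiny stand-in for an engine event system."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, name, handler):
        self._handlers[name].append(handler)

    def send(self, name, **data):
        for handler in list(self._handlers[name]):
            handler(name, data)


class EventProbe:
    """Records every occurrence of the events it was asked to watch."""

    def __init__(self, bus, *names):
        self.seen = []
        for name in names:
            bus.subscribe(name, self._on_event)

    def _on_event(self, name, data):
        self.seen.append((name, data))

    def names(self):
        return [name for name, _ in self.seen]


bus = EventBus()
probe = EventProbe(bus, "ChestOpened", "ItemGranted", "PlayerDied")

# Stand-in for game code reacting to an action button press.
bus.send("ChestOpened", chest_id=7)
bus.send("ItemGranted", item="key")

print(probe.names())
```

A test then checks the recorded names and payloads instead of inspecting game state directly.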

Brian Ewing · Answer 9 · Mon Oct 23 2017 10:20:10 GMT+0800 (China Standard Time)

Acceptance testing similar to the Capybara model could work: simulating user interaction and polling for expected conditions within a certain time frame.

An example of a high-level test (specified in Gherkin) might be:

Given I am playing the first scene
When I move forward for 3 seconds
Then I should be in front of a chest
When I press the action button
Then the chest opens
And I should have been given a key

When I turn 90 degrees right
And I move forward for 3 more seconds
Then I should be in front of a door

When I press the action button
Then the scene should quit

Gherkin provides a structure for defining tests as a series of reusable imperative steps (parsed with regex :P), that you can sort of compose as if you were just using natural language. Just imagine it's pseudocode if that makes your skin crawl 😄
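
As a rough illustration of the "reusable steps parsed with regex" idea, here is a sketch in Python. It is not Cucumber or any real Gherkin runner; the `step` decorator, the `run` function and the `world` dictionary are all hypothetical, and the step bodies fake the game instead of driving one.

```python
import re

STEPS = []  # (compiled pattern, handler) pairs


def step(pattern):
    """Register a handler for every line that fully matches `pattern`."""
    def register(fn):
        STEPS.append((re.compile(pattern), fn))
        return fn
    return register


def run(scenario, world):
    for line in scenario.strip().splitlines():
        text = re.sub(r"^(Given|When|Then|And)\s+", "", line.strip())
        if not text:
            continue  # blank separator line
        for pattern, handler in STEPS:
            match = pattern.fullmatch(text)
            if match:
                handler(world, *match.groups())
                break
        else:
            raise LookupError(f"no step matches: {text!r}")


@step(r"I am playing the (\w+) scene")
def load_scene(world, name):
    world.update(scene=name, chest_open=False, inventory=[])


@step(r"I press the action button")
def press_action(world):
    # A real step would inject input; here the outcome is faked.
    world["chest_open"] = True
    world["inventory"].append("key")


@step(r"the chest opens")
def chest_opens(world):
    assert world["chest_open"]


@step(r"I should have been given a (\w+)")
def given_item(world, item):
    assert item in world["inventory"]


world = {}
run("""
Given I am playing the first scene
When I press the action button
Then the chest opens
And I should have been given a key
""", world)
print(world)
```

Each step is written once and reused across scenarios; an unmatched line fails loudly rather than being skipped.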

You could test all sorts of features and behaviour this way and save yourself a lot of time. You'd have some sort of assurance, or at least a litmus test, when making changes that might affect other levels and interactions. I imagine this would disproportionately benefit smaller developers, offloading a lot of routine / repetitive QA and verification work to the computer instead of doing it by hand with every change.

A couple of points:

Obviously 'move forward for 3 seconds' is completely imprecise. You could imagine a few different ways of getting to where you want to go, maybe "walk towards the chest" or "teleport beside some_magical_entity".

If the test runner polls for conditions like "I'm standing in front of [x]" or "more than 3 feet in the air", for a given number of seconds, you can test responses to user actions without needing frame-by-frame determinism

If the suite waits for every action to complete, it's gonna take a long time unless you either run scenarios in parallel or find some way to 'speed up' the game. I'm relatively new to game development, but I imagine that would be possible.
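
The polling and 'speed up' points can be sketched together. In the Python below, `wait_until` steps a simulation itself instead of sleeping, so the timeout is measured in game time and the test runs as fast as the machine allows. `FakeGame` and all its fields are hypothetical; whether a given engine can be stepped like this is an assumption.

```python
class FakeGame:
    """Stand-in for a running game: the player walks right at a fixed speed."""

    def __init__(self, speed):
        self.player_x = 0.0
        self.chest_x = 3.0
        self.speed = speed

    def advance(self, dt):
        self.player_x += self.speed * dt

    def in_front_of_chest(self):
        return abs(self.chest_x - self.player_x) <= 0.5


def wait_until(condition, advance, timeout=5.0, step=1 / 60):
    """Step the game until `condition()` holds; return the game time it took.

    Raises TimeoutError if `timeout` seconds of game time pass first.
    """
    elapsed = 0.0
    while not condition():
        if elapsed >= timeout:
            raise TimeoutError(f"condition not met within {timeout} s of game time")
        advance(step)
        elapsed += step
    return elapsed


game = FakeGame(speed=2.0)
elapsed = wait_until(game.in_front_of_chest, game.advance)
print(f"reached the chest after {elapsed:.2f} s of game time")
```

The assertion is on the condition being reached within a window, not on the exact frame, which is what removes the need for frame-by-frame determinism.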

Alan · Answer 10 · Mon Oct 23 2017 10:53:32 GMT+0800 (China Standard Time)

Very good points, @brianewing. Using timed events is certainly not a good option. Those 3 seconds you mention, for example, are sometimes going to be slightly more and sometimes slightly less, so with long chains (e.g. walk for 3 seconds, turn right for 1 second, walk another 3 seconds) the errors accumulate rather quickly and the player will end up at a different position each run. You already pointed out the solution to that: the player walks towards the chest and, when it can interact with it, a button press is sent, etc. That will certainly work fine for an RPG, for example. When it comes to action games though, some awareness of the surroundings is necessary.
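
A toy comparison of the two approaches, in Python and in one dimension only, so it understates the problem (turning errors are left out). Frame lengths are randomized per run; `timed_route`, `goal_route` and the constants are hypothetical.

```python
import random

SPEED = 2.0                              # units per second
FRAME_MIN, FRAME_MAX = 1 / 120, 1 / 20   # jittery frame lengths, in seconds


def timed_route(rng, legs=3, seconds=3.0):
    """Hold 'forward' for `seconds` on each leg; input ends on a frame boundary."""
    x = 0.0
    for _ in range(legs):
        t = 0.0
        while t < seconds:
            dt = rng.uniform(FRAME_MIN, FRAME_MAX)
            x += SPEED * dt
            t += dt
    return x


def goal_route(rng, targets=(6.0, 12.0, 18.0), reach=0.5):
    """Walk until within `reach` of each target; each leg re-anchors on the world."""
    x = 0.0
    for target in targets:
        while target - x > reach:
            dt = rng.uniform(FRAME_MIN, FRAME_MAX)
            x += SPEED * dt
    return x


timed = [timed_route(random.Random(seed)) for seed in range(50)]
goal = [goal_route(random.Random(seed)) for seed in range(50)]
print(f"timed spread: {max(timed) - min(timed):.3f}")
print(f"goal spread:  {max(goal) - min(goal):.3f}")
```

Each timed leg can overshoot by up to one frame of movement and those overshoots add up across legs, while the goal-directed route can only be off by one frame of movement at the final target.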

Brian Ewing · Answer 11 · Mon Oct 23 2017 11:17:13 GMT+0800 (China Standard Time)

@Alan-FGR How would you formulate that awareness? 🙂

Alan · Answer 12 · Mon Oct 23 2017 11:44:55 GMT+0800 (China Standard Time)

Well, take for example a game with platforms to jump on and enemies to kill, and consider that you're often changing that during development (adding/removing platforms and enemies). You can't hardcode the actions of the player, otherwise it won't be of much use and you end up with this... for each object you change slightly you'd have to change the logic, which is not a good solution. Basically, the only approach that's general enough to be a 'product' and still provide value is some kind of AI, in my opinion. I mean, it's acceptable to lay some waypoints on the level for testing purposes, but not to script some testing behavior for every small change the level designer decides to make. When you're playing a pure action game, you're basically just responding to visual and auditory stimuli, which is not the easiest thing to reproduce in software, and there are so many complex scenarios: say, for example, determining how easy it is to spot a given enemy at a specific position (considering foreground/background objects) or to hear a sound (considering other environment sounds). There's a reason why manual testing is still a thing in the industry, and it's not from the benevolence of the butcher.
What I'd really like to see is some modern AI based on neural networks and an arbitrary fitness score in which you just play the level a few times and based on that it automatically playtests it with small variations, something like continuous automated playtesting. If it's fast enough and properly configured, changes shouldn't be a problem since it's incremental.

Brian Ewing · Answer 13 · Mon Oct 23 2017 12:31:44 GMT+0800 (China Standard Time)

That's a really interesting idea! You could have a Super Meat Boy style play back of the AI's test runs to debug a failure. Maybe store all of them historically to get a laugh looking back through the old iterations...

AI testing sounds pretty awesome, imagine if you could train a network to the point that it could adapt and learn to complete levels as you change / create them. With a sufficiently well trained and flexible AI it's bound to have at least some parity with how users will interpret new challenges which would be a really useful metric in its own right

For acceptance or functional testing you're totally right that it would be pointless to test a level with so much precision that it breaks when the design changes. You'd need to find a way to isolate what you need.. maybe construct scenes specifically for testing. Or be happy to zip around a scene poking only the things you want to look at, ignoring puzzles etc

In web acceptance testing you tend not to say "click at these coordinates", instead you "press such and such a button", preferring high level descriptions of what you want and glossing over any unnecessary details... I think the technique might transfer over well to testing games

AI could prove a really powerful tool for doing that and not just hiding sequential steps with encapsulation.. though I do think there's a LOT you can do with that if you're clever with what you choose to abstract, test and (most importantly) ignore

Deleted user · Answer 14 · Wed Nov 22 2017 01:45:55 GMT+0800 (China Standard Time)

Clarification: Say you have a body of work, and you are required to maintain and support it. When Atomic gets PR'd, there is currently no testing, so the game developer has to go and try each body of work to see if it still works. If it doesn't, that means you have to use a previous version of Atomic for maintenance. And if you need further Atomic changes, you are hosed.
The game developers need a way of (minimally) taking their source code and, in an automated fashion, having the source built and either "played" in an editor instance or built to a platform instance, and at least being able to start the game. The chances are really low that a game developer would be able to do this, hence this Issue.
If you were able to get your game from source to live instance, the event record/playback is rudimentary, though it is a step above Urho's testing of bringing the examples up and just killing them. With the event playback, you can operate the UI (if you have it) and do monkey testing. Anything non-deterministic will need a lot more.
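
The minimal "can it at least start" check could be driven by a small cross-platform script. The Python sketch below assumes nothing about AtomicTool's command line: `smoke_launch` is a hypothetical helper, and the two commands are stand-ins that use the Python interpreter in place of a built game.

```python
import subprocess
import sys


def smoke_launch(command, alive_for=1.0):
    """Start `command` and report (passed, captured_output).

    Passes if the process is still running after `alive_for` seconds (it is
    then killed) or has exited with code 0; fails on a non-zero exit code.
    """
    proc = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    try:
        output, _ = proc.communicate(timeout=alive_for)
        return proc.returncode == 0, output
    except subprocess.TimeoutExpired:
        proc.kill()
        output, _ = proc.communicate()
        return True, output


# Stand-ins for a built game: one keeps running, one fails at startup.
healthy = [sys.executable, "-c", "import time; time.sleep(30)"]
broken = [sys.executable, "-c", "import sys; print('missing asset'); sys.exit(3)"]

print(smoke_launch(healthy, alive_for=0.5)[0])
print(smoke_launch(broken, alive_for=5.0))
```

The captured output plays the role of the existing game log, so a failing project reports what went wrong without extra tooling.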