m8pple / arch2-2019-cw

Testbench question

dangerbot3pic opened this issue

The spec says that each row of the output file corresponds to one execution of the simulator under test. Does this mean that each instruction in a multi-instruction binary needs to be checked, or can we target one particular instruction in a multi-instruction binary and check whether it is correct at the end?

Each row needs to correspond to one execution, and to one test-case that is
intended to check a particular instruction. As you say, there will usually be multiple
instructions in a binary, of which only one is the target. However, if the other instructions
don't work, then the test-case is going to fail, even if the target did work.

So as is generally the case when testing, it is very difficult to prove anything
definitive about what works when tests pass, and it is also very difficult to
say exactly what caused a failure when a given test-case failed.

So a particular row saying:

0, ADDU, Pass, dt10

is only saying "The ADDU instruction has the correct behaviour for some small set
of stimuli, or it might have failed in a way that looks correct." Your goal is to
try to design tests that make it very unlikely that it failed and still managed to
pass the test.
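
As a minimal sketch of "one row per execution", the Python below runs a simulator once on one binary and formats the result as a row in the style shown above. The names `run_case` and `format_row` are hypothetical, and judging pass/fail by the simulator's exit code is an assumption made for illustration; check the spec for how results are actually observed.

```python
import subprocess

def run_case(simulator, binary, expected_exit):
    """Execute the simulator exactly once on one binary.

    Assumption: the test result is visible in the process exit code.
    """
    result = subprocess.run([simulator, binary], stdout=subprocess.DEVNULL)
    return result.returncode == expected_exit

def format_row(test_id, instruction, passed, author):
    """Build one output row: TestId, Instruction, Status, Author."""
    status = "Pass" if passed else "Fail"
    return f"{test_id}, {instruction}, {status}, {author}"

# One execution produces one row, naming the single target instruction.
example_row = format_row(0, "ADDU", True, "dt10")
```

Here `example_row` is the string `0, ADDU, Pass, dt10`; in a real testbench the `passed` argument would come from a call such as `run_case(simulator_path, binary_path, expected_exit)`.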

Similarly, another row saying:

2, ADDU, Fail, dt10

is only saying "At least one instruction in this test-case had the wrong behaviour",
but it can't definitively prove that it was the ADDU that is broken - it might
be the JR instruction that doesn't work. Again, your job is to make it as likely
as possible that the thing that caused the test to fail is the ADDU, and to select other
instructions that are likely to succeed (if an implementation doesn't have a working
JR, then it has bigger problems).

In principle, you could take one binary and use it for multiple rows, one for each
instruction that it actually executes. That would be correct, but not really very
focused: if ADDU failed, then it would also claim that JR failed.
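
To see why reusing one binary is unfocused, here is a small sketch that emits one row per executed instruction from a single shared result. The function name `rows_for_shared_binary` is hypothetical, and it assumes the simulator is deterministic, so every re-run of the same binary gives the same outcome.

```python
def rows_for_shared_binary(first_id, executed, passed, author):
    """Emit one row per executed instruction, all sharing one outcome.

    Assumption: re-running the same binary always gives the same result,
    so every row derived from it carries the same status.
    """
    status = "Pass" if passed else "Fail"
    return [
        f"{first_id + i}, {instr}, {status}, {author}"
        for i, instr in enumerate(executed)
    ]

# A single fault (in ADDU or in JR) marks both instructions as failed.
shared_rows = rows_for_shared_binary(0, ["ADDU", "JR"], False, "dt10")
```

Both rows in `shared_rows` say Fail, so the output cannot distinguish a broken ADDU from a broken JR.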

Try to think of it in this way: you are trying to design tests that can:

  • Correctly identify when a processor executes the target instruction correctly.
  • Correctly identify when a processor is completely correct except for just the
    one instruction you are testing.

The latter condition won't be true when testing your classmates' simulators
(as often more than one instruction will be broken), but a simulator can be
constructed that does have exactly one broken instruction that happens to
match what you are testing.
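
The two criteria above can be tried out on a toy model. The sketch below is not a MIPS simulator: it is a hypothetical register-only model with two operations, plus a variant whose only fault is that ADDU behaves like OR. It shows how a poorly chosen stimulus lets the broken ADDU pass, while a better one exposes it.

```python
MASK = 0xFFFFFFFF

# Toy reference semantics for two operations (hypothetical model).
REFERENCE = {
    "ADDU": lambda a, b: (a + b) & MASK,
    "OR": lambda a, b: a | b,
}

def make_simulator(overrides=None):
    """Return a simulator: the reference, optionally with some ops replaced."""
    ops = dict(REFERENCE)
    ops.update(overrides or {})

    def simulate(program, regs):
        regs = dict(regs)  # work on a copy of the initial register state
        for op, dst, src1, src2 in program:
            regs[dst] = ops[op](regs[src1], regs[src2])
        return regs

    return simulate

def run_test(simulate, program, regs, out_reg, expected):
    """A test passes when the output register holds the expected value."""
    return simulate(program, regs)[out_reg] == expected

reference = make_simulator()
# Exactly one broken instruction: ADDU computes OR instead of a sum.
broken_addu = make_simulator({"ADDU": lambda a, b: a | b})

program = [("ADDU", "v0", "a0", "a1")]
weak = {"a0": 1, "a1": 2}    # 1 + 2 == 1 | 2, so the fault is invisible
strong = {"a0": 3, "a1": 1}  # 3 + 1 == 4 but 3 | 1 == 3

# Each pair is (result on reference, result on broken_addu).
weak_results = (run_test(reference, program, weak, "v0", 3),
                run_test(broken_addu, program, weak, "v0", 3))
strong_results = (run_test(reference, program, strong, "v0", 4),
                  run_test(broken_addu, program, strong, "v0", 4))
```

With the weak stimulus both simulators pass, so the test fails the second criterion; with the strong stimulus the reference passes and the broken variant fails, which is the behaviour a focused ADDU test should have.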