m8pple / arch2-2019-cw

Testbench markscheme

ES5017 opened this issue · comments

Can you provide some information as to how the testbench will be marked? Will the quality of the script be taken into consideration, or is it purely the sets of MIPS instructions we come up with?

commented

This is a bit tricky. I can't tell you exactly what I'm going to test as I don't want to see 32 testbenches written to just suit the testbench testing; that would defeat the point of writing them in the first place. But here are some things for you to consider.

  • Does your testbench exercise all instructions?
  • Does it test control and data corner cases?
  • Does it correctly interpret return codes (and I/O)?
  • Is it easy to understand and extend?
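As an illustration of the "control corner cases" point, here is a hedged sketch (not taken from the coursework; the exit-code convention of leaving the result in $2/v0 and returning via jr $31 is assumed from the formative example below) of a test checking that a taken BEQ actually skips the fall-through path:

    # Hypothetical corner-case test for BEQ (illustrative only).
    _entry:
        beq   $zero, $zero, ok   # operands always equal: branch must be taken
        nop                      # branch delay slot
        addiu $2, $zero, 2       # must be skipped; v0 = 2 means "branch not taken"
        jr    $31
        nop
    ok:
        addiu $2, $zero, 0       # v0 = 0 signals success
        jr    $31
        nop

A failure here distinguishes "BEQ never branches" from "BEQ branches to the wrong target", which a single straight-line test would not.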

As stated in the spec,

This should identify the instruction which is the primary instruction being tested. Note that many (actually, most) instructions are impossible to test in isolation, so a given test may fail either because the instruction under test doesn't work, or because some other instruction necessary for the tests is broken.

If a necessary instruction that is widely used across many tests is broken, those tests will fail for reasons unrelated to their primary instruction under test. Will this be considered a bad testbench and receive low marks?

commented

I'm not following the logic here. When testing a testbench, how is whether or not a particular instruction is "broken" (which is a property of the simulator under test) a reflection of how good the testbench is?

For example, consider the testbench used in the formative:

10000000 <_entry>:
10000000: 3c027fff lui v0,0x7fff
10000004: 3442ffff ori v0,v0,0xffff
10000008: 3c037fff lui v1,0x7fff
1000000c: 3463ffff ori v1,v1,0xffff
10000010: 00631020 add v0,v1,v1
10000014: 03e00008 jr ra
10000018: 00000000 nop
1000001c: 00000000 nop

The primary instruction under test here is ADD. If the simulator's LUI instruction is broken but its ADD instruction actually works correctly, we will still get the wrong exit code in $2, causing the ADD test to fail.

If LUI is also used in all the other test programs, then every test will fail.

So will such a testbench, which cannot reflect the true behaviour of the primary instruction under test, be considered a bad testbench?

commented

I'm concerned by the implication of the clause "If the LUI instruction of the simulator is broken," so let me first of all make sure that you're not under the impression that your simulator will be used for the evaluation of your testbench. It won't because it can't: I don't have control over its behaviour.

With that cleared up, using the "ADD test" as an example, if any of the instructions used (LUI, ORI, ADD or JR) don't function correctly, the test may fail. This is expected and desired behaviour.

That the same instruction(s), e.g. LUI, may be used in multiple tests is a separate issue. To some extent, you can't help this: sometimes there's only one way of doing something. In place of LUI you could use ADDI and SLL in some tests, which would allow more tests to pass if LUI were unimplemented in the simulator under test. This (achieving lots of passes) isn't the goal of the testbench, however: its job is to achieve high coverage of potential failures.
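For instance, a constant like the 0x7fffffff built with LUI/ORI in the formative test could also be built with ADDI and SLL (a sketch; the register choice is arbitrary):

    # Two ways to put 0x7fffffff in v0; if LUI is broken in the
    # simulator under test, only the first sequence fails.
    lui   $2, 0x7fff         # v0 = 0x7fff0000
    ori   $2, $2, 0xffff     # v0 = 0x7fffffff

    addi  $2, $zero, 0x7fff  # v0 = 0x00007fff (immediate is positive,
                             # so sign extension is harmless here)
    sll   $2, $2, 16         # v0 = 0x7fff0000
    ori   $2, $2, 0xffff     # v0 = 0x7fffffff

Using both forms across different tests reduces the number of tests that share a single point of failure.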

In the previous conversation regarding this issue, @jjd06 mentioned:

  • Is it easy to understand and extend?

In that case, are we allowed to write a file such as "readme.md" to explain how our testbench works and how to extend it?

Could you provide more information?

commented

That would certainly help, but I'll be looking more at implementation. If, for example, adding tests requires binaries to be assembled by hand, or your testing script is a huge list of sequential statements, that would suggest that your testbench isn't very extensible.