jlpteaching / dinocpu

A teaching-focused RISC-V CPU design used at UC Davis

Understanding the use of TreadleTester in CPU simulation

learning-chip opened this issue · comments

Hi, thanks for this wonderful teaching project.

I'd like to understand the Scala/Treadle-based simulation engine, as it allows faster prototyping than Verilog-based simulation in RocketChip/Chipyard. However, I am having a hard time understanding the simulator source code, in particular CPUTesterDriver.scala and simulate.scala, partly because the Treadle page (https://www.chisel-lang.org/treadle/) has very little documentation.

Here are my main questions. Thanks in advance!

1. What benefits does TreadleTester offer over the vanilla ChiselTest?

Looking at the usage of TreadleTester, it looks similar to the vanilla poke(), step() inside ChiselTest's test block:

// Instantiate the simulator
val sourceAnnotation = FirrtlSourceAnnotation(compiledFirrtl)
val simulator = TreadleTester(sourceAnnotation +: optionsManager.toAnnotationSeq)

// Poke initial values directly into the register file
def initRegs(vals: Map[Int, BigInt]): Unit = {
  for ((num, value) <- vals) {
    simulator.poke(s"cpu.registers.regs_$num", value)
  }
}

// Poke initial values directly into the backing memory
def initMemory(vals: Map[Int, BigInt]): Unit = {
  for ((addr, value) <- vals) {
    simulator.pokeMemory(s"cpu.mem.memory", addr, value)
  }
}

// Step until the cycle budget is used up or the PC reaches endPC
def run(cycles: Int): Unit = {
  while (cycle < cycles && simulator.peek("cpu.pc") != endPC) {
    if (cycle % 10000 == 0) println(s"${cycle} cycles simulated.")
    simulator.step(1)
    cycle += 1
  }
}

So, what feature will be missing if I just use the simple ChiselTest?

2. How does the RISC-V binary get loaded into instruction memory?

The relevant code I find is:

// Convert the binary to a hex file that can be loaded by treadle
// (Do this after compiling the firrtl so the directory is created)
val path = if (binary.endsWith(".riscv")) {
  s"src/test/resources/c/${binary}"
} else {
  s"src/test/resources/risc-v/${binary}"
}
// This compiles the chisel to firrtl
val compiledFirrtl = build(optionsManager, conf)
val endPC = elfToHex(path, hexName)
// Instantiate the simulator
val sourceAnnotation = FirrtlSourceAnnotation(compiledFirrtl)
val simulator = TreadleTester(sourceAnnotation +: optionsManager.toAnnotationSeq)

However, I can't understand how the binary/instructions get passed to the instruction memory in the CPU, which is located in a totally separate file/module:

class IMemPortIO extends MemPortIO {
  val instruction = Output(UInt(32.W))
  val ready = Output(Bool())
}

val memory = Mem(math.ceil(size.toDouble/4).toInt, UInt(32.W))
loadMemoryFromFile(memory, memfile)

The loadMemoryFromFile call looks similar to verilog's $readmemb/$readmemh that can load instructions for simulation. But I don't see how it is invoked via the top-level TreadleTester...

3. Can the Treadle Execution Engine be used with other RISC-V testing frameworks?
In particular https://github.com/riscv/riscv-tests and https://github.com/ucb-bar/riscv-torture that provide a more complete functional coverage. From their docs they seem to require verilog simulators like Verilator or Synopsys VCS.

Related question:
https://stackoverflow.com/questions/55587524/simulating-a-cpu-design-written-in-chisel

So, what feature will be missing if I just use the simple ChiselTest?

One useful thing I find is TreadleTester.pokeMemory that can modify the internal memory without requiring an explicit I/O interface (correct?).

Is it true that TreadleTester can poke any signals while ChiselTest can only poke Input() signals? (ref: https://stackoverflow.com/a/59292064)

Great questions! I'll do my best to answer them, but I'm probably not the best resource. A lot of this code was written quickly on a deadline and written about 2 years ago. I'll do my best to remember what I was thinking!

Also, thanks for your interest here!

So, what feature will be missing if I just use the simple ChiselTest?

One useful thing I find is TreadleTester.pokeMemory that can modify the internal memory without requiring an explicit I/O interface (correct?).

Is it true that TreadleTester can poke any signals while ChiselTest can only poke Input() signals? (ref: https://stackoverflow.com/a/59292064)

Yes, I believe that's correct. IIRC, ChiselTest didn't support loadMemoryFromFile but Treadle did. I'm not sure if that's still the case.

It also had a more general simulation interface than the ChiselTest interface. Also, I believe I talked to the developers and they were planning on deprecating ChiselTest and moving towards only using Treadle.
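
To make the "poke any signal" point concrete, here is a minimal sketch, assuming a Treadle 1.x release is on the classpath. The counter circuit is a hand-written, hypothetical FIRRTL example and is not taken from DINO CPU; only TreadleTester, FirrtlSourceAnnotation, poke, step and peek are real API names.

```scala
import firrtl.stage.FirrtlSourceAnnotation
import treadle.TreadleTester

object PokeInternalSketch {
  // Hypothetical 8-bit counter written directly in FIRRTL (illustration only).
  val firrtlSource: String =
    """circuit Counter :
      |  module Counter :
      |    input clock : Clock
      |    input reset : UInt<1>
      |    output out : UInt<8>
      |
      |    reg count : UInt<8>, clock with : (reset => (reset, UInt<8>(0)))
      |    count <= tail(add(count, UInt<8>(1)), 1)
      |    out <= count
      |""".stripMargin

  // Pokes the internal register by name, advances one cycle, reads the output.
  def run(): BigInt = {
    val tester = TreadleTester(Seq(FirrtlSourceAnnotation(firrtlSource)))
    tester.poke("reset", 0)
    tester.poke("count", 41) // "count" is a register, not a top-level input
    tester.step(1)
    tester.peek("out")
  }
}
```

The signal is addressed by its name in the lowered FIRRTL, which is why DINO CPU pokes strings such as "cpu.registers.regs_1" rather than Chisel objects.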

2. How does the RISC-V binary get loaded into instruction memory?

Here's the relevant code:

https://github.com/jlpteaching/dinocpu/blob/main/src/main/scala/memory/base-memory-components.scala#L40

The filename is passed through the configuration object. I believe the file is a text file with a hex word on each line, but I may be misremembering.
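
As a rough illustration of the file format described here, the sketch below assumes the layout is one 32-bit word per line written as eight hex digits, similar to what Verilog's $readmemh accepts. HexImage and its methods are hypothetical names for illustration; this is not the project's elfToHex.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object HexImage {
  // Renders each word as 8 lowercase hex digits, one word per line.
  def render(words: Seq[Long]): String =
    words.map(w => f"${w & 0xffffffffL}%08x").mkString("", "\n", "\n")

  // Writes the rendered image to the given path.
  def write(path: Path, words: Seq[Long]): Unit =
    Files.write(path, render(words).getBytes(StandardCharsets.UTF_8))
}
```

For example, HexImage.render(Seq(0x00000013L, 0x00100093L)) produces two lines holding the encodings of "addi x0, x0, 0" and "addi x1, x0, 1". The path of a file written this way would then be the string handed to loadMemoryFromFile through the configuration object.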

3. Can the Treadle Execution Engine be used with other RISC-V testing frameworks?

I don't see why not, at least in theory. The main impediment is that those tests assume that the proxy kernel (pk) is running underneath to handle I/O and exceptions. The DINOCPU doesn't implement exceptions, though that is something I'm considering for the future :)

Treadle is just an RTL simulator written in Scala. If your RTL supports the testing frameworks, then I don't see why Treadle wouldn't.

One other thing I'll mention... You're right that the Treadle documentation isn't great. In fact, I think I'm using some internal APIs in my code here :). I figured most things out by using IntelliJ and reading the Treadle source code. Stepping through and doing live introspection was how I figured out most of the "API" that I'm using. I have also found that the Treadle developers (and the Chisel/FIRRTL developers more generally) are incredibly helpful. They even fixed some bugs in the code here for me!

Good luck! And let me know if there are any other questions I can (try to) answer.

Thank you for the thorough reply.

I believe I talked to the developers and they were planning on deprecating ChiselTest and moving towards only using Treadle.

Interesting... The ChiselTest page says that "if you’re fine living on the bleeding edge, give it a try", so I thought they are advocating ChiselTest instead of deprecating it. Also, the TreadleTester page says "it will be one of the standard back-ends available as part of the chisel-testers project", so it appeared to me that users will access TreadleTester via the higher-level ChiselTest interface😂

The filename is passed through the configuration object.

Ah, I found the relevant code, which confused me at first:

val optionsManager = new SimulatorOptionsManager()
if (optionsManager.targetDirName == ".") {
  optionsManager.setTargetDirName(s"test_run_dir/$cpuType/$binary$extraName")
}
val hexName = s"${optionsManager.targetDirName}/${binary}.hex"
val conf = new CPUConfig()
conf.cpuType = cpuType
conf.memFile = hexName

So the filename is initially specified by SimulatorOptionsManager, which extends TreadleOptionsManager class (again, undocumented😂). The filename is then passed to CPUConfig, which can initialize a certain type (e.g. pipeline or single-cycle) of CPU module. With the conf variable that contains all necessary CPU parameters (including memory file path), we can build the simulator (equivalent to the DUT c in ChiselTest) by the following calls in CPUTesterDriver.scala:

val compiledFirrtl = build(optionsManager, conf)
val sourceAnnotation = FirrtlSourceAnnotation(compiledFirrtl)
val simulator = TreadleTester(sourceAnnotation +: optionsManager.toAnnotationSeq)

(Again, found no doc on FirrtlSourceAnnotation😂)

The rest of the test process is easy to understand, as the simulator behaves like the DUT c in ChiselTest.

So, my question is, what's the benefit of using a dedicated TreadleOptionsManager class to configure the CPU and test? From my limited Chisel experience, I would simply define a bunch of parameters in the top-level CPU module, and initialize the test following the TreadleTester example:

val s = Driver.emit(() => new MyCPU(myConfiguration))
val tester = TreadleTester(s)
// then just like normal ChiselTest...
tester.poke(...)
tester.peek(...)

What's the limitation of this simple approach? Could you recommend any resources on the coding practice for a complicated chisel project?

those tests assume that the proxy kernel (pk) is running underneath to handle I/O and exceptions.

Do you mean something like https://github.com/riscv/riscv-pk/? For simple instructions it should be fine then...

I have also found that the Treadle developers (and the Chisel/FIRRTL developers more generally) are incredibly helpful.

Good to know, for more general questions I will post on their GitHub issues :)

I also hit a bug with TreadleTester.poke. Not sure if I should ask here or on Treadle issues.

Basically I tried to modify the internal register state, following this code segment:

def initRegs(vals: Map[Int, BigInt]): Unit = {
  for ((num, value) <- vals) {
    simulator.poke(s"cpu.registers.regs_$num", value)
  }
}

My code looks like:

// inside module
val reg = Reg(UInt(32.W))
val regs = Reg(Vec(4, UInt(32.W)))
...
// during test
tester.poke("reg", BigInt(1))
tester.poke("regs_0", BigInt(1))

The full code is https://gist.github.com/learning-chip/f052dea8f83780e98c87c715122e4f8e
(can run in the online notebook of https://github.com/freechipsproject/chisel-bootcamp)

I got the error message treadle.executable.TreadleException: setValue: Cannot find reg in symbol table. But how can I inspect the symbol table then?

Weirdly, pokeMemory with a similar syntax works well:

val mem = Mem(4, UInt(32.W))
...
testerMem.pokeMemory("mem", addr, BigInt(value))

Interesting... The ChiselTest page says that "if you’re fine living on the bleeding edge, give it a try", so I thought they are advocating ChiselTest instead of deprecating it. Also, the TreadleTester page says "it will be one of the standard back-ends available as part of the chisel-testers project", so it appeared to me that users will access TreadleTester via the higher-level ChiselTest interface😂

I must be misremembering... they deprecated something...

What's the limitation of this simple approach? Could you recommend any resources on the coding practice for a complicated chisel project?

TBH, that could work. As I mentioned before, I wrote most of this code a couple of years ago (or maybe last year). Treadle, at the time, was quite new. The APIs could have been cleaned up.

I got the error message treadle.executable.TreadleException: setValue: Cannot find reg in symbol table. But how can I inspect the symbol table then?

Aha! This I can answer with confidence!

FIRRTL pretty aggressively optimizes out unused wires and registers. So, if there's either a bug in your code or if you're not implementing the whole execution core it will often optimize out the registers (they are "unused" in its mind). When it does this, my hardcoded values for poking the registers fail.

There are two solutions: 1) The hacky approach is to add printf statements to force the wires to be kept. 2) The correct approach is to use a FIRRTL annotation which tells the compiler not to optimize the wire/register away with dontTouch. Here's an example: https://github.com/jlpteaching/dinocpu-wq21/blob/main/src/main/scala/single-cycle/cpu.scala#L17
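
One way to check whether a register survived optimization is to ask Treadle which names it knows before poking. The helper below is a sketch under assumptions: pokeIfPresent is a hypothetical name, and engine.validNames is recalled from the Treadle 1.x sources, so it should be verified against the version in use.

```scala
import treadle.TreadleTester

object SymbolCheck {
  // Pokes `name` only if Treadle's symbol table contains it.
  // Otherwise prints every known name so the expected spelling can be found.
  def pokeIfPresent(tester: TreadleTester, name: String, value: BigInt): Boolean = {
    val known = tester.engine.validNames.toSeq.sorted
    if (known.contains(name)) {
      tester.poke(name, value)
      true
    } else {
      println(s"'$name' is not in the symbol table. Known names:")
      known.foreach(n => println(s"  $n"))
      false
    }
  }
}
```

If a name such as "cpu.registers.regs_1" is missing from the printed list, the register was most likely removed or renamed during FIRRTL lowering, and the list shows which spellings are actually available.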

The correct approach is to use a FIRRTL annotation which tells the compiler not to optimize the wire/register away with dontTouch.

Hmm... I wrapped the internal regs by:

val reg = dontTouch(Reg(UInt(32.W)))
val regs = dontTouch(Reg(Vec(4, UInt(32.W))))

But I still got the same error, Cannot find reg in symbol table. Can you see any other possible causes?

Updated code: https://gist.github.com/learning-chip/f052dea8f83780e98c87c715122e4f8e , again runs in the bootcamp online notebook.

Another related question: besides Reg, sometimes Mem can also be optimized away by FIRRTL. However, using dontTouch on Mem leads to an error:

inferred type arguments [chisel3.Mem[chisel3.UInt]] do not conform to method apply's type parameter bounds [T <: chisel3.Data]

Adding an output port can prevent the Mem from being optimized away (without using dontTouch). The CoreIO module in your project seems to serve this purpose (exposes Mem's IO interface to top-level CPU module). However, say I want to define a CPU module without exposing memory IO port, like this:

class Memory extends Module {
  ....
  val mem = Mem(4, UInt(32.W))  // internal variable
  ...
}
class Cpu extends Module {
  ...
  val memory = Module(new Memory)    // internal variable
  ...
}

In this case, the memory is optimized away, and pokeMemory fails:

val testerError = TreadleTester(Driver.emit(() => new Cpu(outputMem=false)))
testerError.pokeMemory("memory.mem", 0, 1)
// Error: treadle.executable.TreadleException: Error: memory memory.mem.forceWrite(0, 1). memory not found

Full code: https://gist.github.com/learning-chip/43f11fc7c57daff44fdf437ce0151fc5

Is there a correct way to combine dontTouch and Mem?

One important thing to note is that the error you encountered states that dontTouch expects a type T <: chisel3.Data, which is Scala's way of saying that T must be a subtype of Data.

Reg is a special case where the object technically isn't a subtype of Data, but its apply() method returns a T <: Data object. So for all practical purposes any instance of val x = Reg(...) can be treated as a Data type, and dontTouch() will work on it.

Mem isn't a Data; its apply() returns a Mem[T <: chisel3.Data]. So dontTouch() will not work on it.
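
The type-bound argument can be reproduced without Chisel. The sketch below uses hypothetical stand-in classes that only mirror the shape of chisel3.Data, UInt, Mem, Reg and dontTouch; it is not the real library code.

```scala
object BoundSketch {
  // Stand-ins for the Chisel types (illustration only).
  abstract class Data
  final class UInt(val width: Int) extends Data
  // Mem is parameterized by a Data type but does not extend Data itself.
  final class Mem[T <: Data](val size: Int, val elementType: T)

  // Reg.apply hands back the T it was given, so the result is a Data.
  object Reg { def apply[T <: Data](t: T): T = t }
  // dontTouch accepts only subtypes of Data.
  object dontTouch { def apply[T <: Data](data: T): T = data }

  val reg: UInt = dontTouch(Reg(new UInt(32))) // compiles: UInt <: Data
  val mem: Mem[UInt] = new Mem(4, new UInt(32))
  // dontTouch(mem) // rejected by the compiler: Mem[UInt] is not a subtype of Data
}
```

Uncommenting the last line gives an "inferred type arguments ... do not conform to method apply's type parameter bounds" error of the same kind as the one quoted above.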

In your example, the Mem does not have any inbound or outbound signals, so from the perspective of dead code elimination it is 'safe' to optimize away since this would not introduce any side effects to the circuit. And, since DCE occurs during the compilation into FIRRTL, you can't really use TreadleTester to poke this memory because DCE would have already happened by this point.

So, you must necessarily connect the Mem to something, usually the module's IO. The good thing about this is that the apply() of an IO returns a T <: Data, so you can just dontTouch the IO, and as long as it's properly wired the Mem should not be optimized away.
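
A minimal sketch of that suggestion, assuming chisel3 is available: the Memory and Cpu names follow the question, while the port names (addr, wen, wdata, rdata, done) are hypothetical. Whether the memory survives still depends on the Chisel/FIRRTL version, so treat this as a starting point rather than a guarantee.

```scala
import chisel3._

class Memory extends Module {
  val io = IO(new Bundle {
    val addr  = Input(UInt(2.W))
    val wen   = Input(Bool())
    val wdata = Input(UInt(32.W))
    val rdata = Output(UInt(32.W))
  })
  val mem = Mem(4, UInt(32.W))
  // Wire the memory to the IO so it has live readers and writers.
  when(io.wen) { mem.write(io.addr, io.wdata) }
  io.rdata := mem.read(io.addr)
  // The IO bundle is a Data, so dontTouch accepts it.
  dontTouch(io)
}

class Cpu extends Module {
  val io = IO(new Bundle { val done = Output(Bool()) })
  val memory = Module(new Memory) // internal, not exposed at the top level
  memory.io.addr  := 0.U
  memory.io.wen   := false.B
  memory.io.wdata := 0.U
  io.done := false.B
}
```

With the memory kept, a call like pokeMemory("memory.mem", 0, 1) should be able to find it, assuming the instance and memory names are unchanged after lowering.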

So, you must necessarily connect the Mem to something, usually the module's IO. The good thing about this is that the apply() of an IO returns a T <: Data, so you can just dontTouch the IO, and as long as it's properly wired the Mem should not be optimized away.

Thanks, that's a very useful explanation