cr1901 / sentinel

Another size-optimized RISC-V CPU for your consideration.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Sentinel Logo. A lighthouse is shining its light on a PCB and computer
  chip. The silicon die of the computer chip is visible. The text "Sentinel"
  in a black and gray gradient stretches in parallel with the lighthouse's beam.
  The text covers the base of the lighthouse and is below the chip.

Logo by Tokino Kei.

main next

sentinel

Sentinel is a small RISC-V CPU (RV32I_Zicsr) written in Amaranth. It implements the Machine Mode privileged spec, and is designed to fit into ~1000 4-input LUTs or less on an FPGA. It is a good candidate for control tasks where a programmable state machine or custom size-tailored core would otherwise be used.

Unlike most RISC-V implementations, Sentinel is microcoded, not pipelined. Instructions require multiple clock cycles to execute. Sentinel is therefore not necessarily a good fit for applications where high throughput/ IPC is required. See below.

Sentinel has been tested against RISC-V Formal and the RISCOF frameworks, and passes both. Once I have added a few extra tests, the core can be considered correct with respect to the RISC-V Formal model. The core is also probably correct with respect to the SAIL golden model.

Why The Name sentinel?

I've like the way the word "sentinel" sounds ever since I first learned of the word, either from the title of a book on NJ lighthouses, or on an enemy from an old Sega Genesis RPG. The term has always stuck with me since then, albeit in a much more positive light than "the soldier golems of the forces of Darkness" :). Since "sentinel" means "one who stands watch", I think it's an apt name for a CPU intended to watch over the rest of your silicon, but otherwise stay out of the way. Also, since lighthouses are indeed "Sentinels Of The Shore", I wanted to shoehorn a lighthouse into the logo :).

Getting Started

Sentinel uses:

  • PDM as its package/dependency manager, and to orchestrate all things you can do with this repo.
  • m5meta microcode assembler, without which Sentinel would optimize down to ~0 LUTs :).
  • yosys and nextpnr for size-benchmarking. The user must provide these.
  • pytest for basic/regression testing.
  • DoIt as a lower-level dependency-graph aware task orchestrator (called from pdm).
  • riscv-tests, an older set of unit tests for RISC-V processors. Running these is done via pytest.
  • RISC-V Formal to verify that desirable properties of Sentinel (such as "instructions write the correct destination") hold for all possible inputs over a bounded number of clock cycles after reset.
  • RISCOF, the unit test framework that is maintained by RISC-V International themselves. This appears to have originally been derived from the riscv-tests, but is much more comprehensive.

The latter five are only required for development. Additionally for development, a user must provide:

  • riscv64-unknown-elf-gcc to compile tests from riscv-tests (I'm not sure what the correct way to install the compiler is nowadays, I use 8.3.0.)
  • SymbiYosys, a driver program for RISC-V Formal.
  • Boolector, the SMT Solver that RISC-V Formal uses.

RISCOF also requires the SAIL RISC-V emulator. This is a pain to compile, so I provide a Linux binary (and eventually Windows if I can get OCaml to behave long enough. I used to be able to install it just fine :'D!).

A user must first run the following before anything else:

pdm install -G dev -G examples

Use In Amaranth Code

I expect most users to only need to import from sentinel.top. The top-level module of the Sentinel CPU is appropriately named Top:

from sentinel.top import Top

class MySoC(Elaboratable):
    def __init__(self):
        self.cpu = Top()
        ...

    def elaborate(self, plat):
        m = Module()
        m.submodules.cpu = self.cpu
        ...

Top exposes a Wishbone Classic bus, and an irq input pin as the interface to all other modules in an FPGA design. Of course, Top's also has clk and rst lines, which belong to the sync clock domain rather than being directly exposed in Top's Signature. sync is the only clock domain that Sentinel uses.

See the AttoSoC class in examples/attosoc.py for a full working example. A working demo can be generated from this example, as explained below.

Generate A Verilog Core

Using pdm/pyproject.toml From This Package

This command will generate a core with a Wishbone Classic bus, and clk, rst, and irq input pins (as mentioned above, Sentinel uses a single clock domain):

pdm gen > sentinel.v

On reset, Sentinel begins execution at address `0``. See the CSR section for information on exception handling (including interrupts).

The Wishbone bus uses a block xfer to do a back-to-back memory write an instruction fetch. Otherwise, the wishbone bus will deassert CYC/STB the cycle after receipt of ACK. I may neeed to interface to IP that can't handle block cycles, so I will probably relax the block cycle requirement in the future via an option.

For help, run:

pdm gen -h

From An Installed Package/As A Dependency

If using Sentinel as an installed package, the previous section still applies, except the command is now:

[pdm run] python -m sentinel.gen

If you're using pdm to handle Python dependencies in e.g. a mixed Python/Verilog project, and Sentinel is a one of those Python dependencies, you may wish to use scripts to provide a shortcut for Verilog generation in your pyproject.toml (call = "python -m sentinel.gen" does not work!):

[tool.pdm.scripts]
gen = { call = "sentinel.gen:generate", help="generate Sentinel Verilog file" }

Generate A Demo Bitstream For Lattice iCEstick

pdm demo

For help, run:

pdm demo -h

Run Tests

pdm test

or

pdm test-quick

The above will invoke pytest and test Sentinel against handcrafted examples, as well as the riscv-test repo binaries. See the README.md in tests/upstream for information on how to refresh the binaries.

Right now (11/5/2023), the difference between test and test-quick is minimal.

Run RISC-V Formal Flow

pdm rvformal-all [-n num_cores]

or

pdm rvformal test-name

See README.md in tests/formal for more information, including valid/available test names.

Run RISCOF Flow

pdm riscof-all

or

pdm riscof-override /path/to/test_list.yaml

See README.md in tests/riscof for more information.

List pdm Scripts And doit Tasks

pdm run --list
pdm doit list [--all] [task]

doit tasks are documented as a courtesy, and to make sure developers/users don't get stuck. I am unsure about doit tasks' stability, so prefer running pdm as a wrapper to doit rather than running doit directly.

Block Diagram

Simplified block diagram of Sentinel. Black arrows are physical connections. Blue arrows represent microcode ROM outputs to Sentinel components, including feedback into the microcode ROM as inputs. Purple arrows represent microcode ROM inputs from the other components.

Instruction Cycle Counts

TODO. I need to create a test that gets latency and throughput for each instruction type of the core. Some general observations (as of 11/18/2023), from examining the microcode:

  • There is room for improvement, even without making the core bigger.
  • Fetch/Decode takes a minimum of two cycles thanks to Wishbone classic's REQ/ACK handshake taking two cycles.
    • When Wishbone ACK is asserted, Decode is taking place.
    • The GP file is a synchronous single read port, single write port. Sentinel loads RS1 out of the register file during Decode.
  • All instructions share the same operation the cycle after ACK/Decode:
    • Check for exceptions/interrupts, go to exception handler if so.
    • Latch RS1 into the ALU.
    • Load RS2 out of the register file, in anticipation for a "simple" instruction.
    • Jump to the instruction-specific microcode block.
  • At minimum, an instruction (addi, or, etc) takes 3 cycles to retire after the initial shared cycles. This means Sentinel instructions have a minimum latency of 6 cycles per instruction (CPI).
  • Sentinel instructions have a maximum throughput of 4 CPI by overlapping the 2 Fetch/Decode cycles of the next instruction after the initial 3 shared cycles of the current instruction when possible ("pipelining").
    • Some instructions overlap one of the Fetch/Decode cycles, some don't overlap either of them. In particular, shift instructions with a nonzero shift count don't pipeline Fetch/Decode. It may be possible to always overlap at least one cycle, but I haven't tweaked the core yet to ensure this is a sound optimization.
  • Shift instructions need work:
    • For a shift of zero, shift-immediate latency is 10 CPI, throughtput 9 CPI. Shift-register latency is 11 CPI, throughput 10 CPI.
    • For a shift of nonzero n, shift-immediate and shift-register latency and throughput is 7 + 2*n CPI.
  • Branch-not-taken latency and throughput is 7 CPI. Branch-taken latency and throughput is 8 CPI.
  • JAL/JALR latency is 9 CPI, throughput is 7 CPI.
  • Store latency and throughput is 8 CPI minimum. 2 cycles minimum are spent waiting for Wishbone ACK.
    • The core will not release STB/CYC between the store and fetch of the next instruction.
  • Load latency is 10 CPI minimum, and throughput is 9 CPI. 2 cycles minimum are spent waiting for Wishbone ACK.
    • The core will release STB/CYC before fetch of the next instruction.
  • CSR instructions require an extra Decode cycle compared to all other instructions (to check for legality).
    • At minimum, a read of a read-only zero CSR register has a latency of 7 CPI, and a throughput of 6 CPI.
    • At maximum, csrrc has a latency of 11 CPI, and a throughput of 10 CPI.
  • Entering an exception handler requires 5 clocks from the cycle at which the exception condition is detected.
    • mret has a latency and throughput of 8 CPI.

CSRs

Sentinel physically implements the following CSRs:

  • mscratch
  • mcause
    • The core can only physically trigger a subset of defined exceptions:

      • Machine external interrupt
      • Instruction access misaligned
      • Illegal instruction
      • Breakpoint
      • Load address misaligned
      • Store address misaligned
      • Environment call from M-mode

      In particular worth noting:

      • Misaligned accesses are not implemented in hardware.
      • There is no machine timer (a 64-bit counter is a bit too much to ask for right now :(...).
  • mip
    • Only the MEIP bit is implemented. The RISC-V Privileged Spec says:

      MEIP is read-only in mip, and is set and cleared by a platform-specific interrupt controller.

      The user must provide their own interrupt controller. One simple implementation is to OR all external interrupt sources together, and query each peripheral when MEIP is pending to find which peripherals need attention. This is implemented for the serial and timer peripherals in the attosoc example.

      In the future, I may implement the high (platform-specific) 16-bits of mip/mie to make interrupt-handling quicker.

  • mie
    • Only the MEIE bit is implemented.
  • mstatus
    • Only the MPP, MPIE, and MIE bits are implemented.
  • mtvec
    • The BASE is writeable; only the Direct MODE setting is implemented.
  • mepc

Additionally, the following CSRs are implemented as read-only zero (only the first 5 of the below registers trigger an exception on an attempt to write):

  • mvendorid
  • marchid
  • mimpid
  • mhartid
  • mconfigptr
  • misa
  • mstatush
  • mcountinhibit
  • mtval
  • mcycle
  • minstret
  • mhpmcounter3-31
  • mhpmevent3-31

All remaining machine-mode CSRs are unimplemented and trigger an exception on any access:

  • medeleg
  • mideleg
  • mcounteren
  • mtinst
  • mtval2
  • menvcfg
  • menvcfgh
  • mseccfg
  • mseccfgh

About

Another size-optimized RISC-V CPU for your consideration.

License:BSD 2-Clause "Simplified" License


Languages

Language:Python 76.9%Language:Assembly 11.0%Language:C 6.3%Language:Rust 3.3%Language:SystemVerilog 2.5%Language:RPC 0.1%