Logo by Tokino Kei.

`sentinel`

Sentinel is a small RISC-V CPU (RV32I_Zicsr) written in Amaranth. It implements the Machine Mode privileged spec, and is designed to fit into ~1000 4-input LUTs or less on an FPGA. It is a good candidate for control tasks where a programmable state machine or custom size-tailored core would otherwise be used.

Unlike most RISC-V implementations, Sentinel is microcoded, not pipelined. Instructions require multiple clock cycles to execute. Sentinel is therefore not necessarily a good fit for applications where high throughput/IPC is required. See below.

Sentinel has been tested against the RISC-V Formal and RISCOF frameworks, and passes both. Once I have added a few extra tests, the core can be considered correct with respect to the RISC-V Formal model. The core is also probably correct with respect to the SAIL golden model.

Why The Name `sentinel`?

I've liked the way the word "sentinel" sounds ever since I first learned of the word, either from the title of a book on NJ lighthouses, or from an enemy in an old Sega Genesis RPG. The term has always stuck with me since then, albeit in a much more positive light than "the soldier golems of the forces of Darkness" :). Since "sentinel" means "one who stands watch", I think it's an apt name for a CPU intended to watch over the rest of your silicon, but otherwise stay out of the way. Also, since lighthouses are indeed "Sentinels Of The Shore", I wanted to shoehorn a lighthouse into the logo :).

Getting Started

Sentinel uses:

PDM as its package/dependency manager, and to orchestrate all things you can do with this repo.
m5meta microcode assembler, without which Sentinel would optimize down to ~0 LUTs :).
yosys and nextpnr for size-benchmarking. The user must provide these.
pytest for basic/regression testing.
DoIt as a lower-level dependency-graph aware task orchestrator (called from pdm).
riscv-tests, an older set of unit tests for RISC-V processors. Running these is done via pytest.
RISC-V Formal to verify that desirable properties of Sentinel (such as "instructions write the correct destination") hold for all possible inputs over a bounded number of clock cycles after reset.
RISCOF, the unit test framework that is maintained by RISC-V International themselves. This appears to have originally been derived from the riscv-tests, but is much more comprehensive.

The last five are only required for development. Additionally for development, a user must provide:

riscv64-unknown-elf-gcc to compile tests from riscv-tests (I'm not sure what the correct way to install the compiler is nowadays; I use 8.3.0.)
SymbiYosys, a driver program for RISC-V Formal.
Boolector, the SMT Solver that RISC-V Formal uses.

RISCOF also requires the SAIL RISC-V emulator. This is a pain to compile, so I provide a Linux binary (and eventually Windows if I can get OCaml to behave long enough. I used to be able to install it just fine :'D!).

A user must first run the following before anything else:

pdm install -G dev -G examples

Use In Amaranth Code

I expect most users to only need to import from sentinel.top. The top-level module of the Sentinel CPU is appropriately named Top:

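from amaranth import Elaboratable, Module
from sentinel.top import Top

class MySoC(Elaboratable):
    def __init__(self):
        # Instantiate the CPU once and keep a reference for elaborate().
        self.cpu = Top()
        ...

    def elaborate(self, plat):
        m = Module()
        # Register the CPU as a submodule of this design.
        m.submodules.cpu = self.cpu
        ...
        return m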

Top exposes a Wishbone Classic bus and an irq input pin as the interface to all other modules in an FPGA design. Top also has clk and rst lines, which belong to the sync clock domain rather than being directly exposed in Top's Signature. sync is the only clock domain that Sentinel uses.

See the AttoSoC class in examples/attosoc.py for a full working example. A working demo can be generated from this example, as explained below.

Generate A Verilog Core

Using `pdm`/`pyproject.toml` From This Package

This command will generate a core with a Wishbone Classic bus, and clk, rst, and irq input pins (as mentioned above, Sentinel uses a single clock domain):

pdm gen > sentinel.v

On reset, Sentinel begins execution at address `0`. See the CSR section for information on exception handling (including interrupts).

The Wishbone bus uses a block transfer to perform a back-to-back memory write and instruction fetch. Otherwise, the Wishbone bus deasserts CYC/STB the cycle after receipt of ACK. I may need to interface to IP that can't handle block cycles, so I will probably relax the block cycle requirement in the future via an option.

For help, run:

pdm gen -h

From An Installed Package/As A Dependency

If using Sentinel as an installed package, the previous section still applies, except the command is now:

[pdm run] python -m sentinel.gen

If you're using pdm to handle Python dependencies in e.g. a mixed Python/Verilog project, and Sentinel is one of those Python dependencies, you may wish to use scripts to provide a shortcut for Verilog generation in your pyproject.toml (call = "python -m sentinel.gen" does not work!):

[tool.pdm.scripts]
gen = { call = "sentinel.gen:generate", help="generate Sentinel Verilog file" }

Generate A Demo Bitstream For Lattice iCEstick

pdm demo

For help, run:

pdm demo -h

Run Tests

pdm test

pdm test-quick

The above will invoke pytest and test Sentinel against handcrafted examples, as well as the riscv-tests repo binaries. See the README.md in tests/upstream for information on how to refresh the binaries.

Right now (11/5/2023), the difference between test and test-quick is minimal.

Run RISC-V Formal Flow

pdm rvformal-all [-n num_cores]

pdm rvformal test-name

See README.md in tests/formal for more information, including valid/available test names.

Run RISCOF Flow

pdm riscof-all

pdm riscof-override /path/to/test_list.yaml

See README.md in tests/riscof for more information.

List `pdm` Scripts And `doit` Tasks

pdm run --list

pdm doit list [--all] [task]

doit tasks are documented as a courtesy, and to make sure developers/users don't get stuck. I am unsure about doit tasks' stability, so prefer running pdm as a wrapper to doit rather than running doit directly.

Block Diagram

Instruction Cycle Counts

TODO. I need to create a test that gets latency and throughput for each instruction type of the core. Some general observations (as of 11/18/2023), from examining the microcode:

There is room for improvement, even without making the core bigger.
Fetch/Decode takes a minimum of two cycles thanks to Wishbone Classic's REQ/ACK handshake taking two cycles.
- When Wishbone ACK is asserted, Decode is taking place.
- The GP register file is synchronous, with a single read port and a single write port. Sentinel loads RS1 out of the register file during Decode.
All instructions share the same operation the cycle after ACK/Decode:
- Check for exceptions/interrupts, go to exception handler if so.
- Latch RS1 into the ALU.
- Load RS2 out of the register file, in anticipation of a "simple" instruction.
- Jump to the instruction-specific microcode block.
At minimum, an instruction (addi, or, etc) takes 3 cycles to retire after the initial shared cycles. This means Sentinel instructions have a minimum latency of 6 cycles per instruction (CPI).
Sentinel instructions have a maximum throughput of 4 CPI by overlapping the 2 Fetch/Decode cycles of the next instruction after the initial 3 shared cycles of the current instruction when possible ("pipelining").
- Some instructions overlap one of the Fetch/Decode cycles, some don't overlap either of them. In particular, shift instructions with a nonzero shift count don't pipeline Fetch/Decode. It may be possible to always overlap at least one cycle, but I haven't tweaked the core yet to ensure this is a sound optimization.
Shift instructions need work:
- For a shift of zero, shift-immediate latency is 10 CPI and throughput is 9 CPI. Shift-register latency is 11 CPI and throughput is 10 CPI.
- For a shift of nonzero n, shift-immediate and shift-register latency and throughput are both 7 + 2*n CPI.
Branch-not-taken latency and throughput is 7 CPI. Branch-taken latency and throughput is 8 CPI.
JAL/JALR latency is 9 CPI, throughput is 7 CPI.
Store latency and throughput is 8 CPI minimum. 2 cycles minimum are spent waiting for Wishbone ACK.
- The core will not release STB/CYC between the store and fetch of the next instruction.
Load latency is 10 CPI minimum, and throughput is 9 CPI. 2 cycles minimum are spent waiting for Wishbone ACK.
- The core will release STB/CYC before fetch of the next instruction.
CSR instructions require an extra Decode cycle compared to all other instructions (to check for legality).
- At minimum, a read of a read-only zero CSR register has a latency of 7 CPI, and a throughput of 6 CPI.
- At maximum, csrrc has a latency of 11 CPI, and a throughput of 10 CPI.
Entering an exception handler requires 5 clocks from the cycle at which the exception condition is detected.
- mret has a latency and throughput of 8 CPI.
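
The figures above can be collected into a small lookup table for back-of-the-envelope estimates. This is only a sketch: the `Cycles`, `CYCLES`, `shift_cycles`, and `estimate` names are hypothetical and not part of Sentinel, the numbers are copied from the observations above (minimums where a minimum is stated), and the throughput column assumes the best-case overlap of Fetch/Decode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cycles:
    latency: int     # cycles for one instruction in isolation
    throughput: int  # cycles per instruction in a back-to-back stream


# Numbers copied from the observations above; wait states on the
# Wishbone bus would add to the load/store/fetch figures.
CYCLES = {
    "simple": Cycles(6, 4),            # addi, or, etc. (best case)
    "shift_imm_zero": Cycles(10, 9),   # shift-immediate by 0
    "shift_reg_zero": Cycles(11, 10),  # shift-register by 0
    "branch_not_taken": Cycles(7, 7),
    "branch_taken": Cycles(8, 8),
    "jump": Cycles(9, 7),              # JAL/JALR
    "store": Cycles(8, 8),             # minimum
    "load": Cycles(10, 9),             # minimum
    "mret": Cycles(8, 8),
}


def shift_cycles(n):
    """Latency and throughput of a shift by a nonzero count n."""
    if n <= 0:
        raise ValueError("use the *_zero entries in CYCLES for a shift of zero")
    total = 7 + 2 * n
    return Cycles(total, total)


def estimate(mix):
    """Estimate total cycles for a {kind: count} instruction mix,
    using the throughput figure for every instruction."""
    return sum(CYCLES[kind].throughput * count for kind, count in mix.items())
```

For example, `estimate({"simple": 10, "load": 2, "branch_taken": 1})` sums 10 simple instructions, 2 loads, and 1 taken branch at their throughput figures. Treat the result as a lower bound rather than a measurement.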

CSRs

Sentinel physically implements the following CSRs:

mscratch
mcause
- The core can only physically trigger a subset of defined exceptions:
  - Machine external interrupt
  - Instruction access misaligned
  - Illegal instruction
  - Breakpoint
  - Load address misaligned
  - Store address misaligned
  - Environment call from M-mode
  In particular worth noting:
  - Misaligned accesses are not implemented in hardware.
  - There is no machine timer (a 64-bit counter is a bit too much to ask for right now :(...).
mip
- Only the MEIP bit is implemented. The RISC-V Privileged Spec says:
  
  MEIP is read-only in mip, and is set and cleared by a platform-specific interrupt controller.
  
  The user must provide their own interrupt controller. One simple implementation is to OR all external interrupt sources together, and query each peripheral when MEIP is pending to find which peripherals need attention. This is implemented for the serial and timer peripherals in the attosoc example.
  
  In the future, I may implement the high (platform-specific) 16 bits of mip/mie to make interrupt handling quicker.
mie
- Only the MEIE bit is implemented.
mstatus
- Only the MPP, MPIE, and MIE bits are implemented.
mtvec
- The BASE is writeable; only the Direct MODE setting is implemented.
mepc
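
The "OR all sources together, then poll" scheme described under mip can be modelled in plain Python. This is a behavioural sketch only, not the attosoc implementation: the `Peripheral` class, its `pending` flag, and the `meip` and `service` functions are hypothetical names made up for illustration.

```python
class Peripheral:
    """Hypothetical peripheral with a single interrupt-pending flag."""

    def __init__(self, name):
        self.name = name
        self.pending = False


def meip(peripherals):
    """Model of the irq input: the logical OR of every interrupt source."""
    return any(p.pending for p in peripherals)


def service(peripherals):
    """Model of the handler: query each peripheral in turn, clear the
    ones that need attention, and report which ones were handled."""
    handled = []
    for p in peripherals:
        if p.pending:
            p.pending = False
            handled.append(p.name)
    return handled


serial = Peripheral("serial")
timer = Peripheral("timer")
sources = [serial, timer]

timer.pending = True          # the timer raises its interrupt
if meip(sources):             # MEIP is pending, so poll the peripherals
    print(service(sources))
```

The cost of this scheme is that the handler has to query every peripheral on each interrupt, which is the motivation given above for possibly implementing the platform-specific bits of mip/mie later.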

Additionally, the following CSRs are implemented as read-only zero (only the first 5 of the below registers trigger an exception on an attempt to write):

mvendorid
marchid
mimpid
mhartid
mconfigptr
misa
mstatush
mcountinhibit
mtval
mcycle
minstret
mhpmcounter3-31
mhpmevent3-31

All remaining machine-mode CSRs are unimplemented and trigger an exception on any access:

medeleg
mideleg
mcounteren
mtinst
mtval2
menvcfg
menvcfgh
mseccfg
mseccfgh
