VexRiscV - SlimoptCfu does not appear to be working
alanvgreen opened this issue · comments
I have a minimal Cfu and with SlimOptCfu on HPS Proto2, the result appears to be always wrong:
/* Generated by Yosys 0.12+13 (git sha1 21fbdb663, clang 11.1.0-4+build2 -fPIC -Os) */
(* \nmigen.hierarchy = "Cfu" *)
(* top = 1 *)
(* generator = "nMigen" *)
module Cfu(cmd_ready, cmd_payload_function_id, cmd_payload_inputs_0, cmd_payload_inputs_1, rsp_valid, rsp_ready, rsp_payload_outputs_0, reset, port0_addr, port1_addr, port2_addr, port3_addr, port0_din, port1_din, port2_din, port3_din, clk, rst, cmd_valid);
reg \$auto$verilog_backend.cc:2083:dump_module$2 = 0;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/third_party/python/nmigen/nmigen/hdl/ir.py:524" *)
input clk;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/python/nmigen_cfu/cfu.py:182" *)
input [9:0] cmd_payload_function_id;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/python/nmigen_cfu/cfu.py:184" *)
input [31:0] cmd_payload_inputs_0;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/python/nmigen_cfu/cfu.py:185" *)
input [31:0] cmd_payload_inputs_1;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/python/nmigen_cfu/cfu.py:181" *)
output cmd_ready;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/python/nmigen_cfu/cfu.py:180" *)
input cmd_valid;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/proj/hps_accel/gateware/gen2/hps_cfu.py:32" *)
reg f = 1'h0;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/proj/hps_accel/gateware/gen2/hps_cfu.py:32" *)
reg \f$next ;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/python/nmigen_cfu/cfu.py:190" *)
input [31:0] port0_addr;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/python/nmigen_cfu/cfu.py:191" *)
input [31:0] port0_din;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/python/nmigen_cfu/cfu.py:190" *)
input [31:0] port1_addr;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/python/nmigen_cfu/cfu.py:191" *)
input [31:0] port1_din;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/python/nmigen_cfu/cfu.py:190" *)
input [31:0] port2_addr;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/python/nmigen_cfu/cfu.py:191" *)
input [31:0] port2_din;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/python/nmigen_cfu/cfu.py:190" *)
input [31:0] port3_addr;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/python/nmigen_cfu/cfu.py:191" *)
input [31:0] port3_din;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/python/nmigen_cfu/cfu.py:189" *)
input reset;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/python/nmigen_cfu/cfu.py:188" *)
output [31:0] rsp_payload_outputs_0;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/python/nmigen_cfu/cfu.py:187" *)
input rsp_ready;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/python/nmigen_cfu/cfu.py:186" *)
output rsp_valid;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/third_party/python/nmigen/nmigen/hdl/ir.py:524" *)
input rst;
always @(posedge clk)
f <= \f$next ;
always @* begin
if (\$auto$verilog_backend.cc:2083:dump_module$2 ) begin end
\f$next = 1'h1;
(* src = "/usr/local/google/home/avg/src/CFU-Playground/third_party/python/nmigen/nmigen/hdl/xfrm.py:519" *)
casez (rst)
1'h1:
\f$next = 1'h0;
endcase
end
assign rsp_valid = 1'h1;
assign rsp_payload_outputs_0 = 32'd305419896;
assign cmd_ready = 1'h1;
endmodule
Every result received by the CPU is 0x13355779
This was the nmigen used to generate the Cfu.
from nmigen import Signal
from nmigen_cfu import CfuBase
from .constants import Constants
class Cfu(CfuBase):
"""Gen2 accelerator CFU.
Assumes working with a slimopt+cfu VexRiscV, which rsp_ready is always true.
"""
def elab(self, m):
# for now, everything completes in a single cycle, so can be always ready
m.d.comb += self.cmd_ready.eq(1)
m.d.comb += self.rsp_out.eq(0x12345678)
m.d.comb += self.rsp_valid.eq(1)
f = Signal()
m.d.sync += f.eq(1)
def make_cfu():
return Cfu()
The 'f' signal exists to ensure that a clk signal is generated.
I haven't had time to isolate the bug better yet, but the above failing code is available at https://github.com/alanvgreen/CFU-Playground/blob/slimopt-bug/proj/hps_accel/gateware/gen2/hps_cfu.py
I have verified that the above Cfu works as expected when the default VexRiscV CPU is used.
Hi @alanvgreen , did it meet timing?
The CFU w/ slimopt+cfu worked correctly for me on Arty. Build command:
make EXTRA_LITEX_ARGS="--cpu-variant=slimopt+cfu" clean prog
I'll try proto2 now.
On proto2, I met timing by a ways (100MHz+), and I do see a symptom similar to yours: result is 0x12345778
.
And on proto2, with slim+cfu
and your CFU, I get the correct 0x12345678
.
When I build slimopt+cfu
using Radiant, I see the correct answer 0x12345678
.
Ran again - confirmed it meets timing (102.87 MHz). Still producing incorrect output - see uart output below.
Hello, World!
CFU Playground
==============
1: TfLM Models menu
2: Functional CFU Tests
3: Project menu
4: Performance Counter Tests
5: TFLite Unit Tests
6: Benchmarks
7: Util Tests
8: SPI Flash Debugging
main> 3
Running Project menu
Project Menu
============
p: Ping CFU
v: Verify register tests
L: test layer 20
x: eXit to previous menu
project> v
Running Verify register tests
input = c57b29a7, output = 13355779 - FAIL
input = 2b3d0954, output = 13355779 - FAIL
input = 02354ca2, output = 13355779 - FAIL
input = 3727d2b9, output = 13355779 - FAIL
input = 7963b9c5, output = 13355779 - FAIL
input = e3c5de35, output = 13355779 - FAIL
input = f2886d5b, output = 13355779 - FAIL
input = 46f99ee6, output = 13355779 - FAIL
input = b1257a48, output = 13355779 - FAIL
input = d8184f02, output = 13355779 - FAIL
FAIL: 10 failures out of 10 tests
---
I ran with slimopt+cfu, default toolchain, on proto2, with the only change that I specified the 7- grade part LIFCL-17-7UWG72C
.
With this run, I matched @alanvgreen 's incorrect result exactly, 0x13355779
.
Some other experiments I ran --
- changing clock divider up and down
- changing timing fudge factor (currently 1.1) hight and lower
Results -- always a similar error, an LSB of a hex digit being 1'b1 when it should be 1'b0.
Another experiment -- I tried to reproduce on Fomu with the same CPU/CFU, and a similar tool chain -- I did not see the error with Fomu.
I did another experiment with Yosys synthesis and Radiant PnR. The results are even stranger.
make PLATFORM=hps EXTRA_LITEX_ARGS="--cpu-variant=slimopt+cfu --toolchain=radiant --synth-mode=yosys" clean bitstream
Output for interactive test:
Running Run interactive tests
CFU Interactive Test:
First operand value > 0
Second operand value > 0
cfu_op0(00000000, 00000000) = 13355009 (hex), 322261001 (signed), 322261001 (unsigned)
cfu_op1(00000000, 00000000) = 13355679 (hex), 322262649 (signed), 322262649 (unsigned)
cfu_op2(00000000, 00000000) = 13355009 (hex), 322261001 (signed), 322261001 (unsigned)
cfu_op3(00000000, 00000000) = 13355679 (hex), 322262649 (signed), 322262649 (unsigned)
cfu_op4(00000000, 00000000) = 13355009 (hex), 322261001 (signed), 322261001 (unsigned)
cfu_op5(00000000, 00000000) = 13355679 (hex), 322262649 (signed), 322262649 (unsigned)
cfu_op6(00000000, 00000000) = 13355009 (hex), 322261001 (signed), 322261001 (unsigned)
cfu_op7(00000000, 00000000) = 13355679 (hex), 322262649 (signed), 322262649 (unsigned)
So with the CFU outputting constant 0x12345678, the output as read by the CPU alternates between two bad values -- and now we sometime see bits incorrectly forced to 1'b0 (before we only saw bits being corrupted to 1'b1). The alternation is not just random -- in all the tests, every successive result flips to the other one.
I ran Radiant again full flow (2.2.1), and saw correct behavior.
Another clue: with the full Oxide flow (yosys/nextpnr-nexus) on this design, I get the correct behavior if I add -noflatten
to the arguments to synth_nexus
in the Yosys script hps_proto2_platform.ys
.
So I suspect that constant propagation / folding from the CFU constant value into the VexRiscv receiving FIFO, particularly across flop boundaries, triggers the issue. This receiving FIFO only exists in the slimopt
variant.