wbxbar slave-side STB remains asserted beyond ACK for one clock cycle in a single-word transaction.

Question

epsilon537 opened this issue 8 months ago.

I'm debugging an issue where LiteDRAM's Wishbone slave port locks up when attached to wbxbar.
The waveform shows that, for an individual transaction, wbxbar keeps STB asserted for one clock cycle after the slave's ACK pulse. The slave interprets this as a new phase of a block cycle. On the next clock cycle, wbxbar aborts the transaction (deasserting CYC and STB), causing the slave to hang.
On the slave port, the waveform shows:

On the clock cycle before the cursor, STB and STALL are both asserted.
At the cursor, STALL is deasserted, and ACK and STB are asserted.
On the next clock cycle, ACK is deasserted but STB is still asserted.

Here is a snippet from the WB B4 spec describing how keeping STB asserted beyond the ACK pulse initiates a second phase.

Here is my instantiation of wbxbar:

```verilog
localparam AW = 28;  // Bus address width
localparam DW = 32;  // Bus data width
localparam NUM_XBAR_MASTERS = 5;
localparam NUM_XBAR_SLAVES = 8;

wbxbar #(
    .NM(NUM_XBAR_MASTERS), .NS(NUM_XBAR_SLAVES),  // master and slave port counts
    .AW(AW), .DW(DW),
    .SLAVE_ADDR(XBAR_SLAVE_ADDRS),
    .SLAVE_MASK(XBAR_SLAVE_ADDR_MASKS),
    .LGMAXBURST(3),
    .OPT_DBLBUFFER(1'b0),
    .OPT_LOWPOWER(1'b0))
wb_xbar (...);
```

Dan Gisselquist · Answer 1 · Mon Dec 18 2023 22:02:07 GMT+0800 (China Standard Time)

This crossbar doesn't implement WB classic. It implements WB pipelined, as found in the B4 spec of the Wishbone protocol. A request is therefore made on every clock cycle where STB && !STALL, and STB is not allowed to drop while STALL is active (unless CYC also drops ...). STB and ACK are also independent. STB may remain active for as long as you wish to keep making requests of the slave (and CYC is high), and you should receive one ACK for each cycle where (STB && !STALL).
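To make the handshake rules concrete, here is a minimal C++ sketch that checks a sampled bus trace against them. The `WbCycle` struct and the function names are hypothetical helpers for illustration, not part of wbxbar or any Wishbone library; each entry is assumed to be one clock cycle sampled at the rising edge.

```cpp
#include <cstddef>
#include <vector>

// One clock cycle of a Wishbone B4 pipelined link, sampled at the rising edge.
// Hypothetical trace format for illustration only.
struct WbCycle {
    bool cyc;
    bool stb;
    bool stall;
    bool ack;
};

// A request is accepted on every cycle where CYC && STB && !STALL.
std::size_t count_requests(const std::vector<WbCycle>& trace) {
    std::size_t n = 0;
    for (const WbCycle& c : trace)
        if (c.cyc && c.stb && !c.stall) ++n;
    return n;
}

// Each accepted request should eventually be answered by exactly one ACK.
std::size_t count_acks(const std::vector<WbCycle>& trace) {
    std::size_t n = 0;
    for (const WbCycle& c : trace)
        if (c.cyc && c.ack) ++n;
    return n;
}

// STB must not fall after a stalled request unless CYC falls with it.
// Returns false if a stalled request is withdrawn while CYC stays high.
bool stall_rule_ok(const std::vector<WbCycle>& trace) {
    for (std::size_t i = 1; i < trace.size(); ++i) {
        const WbCycle& prev = trace[i - 1];
        const WbCycle& cur = trace[i];
        const bool stalled_request = prev.cyc && prev.stb && prev.stall;
        if (stalled_request && cur.cyc && !cur.stb) return false;
    }
    return true;
}
```

Under this model, holding STB high for two unstalled cycles counts as two requests and should draw two ACKs, regardless of when those ACKs arrive.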

If you wish to convert between WB classic and WB pipelined, please examine wbp2classic.v or wbc2pipeline.v.

You will likely find that WB pipelined is many times faster than WB classic, especially when accessing DRAM, where the speed difference may be 20x or more.
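As a rough illustration of where a large speedup can come from, the following sketch counts clocks under a deliberately simplified model: a fixed latency (in clocks from request to ACK), no stalls, and a classic master that waits for each ACK before issuing the next request. The model and its numbers are assumptions, not measurements of LiteDRAM or any real memory controller.

```cpp
#include <cstdint>

// Deliberately simplified timing model (an assumption, not a measurement):
// each request is acknowledged exactly `latency` clocks after it is issued,
// and the slave never stalls.

// WB classic: the next request is issued only on the clock after the
// previous ACK, so each word costs latency + 1 clocks.
std::uint64_t classic_cycles(std::uint64_t words, std::uint64_t latency) {
    return words * (latency + 1);
}

// WB pipelined: request i is issued on clock i and acknowledged on clock
// i + latency, so the last ACK arrives on clock (words - 1) + latency,
// for words + latency clocks in total.
std::uint64_t pipelined_cycles(std::uint64_t words, std::uint64_t latency) {
    return words == 0 ? 0 : words + latency;
}
```

With these assumptions, a 256-word transfer at a 20-clock latency takes 5376 clocks in classic mode versus 276 in pipelined mode, roughly 19.5x; the ratio approaches latency + 1 as transfers get longer.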

You can read more about the Wishbone properties the crossbar was built to adhere to here.

Dan

Epsilon · Answer 2 · Mon Dec 18 2023 22:48:00 GMT+0800 (China Standard Time)

Thank you for the quick response. I'm also using WB pipelined. As you'll see in the waveform, the WB slave port (user_port) uses a STALL signal. STALL is low when ACK is asserted, so STB is allowed to drop. However, STB does not drop, which causes the slave to reassert STALL on the following clock cycle.

Dan Gisselquist · Answer 3 · Mon Dec 18 2023 23:58:55 GMT+0800 (China Standard Time)

When using WB classic, the bus cycle ends and STB drops following an ACK.

When using WB pipelined, a request is made on every clock cycle where STB && !STALL. If you hold STB high any longer, you are making a second request. This is entirely independent of the ACK signal. There may be many clock cycles between STB falling and the first ACK returning. This is unique to WB pipelined.

Looking over your trace, I'm struggling to follow your naming conventions. Are the user_port_wishbone* signals the incoming ports to the crossbar? If so, I see STB && STALL (a request is made, but not yet accepted), then STB && !STALL (the request is accepted), followed by STB && STALL again (a second request is being made, but not yet accepted). This second request is then dropped (a bus abort), which suggests you didn't mean to make two requests. How, then, is the crossbar wrong for forwarding your two requests downstream and then aborting the second one?
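Applying that reading mechanically, the sketch below replays the four cycles just described (STB && STALL, STB && !STALL, STB && STALL, then CYC and STB dropped) and tallies accepted and aborted requests. The trace encoding and the `summarize` helper are hypothetical; an abort is assumed to be a stalled request whose CYC falls on the next clock.

```cpp
#include <cstddef>
#include <vector>

// One sampled clock cycle of a Wishbone pipelined request channel
// (hypothetical encoding for illustration).
struct WbCycle {
    bool cyc;
    bool stb;
    bool stall;
};

struct RequestSummary {
    std::size_t accepted = 0;  // cycles with CYC && STB && !STALL
    std::size_t aborted = 0;   // stalled requests withdrawn by dropping CYC
};

RequestSummary summarize(const std::vector<WbCycle>& trace) {
    RequestSummary s;
    for (std::size_t i = 0; i < trace.size(); ++i) {
        const WbCycle& cur = trace[i];
        if (cur.cyc && cur.stb && !cur.stall) ++s.accepted;
        if (i > 0) {
            const WbCycle& prev = trace[i - 1];
            // A request still waiting on STALL when CYC falls is a bus abort.
            if (prev.cyc && prev.stb && prev.stall && !cur.cyc) ++s.aborted;
        }
    }
    return s;
}
```

For those four cycles this reports one accepted request and one aborted request, consistent with the master having issued a second request it did not intend.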

Dan

Epsilon · Answer 4 · Tue Dec 19 2023 17:47:54 GMT+0800 (China Standard Time)

I ran out of time yesterday. Apologies if I wasn't making myself clear.

In the trace below you see 3 signal groups:

The bus master issuing a single-word transaction using the WB pipelined protocol. The bus master is attached to wbxbar port 2.
wbxbar forwarding this transaction to the slave, attached to port 7.
The relevant wbxbar input and output ports.

My question is: on the slave port, why is STB still asserted on the clock cycle following the ACK? The bus master that initiated the transaction issued only a single request (STB && !STALL high for one clock cycle).

Thank you for looking into this.

Dan Gisselquist · Answer 5 · Tue Dec 19 2023 20:11:14 GMT+0800 (China Standard Time)

If you can provide a test bench that recreates this issue, or alternatively the slave address and slave address mask registers, then I should be able to generate a test to duplicate this issue.

Dan Gisselquist · Answer 6 · Tue Dec 19 2023 21:03:11 GMT+0800 (China Standard Time)

Basically, the way forward will be to look at the o_sstb signal and trace it back through all of its steps, from where it leaves the design (broken) to where it enters the design (according to your trace), to see what's going on. This is easy to do in simulation, since you have access to every signal within the design. We could do the same in hardware, but that tends to be much harder. Hence, to debug this, I'd like to figure out how to reproduce the problem in a way that can then be debugged.

Dan

Epsilon · Answer 7 · Tue Dec 19 2023 21:54:21 GMT+0800 (China Standard Time)

Agreed. I'll dig in, see what I can find and report back here. It'll be a good exercise.
My current simulation is quite involved, including an Ibex CPU, a LiteDRAM memory controller, etc. I'll try to reduce it to a simpler testbench that's easier to hand over.

Epsilon · Answer 8 · Thu Dec 21 2023 19:47:23 GMT+0800 (China Standard Time)

I found the issue, or at least part of it. The issue is in my code. Due to a logic error, the bus master was generating an STB pulse 2 clock cycles wide (in the absence of STALLs) for a single-word transaction. A pretty obvious bug. What threw me for a loop was that you don't see this in the Verilator trace. In the screenshot above (the Verilator trace), you see an STB pulse only one clock cycle wide.
When I simulate this bus master using Icarus Verilog, I get a different picture:

So, lesson learned: sometimes you have to question the sanity of the trace itself. I started to suspect the trace when I noticed that, in the Verilator trace, the address decoder's i_valid and o_valid go high on the same clock edge. That doesn't make sense when OPT_REGISTERED is set, since the registered output should lag the input by one clock cycle:

I'm probably doing something wrong in the way that I'm generating the Verilator trace. I haven't figured that out yet.
As you probably expected, wbxbar is working just fine. Thank you for taking the time to respond. I'm closing the case.
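For reference, here is a hypothetical cycle-level sketch contrasting a single-word pipelined master that drops STB as soon as its request is accepted with one that drives a fixed two-clock STB pulse, which is one way to reproduce the symptom described above. The actual master code is not shown in the thread, so both functions are illustrative models, not the author's RTL; each `stall` entry is assumed to be the slave's STALL on one clock.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cycle-level models of a single-word Wishbone pipelined master.
// stall[i] is the slave's STALL on clock i; the master raises STB on clock 0.
// Both functions return the number of requests the slave accepts
// (clocks with STB && !STALL), assuming CYC stays high throughout.

// Intended behaviour: hold STB only until the request is accepted.
std::size_t fixed_master_requests(const std::vector<bool>& stall) {
    std::size_t accepted = 0;
    bool stb = true;
    for (bool s : stall) {
        if (stb && !s) {
            ++accepted;
            stb = false;  // accepted: STB is low from the next clock on
        }
    }
    return accepted;
}

// Buggy behaviour, modelled only for the no-STALL case described in the
// thread: STB is a fixed two-clock pulse, regardless of whether the first
// clock already completed the handshake.
std::size_t buggy_master_requests(const std::vector<bool>& stall) {
    std::size_t accepted = 0;
    for (std::size_t i = 0; i < stall.size() && i < 2; ++i)
        if (!stall[i]) ++accepted;
    return accepted;
}
```

With no stalls, the fixed model yields one request and the buggy model yields two, which is the second request the crossbar forwarded downstream.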

Dan Gisselquist · Answer 9 · Fri Dec 22 2023 01:30:52 GMT+0800 (China Standard Time)

Sigh.

Yes, this is one of the problems I've come across with Verilator. I discuss it in my Verilog tutorial.

My standard answer is to update input values off the clock tick, perhaps 80% of the way through the clock cycle or so. This will allow you to "see" what's going on.
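The effect of input timing on a trace can be sketched in C++ with a one-register mock in place of a Verilated model. `MockDut`, `Sample`, the 10-unit clock period and the sample points are all assumptions for illustration; in a real harness the samples would be VCD dumps. With the naive harness, the input and the register output both appear to rise at the same edge (the same symptom as the address decoder above), while updating the input at 80% of the period makes the register visibly lag by one edge.

```cpp
#include <vector>

// Hypothetical stand-in for a Verilated model: a single register, q <= d
// on the rising edge of clk. Like a Verilated model, it only updates on eval().
struct MockDut {
    bool clk = false;
    bool d = false;
    bool q = false;
    bool last_clk = false;

    void eval() {
        if (clk && !last_clk) q = d;  // rising edge: capture d
        last_clk = clk;
    }
};

// One trace sample, standing in for a VCD dump at `time`.
struct Sample {
    int time;
    bool d;
    bool q;
};

// First timestamp at which the chosen signal is high, or -1 if never.
int first_time(const std::vector<Sample>& trace, bool Sample::*signal) {
    for (const Sample& s : trace)
        if (s.*signal) return s.time;
    return -1;
}

// Naive harness: the input is written in the same step that raises the clock.
// d is meant to be captured at edge `capture_cycle`.
std::vector<Sample> naive_trace(int cycles, int capture_cycle) {
    MockDut dut;
    std::vector<Sample> trace;
    for (int c = 0; c < cycles; ++c) {
        const int t = c * 10;              // 10-unit clock period (assumed)
        dut.d = (c >= capture_cycle);      // input changes right at the edge
        dut.clk = true;
        dut.eval();
        trace.push_back({t, dut.d, dut.q});
        dut.clk = false;
        dut.eval();
        trace.push_back({t + 5, dut.d, dut.q});
    }
    return trace;
}

// Off-tick harness: the input changes 80% of the way through the period,
// so it is still captured at edge `capture_cycle`, but visibly before it.
std::vector<Sample> off_tick_trace(int cycles, int capture_cycle) {
    MockDut dut;
    std::vector<Sample> trace;
    for (int c = 0; c < cycles; ++c) {
        const int t = c * 10;
        dut.clk = true;
        dut.eval();
        trace.push_back({t, dut.d, dut.q});
        dut.clk = false;
        dut.eval();
        trace.push_back({t + 5, dut.d, dut.q});
        dut.d = (c + 1 >= capture_cycle);  // set up for the next edge
        dut.eval();
        trace.push_back({t + 8, dut.d, dut.q});
    }
    return trace;
}
```

Both harnesses make the register capture `d` at the same edge; only what the trace appears to show differs.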

A "better" answer that I've been considering would be to do the task properly:

1) calculate the values for any clock-synchronous inputs before setting the clock high, but don't apply them yet;
2) set the clock high;
3) call eval();
4) apply those values (i.e. anything synchronous to the clock tick), based on what you recorded from the conditions prior to the clock tick;
5) call eval() again; and
6) finally, and only then, step the clock forward.

This, I think, is the "right" answer. I'm just not doing it (yet).
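These six steps can be sketched in C++ against a one-register mock. `MockDut` is a hypothetical stand-in for a Verilated model, and the stimulus rule (drive `d` high from cycle `start` on) is an assumed example. The next input is computed before the rising edge but applied only after the first `eval()`, so the register captures the pre-edge value and the trace shows the input changing at one edge and the registered output at the next.

```cpp
#include <vector>

// Hypothetical stand-in for a Verilated model: q <= d on the rising edge.
struct MockDut {
    bool clk = false;
    bool d = false;
    bool q = false;
    bool last_clk = false;

    void eval() {
        if (clk && !last_clk) q = d;  // rising edge: capture d
        last_clk = clk;
    }
};

// One trace sample, standing in for a VCD dump at `time`.
struct Sample {
    int time;
    bool d;
    bool q;
};

// First timestamp at which the chosen signal is high, or -1 if never.
int first_time(const std::vector<Sample>& trace, bool Sample::*signal) {
    for (const Sample& s : trace)
        if (s.*signal) return s.time;
    return -1;
}

// Hypothetical stimulus: d should be high from cycle `start` onward.
// The decision uses only information available before the rising edge.
std::vector<Sample> stepwise_trace(int cycles, int start) {
    MockDut dut;
    std::vector<Sample> trace;
    for (int c = 0; c < cycles; ++c) {
        const int t = c * 10;                // 10-unit clock period (assumed)
        const bool next_d = (c >= start);    // 1) compute, but don't apply yet
        dut.clk = true;                      // 2) set the clock high
        dut.eval();                          // 3) registers capture the pre-edge d
        dut.d = next_d;                      // 4) now apply the synchronous input
        dut.eval();                          // 5) let combinational logic settle
        trace.push_back({t, dut.d, dut.q});
        dut.clk = false;                     // 6) only then step the clock forward
        dut.eval();
        trace.push_back({t + 5, dut.d, dut.q});
    }
    return trace;
}
```

Unlike changing inputs off-tick, this keeps every stimulus change aligned to a clock edge in the trace while registered outputs still appear one clock later.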

Dan

Epsilon · Answer 10 · Fri Dec 22 2023 20:31:02 GMT+0800 (China Standard Time)

I was already sampling both the rising and falling clock edges. I then tried 8x "oversampling", and I do indeed see the STB pulse stretch out further: not quite to 2 clock cycles, but to 1.5. A bit strange, but good enough as an indication that something's wrong there. Too bad my simulation now runs 4x more slowly...
Thanks for the great info!

/Ruben.