blackmagic-debug / blackmagic

In application debugger for ARM Cortex microcontrollers.

Inconsistent clock signal during SWDIO turnaround

stansotn opened this issue · comments

Summary

Serial wire debug (SWD/SWJ) clock signal generation is inconsistent during the turnaround cycle. The clock low time is too short during the turnaround cycle, which, depending on the setup, may result in the target's debug module missing the clock cycle entirely.

Version and platform information

Commit SHA ef32a54; the most recent tag is v1.10.1.

gcc-arm-none-eabi-10.3-2021.07 toolchain.

STM32F411 "black pill" board, but I am guessing the issue is hardware agnostic.

10 kΩ pull-up resistor on the SWDIO line.

Steps to reproduce

The issue can be observed in all SWD transactions. Running monitor swd_scan with a valid debug target connected is enough to observe the issue on the oscilloscope trace. The issue becomes more apparent at higher frequencies since the troubled clock pulse becomes short enough to be missed by the target.

What is the current bug behavior?

Low side of the clock (SWCLK) signal does not adhere to the protocol timing during the turnaround cycle on the SWDIO pin. This causes a missing clock cycle at higher frequencies (see screenshot below).

What is the expected correct behavior?

The clock signal remains consistent throughout the entire serial wire debug transaction. By consistent I mean the clock pulse width (high and low) should be greater than or equal to 1/(2*f), where f is the configured SWD frequency.
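The 1/(2*f) criterion can be written out as a small check. This is a minimal sketch, not code from the firmware: the function names `min_half_period_ns` and `pulse_meets_minimum` are hypothetical, and 40 kHz is used only as an example value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimum high or low time in nanoseconds for a clock of freq_hz: 1 / (2 * f).
 * freq_hz must be non-zero. Hypothetical helper, not part of the firmware. */
uint32_t min_half_period_ns(uint32_t freq_hz)
{
	return (uint32_t)(1000000000ULL / (2ULL * freq_hz));
}

/* True if a measured pulse width satisfies the criterion for freq_hz. */
bool pulse_meets_minimum(uint32_t width_ns, uint32_t freq_hz)
{
	return width_ns >= min_half_period_ns(freq_hz);
}
```

For a 40 kHz setting, `min_half_period_ns(40000U)` gives 12500 ns, so under this criterion any high or low phase shorter than 12.5 µs would count as inconsistent.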

Relevant logs, screenshots, or inputs

Following is a scope trace showing the missing clock low period during the first turnaround (BMP probe assuming receiver role). Clock frequency is measured at 36.4kHz. monitor frequency command says 79207Hz.

Screenshot 2023-12-27 at 6 08 05 PM

All consecutive SWD transactions contain the same issue during the turnaround cycle when the probe is assuming receiver role.

Possible causes or fixes

The low clock delay is only applied when BMP is turning around to be transmitter. The low clock delay is missing when BMP is becoming receiver.

https://github.com/blackmagic-debug/blackmagic/blob/main/src/platforms/common/swdptap.c#L72
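The suspected cause can be illustrated with a host-side model. This is a sketch under stated assumptions, not the code in swdptap.c: every name below is hypothetical, time is counted in abstract ticks, each pin operation is assumed to cost one tick, and `DELAY_TICKS` stands in for the configured busy-loop delay. It only models which branch gets the low-side delay.

```c
#include <stdint.h>

/* Hypothetical model; none of these names are taken from swdptap.c. */
enum swdio_dir { DIR_FLOAT, DIR_DRIVE };

#define DELAY_TICKS 100U /* stand-in for the configured half-period delay */

static uint32_t now;            /* simulated time in ticks */
static int clk_level;           /* current SWCLK level */
static uint32_t clk_fell_at;    /* time of the last SWCLK falling edge */
static uint32_t last_low_ticks; /* low time seen at the last rising edge */

static void tick(uint32_t n) { now += n; }
static void delay_busy(void) { tick(DELAY_TICKS); }
static void swdio_float(void) { tick(1); }
static void swdio_drive(void) { tick(1); }

static void clk_low(void)
{
	tick(1);
	if (clk_level) {
		clk_level = 0;
		clk_fell_at = now;
	}
}

static void clk_high(void)
{
	tick(1);
	if (!clk_level) {
		clk_level = 1;
		last_low_ticks = now - clk_fell_at;
	}
}

/* Behaviour as described in the report: low-side delay only when driving. */
void turnaround_as_described(enum swdio_dir dir)
{
	if (dir == DIR_FLOAT) {
		swdio_float(); /* probe becomes receiver: no low-side delay */
	} else {
		clk_low();
		delay_busy(); /* probe becomes transmitter: delay applied */
	}
	clk_high();
	delay_busy();
	if (dir == DIR_DRIVE)
		swdio_drive();
}

/* Proposed change: the low-side delay is moved out of the if-else. */
void turnaround_delay_hoisted(enum swdio_dir dir)
{
	if (dir == DIR_FLOAT)
		swdio_float();
	else
		clk_low();
	delay_busy(); /* applied in both directions */
	clk_high();
	delay_busy();
	if (dir == DIR_DRIVE)
		swdio_drive();
}

/* Run one turnaround after a falling edge and report the SWCLK low time. */
uint32_t turnaround_low_ticks(void (*turnaround)(enum swdio_dir), enum swdio_dir dir)
{
	now = 0;
	clk_level = 1;
	clk_low(); /* the previous bit is assumed to end with SWCLK low */
	turnaround(dir);
	return last_low_ticks;
}
```

In this model the low time for `DIR_FLOAT` is only the two ticks of pin-operation overhead in the first variant, while the second variant gives at least `DELAY_TICKS` in both directions.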

Please give #1688 + #1690 a look and let us know if with them both applied together things are looking better or worse. Given the behaviour you're seeing, we half suspect this is actually platform-dependent and may already be fixed by one of those two PRs.

If this was really broken for, eg, native, it should have been noticed already we'd think - though, we aren't going to rule it out.

We will say, though, that your configured SWD frequency is not the thing we must not violate. It's the timings on the SWD interface of the target as specified by ADIv5, so as long as those timings are respected, a short pulse like this can be ok (if not ideal). Please also note that this platform uses the old method for defining the currently configured clock frequency for JTAG and SWD operations and requires a calibration to be done to present and use accurate timing information.

STM32F411 "black pill" board, but I am guessing the issue is hardware agnostic.

This particular issue sounds like a bug in swdptap.c:swdptap_turnaround() and would be hardware agnostic in that it should appear on all BMP-compatible platforms. You could perform a minimal cross-verification with your nice MSO/LA and the lowly stm32f103c8 minimum system development board aka "bluepill" by building and flashing BMD as PROBE_HOST=stlink BLUEPILL=1 or PROBE_HOST=swlink (that one has runtime detection), remember different platform pinouts.

Clock frequency is measured at 36.4kHz. monitor frequency command says 79207Hz.

That is another bug in timing_stm32.c:platform_max_frequency_get() which I aimed to fix in PR1688. The old calculation was missing a * 2U. The proper way forward would be to perform speed/delay measurements and calibration by the new methodology. May I ask why you asked for a frequency that low -- is this an ADC samplerate limitation of your instrument, or just a nice illustrative example?

The clock signal remains consistent throughout the entire serial wire debug transaction. By "consistent" I mean the clock pulse width (high and low) should be equal to [or greater than] 1/(2*f), where f is the configured SWD frequency.

... the timings on the SWD interface of the target as specified by ADIv5, so as long as those timings are respected, a short pulse like this can be ok (if not ideal).

So which one is it? Can BMF violate the "max frequency" configured volatile parameter? Then why does it exist? I like this finding, and it might well be the reason why my blackpill-f411ce sometimes locked up in testing 1688+1690 at the improved max no_delay frequency (probably uncaught "exception" from SWD invalid ACK).

If @stansotn you don't mind, I can incorporate moving the platform_delay_busy() out of if-else by appending another commit onto PR1688 -- it's a relatively small change conflicting in patch context with that PR, so if you opened your own PR and got it merged sooner, then I'd have to forward-rebase half the commits give-or-take. All I really have to observe and measure is an fx2la variant which gets fussy at high samplerates, but I need them to validate 8 MHz operation (so not 12, but 16; 24; 48 MHz in Pulseview). And sigrok-pico would only give me 7-bit 500kHz shared over 2-3 channels. You could help by testing.

So which one is it? Can BMF violate the "max frequency" configured volatile parameter? Then why does it exist? I like this finding, and it might well be the reason why my blackpill-f411ce sometimes locked up in testing 1688+1690 at the improved max no_delay frequency (probably uncaught "exception" from SWD invalid ACK).

We're pointing out that while it's not great for us to exceed the selected frequency, it's not an outright bug and error for target communications as long as we do not exceed the ADIv5 spec values for setup-and-hold and pulse speed - it's not good, but it's not the slam dunk that might otherwise be thought given it's bitbang'd. (The waveform shows an extreme uneven-ness in mark-space but not even double the frequency for the pulse because you have to measure from falling to falling or rising to rising edge because it's a clock cycle)

Essentially we're trying to point out that their criteria for what constitutes the "correct" behaviour being even mark-space ratio is arbitrary and not necessarily attainable as long as the overall cycle is approximately the right length and we don't exceed/violate the ADIv5 spec requirements on setup-and-hold, minimum high and low times, and that kind of thing - we can have a 5% duty cycle clock and be absolutely fine as long as these other conditions are met and we get approximately the desired clock frequency.
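The point about measuring a clock cycle from rising edge to rising edge can be made concrete with a short sketch. The struct and function names are hypothetical, and the timestamps in the usage note are illustrative values, not readings from the scope trace. The minimum high and low limits are passed in by the caller because the actual ADIv5 figures are not quoted in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* One clock cycle described by its high time, low time and full period. */
struct clk_cycle {
	uint32_t high_ns;
	uint32_t low_ns;
	uint32_t period_ns;
};

/* Build a cycle from three edge timestamps: rising, falling, next rising. */
struct clk_cycle measure_cycle(uint32_t rise_ns, uint32_t fall_ns, uint32_t next_rise_ns)
{
	struct clk_cycle c;
	c.high_ns = fall_ns - rise_ns;
	c.low_ns = next_rise_ns - fall_ns;
	c.period_ns = next_rise_ns - rise_ns; /* same-polarity edge to edge */
	return c;
}

/* Instantaneous frequency of this one cycle, in Hz (period must be non-zero). */
uint32_t cycle_freq_hz(const struct clk_cycle *c)
{
	return 1000000000U / c->period_ns;
}

/* Duty cycle (high time as a percentage of the period), rounded down. */
uint32_t cycle_duty_percent(const struct clk_cycle *c)
{
	return (uint32_t)((100ULL * c->high_ns) / c->period_ns);
}

/* True if both phases meet caller-supplied minimum widths. */
bool cycle_meets_limits(const struct clk_cycle *c, uint32_t min_high_ns, uint32_t min_low_ns)
{
	return c->high_ns >= min_high_ns && c->low_ns >= min_low_ns;
}
```

With a nominal 40 kHz clock (12500 ns per phase) and a low phase shortened to 2500 ns, `measure_cycle(0, 12500, 15000)` gives a 15000 ns period, which is about 66.6 kHz at an 83% duty cycle. That is well short of double the nominal frequency, even though the mark-space ratio is very uneven.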

@ALTracer,

May I ask why you asked for a frequency that low -- is this an ADC samplerate limitation of your instrument, or just a nice illustrative example?

There is no strong reasoning behind the probe frequency selection seen on the trace. I was troubleshooting an intermittent target connectivity issue, and I was scaling the probe frequency by factors of 10. On the trace the requested frequency is 40 kHz.

You could perform a minimal cross-verification with your nice MSO/LA and the lowly stm32f103c8 minimum system development board aka "bluepill".

You could help by testing.

I don't have any bluepills lying around. I can get some in about two weeks. I'd be happy to do some testing.

That is another bug in timing_stm32.c:platform_max_frequency_get() which I aimed to fix in PR1688.

I can incorporate moving the platform_delay_busy() out of if-else by appending another commit onto PR1688 -- it's a relatively small change conflicting in patch context with that PR

I suggest separating fixes and improvements in separate PRs. Piece of general advice to improve readability of the contribution and release histories. Though I am not well versed in this project's workflow policies.

@dragonmux,

we don't exceed/violate the ADIv5 spec requirements on setup-and-hold, minimum high and low times, and that kind of thing - we can have a 5% duty cycle clock and be absolutely fine

I agree. I argue that the 5% clock duty cycle becomes a bottleneck with a combination of higher frequencies, longer wires, and the usage of level shifters.

@stansotn

I suggest separating fixes and improvements in separate PRs. Piece of general advice to improve readability of the contribution and release histories. Though I am not well versed in this project's workflow policies.

Current workflow policy of this project seems to be "clump less stuff into PRs and make atomic (enough) commits", which historically was not always the case. I myself don't like giant PRs with lots of weakly related changes thrown in. Still, my numerous PRs here are measurably smaller both in scope and in commit count as compared to some PRs of the current maintainer (not naming numbers 🙂).

The reason why I suggested this here specifically is a) it's a related change honestly, and there are some duty cycle rebalance changes already; b) it will have to go through merge conflict resolution otherwise, even if minor.

Now that I managed to sufficiently adjust and clean up both PR1690 (F4x1Cx SWDIO turnaround) and PR1700 (SWD parity builtins) to a mergeable state, which were related to / blocking PR1688, (thanks to Dragonmux for bearing through this!), I also made a sweep pass at PR1688 to declutter it from fixups and loop versions. If you argue that some of its commits are better cherry-picked away into a distinct PR related to this issue etc., then I'm open to hearing your thoughts on this. But it may be more churn w.r.t. runtime-testing each of them, readjusting (parts of) timings, fast-forwarding etc.

  • I'm not sure whether commit authorship can be retained over forks through cherry-picks and multiple rebases, but I could try to pick from your fork branch to retain "authored by" information.

  • If there are practical edge-cases where SWD communication fails due to SWD invalid ACKs or SWD parity errors at some concrete frequency against some concrete target chip, buffers, cable, PCB layout, then a bugreport with BMDA verbose logs is welcome so that issue priority is escalated. Especially with changes from 1688, because I remove some of old "guard rails" there; and I'm only a human dev & external contributor, so still prone to human error breaking stuff.

  • General recommendations to procuring a genuine stm32f103cb chip on bluepills remain. I'd recommend the same WeAct Studios BluePillPlus PCB, because I bought some and verified they ship what's on the label, if only the LED is moved from PC13 to PB2. Otherwise (no, CKS32 may not be compatible, APM32 is not verified, but GD32F1/F3 work) any old Discovery/Nucleo-64 board with an onboard stlink/v2-1 (128 KiB) will also do -- it may lack JTAG pinned out, but we're discussing SWD only. Bonus points for stlink/v2 standalone, which has an actual voltage translator IC inside for the "input" direction.

level shifters

Are you working in a usecase where default 3.3v LVCMOS levels of the adapter MCU are incompatible with the target? Is it lower (1.8v, 2.5v) or higher (5.0v) VDDIO on its SWJ-DP? Then all the better, because I'd like to have BMF working in these conditions properly and reliably without having users to downclock via monitor frequency 4M / platform_max_frequency_set() just because analog waveforms go wrong otherwise.

I agree. I argue that the 5% clock duty cycle becomes a bottleneck with a combination of higher frequencies, longer wires, and the usage of level shifters.

We don't think the code works that way at all right now, rather that there's a missing delay loop when the clock is high there so that duty cycle will be variable - if you double the probe operating frequency you should see the duty cycle double for that turnaround. That makes it non-limiting.

If you can confirm that we might know where the missing delay is. Fixing it will be more difficult without breaking things.

I'm not sure whether commit authorship can be retained over forks through cherry-picks and multiple rebases, but I could try to pick from your fork branch to retain "authored by" information.

Yes, this is the default behaviour of Git. You have to go out of your way to mess that up, and it's the committer information that changes depending on who's doing the rebase/cherry-pick. See the GIT_AUTHOR_NAME and GIT_AUTHOR_EMAIL env vars in the Git manual.

If you argue that some of its commits are better cherry-picked away into a distinct PR related to this issue etc., then I'm open to hearing your thoughts on this.

I am fine with either, just a general suggestion perhaps worth following in the future.

I'm not sure whether commit authorship can be retained over forks through cherry-picks and multiple rebases, but I could try to pick from your fork branch to retain "authored by" information.

I could push a signed commit on a branch in your fork (you'd have to grant access). I am flattered you'd like me on the contributor list:)

We don't think the code works that way at all right now, rather that there's a missing delay loop when the clock is high there so that duty cycle will be variable - if you double the probe operating frequency you should see the duty cycle double for that turnaround. That makes it non-limiting.

I can confirm that making the frequency faster effectively decreases the relative low timing during the turnaround cycle. It seems that the turnaround cycle low clock timing is an artifact of the unintentional code execution delay?

If you can confirm that we might know where the missing delay is. Fixing it will be more difficult without breaking things.

The missing delay is marked in the following diff

main...stansotn:blackmagic:fix/swdio-turnaround-clock

The resulting low clock pulse becomes whatever is now plus the expected busy loop delay (the low timing becomes "55%" of the turnaround cycle).

Oh, this bit of tricky code.. hmm.. we had a lot of problems with this already when we overhauled the bitbanging code and the delays were in the wrong places and gave ourselves a headache figuring out where they really should be.. can you share a capture showing what you see with that loop moved down like that?

Are you working in a usecase where default 3.3v LVCMOS levels of the adapter MCU are incompatible with the target? Is it lower (1.8v, 2.5v) or higher (5.0v) VDDIO on its SWJ-DP? Then all the better, because I'd like to have BMF working in these conditions properly and reliably without having users to downclock via monitor frequency 4M / platform_max_frequency_set() just because analog waveforms go wrong otherwise.

Yes. I have a use case with and without a level shifter. I am also working on the use case where the target is isolated from the probe. The latter only works at low (10kHz) speeds due to the limitation in the isolator.

can you share a capture showing what you see with that loop moved down like that?

Will do later next week! I am away from my lab (happy new year). Happy to run other tests, let's create a test wish list (perhaps in #1688)?

The missing delay is marked in the following diff main...stansotn:blackmagic:fix/swdio-turnaround-clock

I've been using the code tweak for some time, and I had scoped the clock behavior at some point.

Conceptual question. Is there a desire/a plan to use the dedicated peripheral (aka UART) on STM32 hosts for serial wire communication?

There have been plans/ideas floated to use either the USART or SPI peripherals for SWD and JTAG acceleration but it needs a month dedicated to it to see if it can be sensibly done without blowing firmware size out the window, and if it actually gets us any net improvement when also factoring in edge-cases like 3 byte transactions that almost certainly can't be serviced that way. It's a "The hardware ver 6 is capable of it in theory but it needs time put into testing the idea", so isn't something to worry about for now.

Ah yes, I forgot about the non-8-bit-aligned transaction width, which does rule out the use of generic SPI/UART peripherals. SWD and JTAG protocol peculiarities likely make the RP2040 a great (inexpensive) candidate for being the probe host with its "make your own protocol" PIO peripherals.
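The alignment problem can be seen by counting the bits of a single SWD transfer. This is a small sketch that assumes the default one-cycle turnaround; the constant and function names are hypothetical.

```c
#include <stdint.h>

/* Phases of one SWD transfer, assuming the default one-cycle turnaround. */
enum {
	SWD_REQUEST_BITS = 8,    /* start, APnDP, RnW, A[2:3], parity, stop, park */
	SWD_TURNAROUND_BITS = 1, /* one per direction change */
	SWD_ACK_BITS = 3,
	SWD_DATA_BITS = 32,
	SWD_PARITY_BITS = 1
};

/* Clock cycles in one successful read or write transfer.
 * Both directions contain two turnarounds, only in different positions. */
uint32_t swd_transfer_bits(void)
{
	return SWD_REQUEST_BITS + SWD_TURNAROUND_BITS + SWD_ACK_BITS + SWD_DATA_BITS +
		SWD_PARITY_BITS + SWD_TURNAROUND_BITS;
}

/* Bits left over after packing a transfer into whole bytes. */
uint32_t swd_transfer_leftover_bits(void)
{
	return swd_transfer_bits() % 8U;
}
```

Under these assumptions a transfer is 46 clock cycles long, which leaves 6 bits over after five whole bytes, and the line direction changes at positions that are not byte boundaries either.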

I was trying to track down a failure with the native BMP remote protocol when using the hosted BMDA. It showed up as the BMP hanging while BMDA was trying to speak an "advanced" protocol to offload the low-level SWD transaction stuff. When using the earlier protocol, BMDA took care of individual SWD read and write bit strings, but (recoverable) errors showed up.

This revealed a lack of the required 8 idle clock cycles when ending a read transaction and before stopping the clock. Adding those made some of the failing transactions work.

I've been meaning to get back to debugging that. In the process, I started to examine the code in swdptap.c, and found the same problem with turnarounds that @stansotn found. I also concluded that there was a missing delay loop in the write-to-read turnaround scenario. I do vaguely recall seeing some clock oddities in that location the last time I put a logic analyzer on the SWD lines in this scenario.

Also, my analysis is that a read transaction would do the turnaround, but the clock could remain high, due to a missing gpio_clear at the end of swdptap_turnaround. This might also contribute to the failures that I was seeing.

Adding 8 idle cycles happened to also provide the missing falling clock edge, so it's not clear which one was causing the problems.
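The idle-cycle step described above might look roughly like the following bit-banged sketch. It is not the code from the fix: the pin helpers are hypothetical stand-ins that only record state on the host, the delay is a placeholder, and the probe is assumed to have already turned the line around so that it drives SWDIO.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pin model; on real hardware these would be GPIO writes. */
static bool swclk_level;
static bool swdio_level;
static uint32_t swclk_rising_edges;

static void swclk_write(bool level)
{
	if (level && !swclk_level)
		++swclk_rising_edges;
	swclk_level = level;
}

static void swdio_write(bool level)
{
	swdio_level = level;
}

static void half_period_delay(void)
{
	/* Placeholder for the busy-loop delay used on the probe. */
}

/* Clock `cycles` idle cycles with SWDIO held low, ending with SWCLK low. */
void swd_idle_cycles(uint32_t cycles)
{
	swdio_write(false);
	for (uint32_t i = 0; i < cycles; ++i) {
		swclk_write(true);
		half_period_delay();
		swclk_write(false); /* each cycle ends on a falling edge */
		half_period_delay();
	}
}
```

Calling `swd_idle_cycles(8U)` after the last transfer produces eight rising edges and leaves SWCLK low, which also supplies the falling edge that the turnaround was found to be missing.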

We don't mind if it's here or on Discord, but it would be really helpful if you could write up your findings so-far as we'd like to reproduce what you're seeing and better understand what's going on in the failing turnarounds and idle cycles.

This sounds like a major-enough issue that we might need to release backports of w/e fix happens to at least the v1.9 and v1.10 firmware series, plus potentially even v1.8.

@stansotn I created #1711 to fix multiple SWD turnaround issues, including the one you found here. Please give it a try to see if it improves your issue.

Please give it a try to see if it improves your issue.

I'd be happy to run tests with an o-scope towards the end of the week, but it seems like you are on top of the issue already. Feel free to ping me if any additional testing is needed.

I did a quick check on the logic analyzer, and the asymmetry in the turnaround clock pulse widths pre-patch was very visible, especially below 10kHz. Nice and symmetrical after the patch.