RHSResearchLLC / NiteFury-and-LiteFury

Public repository for Litefury & Nitefury



Failed to detect XDMA config BAR

spaceotter opened this issue

I'm having some trouble with my NiteFury card and the XDMA driver from top of master. It works when the card is plugged into a PCIe switch card (https://www.amazon.com/gp/product/B08L8J3MBT/), with some occasional reliability issues. But when it is plugged into the motherboard through an M.2 adapter, the driver doesn't work.

lspci -d 10ee: -vvv
4b:00.0 Serial controller: Xilinx Corporation Device 7024 (prog-if 01 [16450])
	Subsystem: Xilinx Corporation Device 0007
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 193
	Region 0: Memory at d0300000 (32-bit, non-prefetchable) [virtual] [size=1M]
	Region 1: Memory at d0400000 (32-bit, non-prefetchable) [virtual] [size=64K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [60] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x4, ASPM L0s, Exit Latency L0s unlimited
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s (ok), Width x4 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range B, TimeoutDis-, NROPrPrP-, LTR-
			 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-, TPHComp-, ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Device Serial Number 00-00-00-00-00-00-00-00
Result of removing the device through /sys and rescanning the bus:
# echo "1" > /sys/bus/pci/devices/0000\:4b\:00.0/remove
# echo "1" > /sys/bus/pci/rescan 
[Feb15 12:40] pci 0000:4b:00.0: Removing from iommu group 62
[Feb15 12:41] pcieport 0000:04:08.0: bridge window [mem 0x00100000-0x000fffff] to [bus 05] add_size 200000 add_align 100000
[  +0.000004] pcieport 0000:03:00.0: bridge window [mem 0x00100000-0x001fffff] to [bus 04-05] add_size 200000 add_align 100000
[  +0.000012] pcieport 0000:03:00.0: BAR 14: no space for [mem size 0x00300000]
[  +0.000001] pcieport 0000:03:00.0: BAR 14: failed to assign [mem size 0x00300000]
[  +0.000002] pcieport 0000:03:00.0: BAR 14: no space for [mem size 0x00100000]
[  +0.000001] pcieport 0000:03:00.0: BAR 14: failed to assign [mem size 0x00100000]
[  +0.000001] pcieport 0000:04:08.0: BAR 14: no space for [mem size 0x00200000]
[  +0.000001] pcieport 0000:04:08.0: BAR 14: failed to assign [mem size 0x00200000]
[  +0.000002] pcieport 0000:04:08.0: BAR 14: no space for [mem size 0x00200000]
[  +0.000001] pcieport 0000:04:08.0: BAR 14: failed to assign [mem size 0x00200000]
[  +0.018879] pci 0000:4b:00.0: [10ee:7024] type 00 class 0x070001
[  +0.000023] pci 0000:4b:00.0: reg 0x10: [mem 0xd0300000-0xd03fffff]
[  +0.000010] pci 0000:4b:00.0: reg 0x14: [mem 0xd0400000-0xd040ffff]
[  +0.000105] pci 0000:4b:00.0: PME# supported from D0 D1 D2 D3hot
[  +0.000627] pci 0000:4b:00.0: Adding to iommu group 62
[  +0.000115] pci 0000:4b:00.0: BAR 0: assigned [mem 0xd0300000-0xd03fffff]
[  +0.000004] pci 0000:4b:00.0: BAR 1: assigned [mem 0xd0400000-0xd040ffff]
[Feb15 12:44] xdma:xdma_mod_init: Xilinx XDMA Reference Driver xdma v2020.1.8
[  +0.000002] xdma:xdma_mod_init: desc_blen_max: 0xfffffff/268435455, timeout: h2c 10 c2h 10 sec.
[  +0.000071] xdma:xdma_device_open: xdma device 0000:4b:00.0, 0x0000000045520a28.
[  +0.000001] xdma:alloc_dev_instance: xdev = 0x0000000030361898
[  +0.000003] xdma:xdev_list_add: dev 0000:4b:00.0, xdev 0x0000000030361898, xdma idx 0.
[  +0.000130] xdma:request_regions: pci_request_regions()
[  +0.000005] xdma:map_single_bar: BAR0: 1048576 bytes to be mapped.
[  +0.000025] xdma:map_single_bar: BAR0 at 0xd0300000 mapped at 0x00000000519eaa09, length=1048576(/1048576)
[  +0.000004] xdma:is_config_bar: BAR 0 is NOT the XDMA config BAR: 0xffffffff, 0xffffffff.
[  +0.000001] xdma:map_single_bar: BAR1: 65536 bytes to be mapped.
[  +0.000010] xdma:map_single_bar: BAR1 at 0xd0400000 mapped at 0x0000000071b088e5, length=65536(/65536)
[  +0.000002] xdma:is_config_bar: BAR 1 is NOT the XDMA config BAR: 0xffffffff, 0xffffffff.
[  +0.000001] xdma:map_bars: Failed to detect XDMA config BAR
[  +0.000034] pcieport 0000:40:01.3: DPC: containment event, status:0x1f01 source:0x0000
[  +0.000002] pcieport 0000:40:01.3: DPC: unmasked uncorrectable error detected
[  +0.025532] xdma:probe_one: pdev 0x0000000045520a28, err -22.
[  +0.000003] xdma:xpdev_free: xpdev 0x00000000b9ed515b, destroy_interfaces, xdev 0x0000000000000000.
[  +0.000001] xdma:xpdev_free: xpdev 0x00000000b9ed515b, xdev 0x0000000000000000 xdma_device_close.
[  +0.000001] xdma:xdma_device_close: pdev 0x0000000045520a28, xdev 0x0000000000000000.
[  +0.000006] xdma: probe of 0000:4b:00.0 failed with error -22
[  +0.135983] pcieport 0000:40:01.3: AER: Device recovery failed
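The probe fails because neither BAR passes the driver's config-BAR check. Below is a minimal Python sketch of that check, assuming the behavior of `is_config_bar()` in the Xilinx reference driver: it reads one identifier word from the IRQ block and one from the config block of a BAR and compares only the upper 16 bits against fixed block IDs. The constants and offsets are assumptions taken from that driver and should be verified against its source; the point is that an all-ones read (0xffffffff, as logged for both BARs) can never match.

```python
# Assumed constants modeled on the Xilinx XDMA reference driver; verify before relying on them.
IRQ_BLOCK_ID = 0x1FC20000     # identifier expected in the IRQ block (assumed BAR offset 0x2000)
CONFIG_BLOCK_ID = 0x1FC30000  # identifier expected in the config block (assumed BAR offset 0x3000)
ID_MASK = 0xFFFF0000          # compare block IDs only, ignore the version bits


def is_config_bar(irq_id: int, cfg_id: int) -> bool:
    """Return True if both identifier words look like an XDMA config BAR."""
    return (irq_id & ID_MASK) == IRQ_BLOCK_ID and (cfg_id & ID_MASK) == CONFIG_BLOCK_ID


def looks_like_dead_link(value: int) -> bool:
    """All ones is what a 32-bit MMIO read typically returns when the device does not respond."""
    return value == 0xFFFFFFFF


if __name__ == "__main__":
    # Identifier values reported for both BARs in the dmesg trace above.
    irq_id, cfg_id = 0xFFFFFFFF, 0xFFFFFFFF
    print(is_config_bar(irq_id, cfg_id))   # False
    print(looks_like_dead_link(irq_id))    # True
```

Because the mask keeps the upper half of the word, 0xffffffff becomes 0xffff0000, which differs from both block IDs, so the driver falls through to "Failed to detect XDMA config BAR" and returns -22 (EINVAL).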

I thought maybe the debugger was holding it in reset. So I built the sample project in 2018.3, which built as expected, then I removed the PCIe device, flashed the FPGA, and added it back. The result was exactly the same.
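The remove, reflash, and re-add sequence described above can be scripted against the same sysfs files used earlier. This is a minimal sketch, assuming root privileges and the standard `/sys/bus/pci` layout; `reprogram` is a hypothetical callback standing in for however the FPGA actually gets flashed, since that step is done by hand in the issue.

```python
from pathlib import Path
from typing import Callable

# Assumed default sysfs location; the parameter below allows pointing at a test directory.
SYSFS_PCI = Path("/sys/bus/pci")


def remove_reprogram_rescan(bdf: str, reprogram: Callable[[], None],
                            sysfs_pci: Path = SYSFS_PCI) -> None:
    """Detach a PCI function, run a reprogramming step, then rescan the bus.

    bdf is the full address, e.g. "0000:4b:00.0". reprogram is a hypothetical
    callback for flashing the FPGA. Writing these sysfs files requires root.
    """
    # Same effect as: echo "1" > /sys/bus/pci/devices/<bdf>/remove
    (sysfs_pci / "devices" / bdf / "remove").write_text("1")
    # Reflash while the kernel is not bound to the device.
    reprogram()
    # Same effect as: echo "1" > /sys/bus/pci/rescan
    (sysfs_pci / "rescan").write_text("1")
```

Removing the function first means no driver touches the BARs while the bitstream is being loaded; the rescan then re-enumerates the device and reassigns its BARs, as in the dmesg output above.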

My previous post was only about 2018.3, but I tested 2020.2 as well.
With the PCIe switch card, 2018.3 worked and 2020.2 didn't.
Without the switch card, neither version of Vivado works. That's with an adapter like this one (I'm not sure of the exact part number): https://www.amazon.com/gp/product/B00MYCQP38
I'm pretty sure the failure mode is always the same as above.
I could try testing the adapters with an SSD later.

The motherboard is an MSI Creator TRX40. I might have time on Wednesday to test combinations of adapters, and maybe another computer. Does the trace above give any hint about what might be going wrong?
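One more line in the trace worth decoding is the DPC (Downstream Port Containment) event on root port 0000:40:01.3, logged right after the failed BAR probing. The sketch below decodes the logged status word the way the Linux DPC driver picks its message, using the DPC Status field masks from the kernel's `pci_regs.h`; newer kernels may word the messages differently. It only describes what the root port reported, not why.

```python
# DPC Status register field masks, as defined in Linux include/uapi/linux/pci_regs.h.
DPC_STATUS_TRIGGER = 0x0001          # containment was triggered
DPC_STATUS_TRIGGER_RSN = 0x0006      # trigger reason, bits 2:1
DPC_STATUS_TRIGGER_RSN_EXT = 0x0060  # trigger reason extension, bits 6:5
DPC_RP_PIO_FEP = 0x1F00              # RP PIO first error pointer, bits 12:8


def decode_dpc_status(status: int) -> dict:
    """Decode a DPC Status value, mirroring the kernel's reason selection."""
    reason = (status & DPC_STATUS_TRIGGER_RSN) >> 1
    ext_reason = (status & DPC_STATUS_TRIGGER_RSN_EXT) >> 5
    if reason == 0:
        text = "unmasked uncorrectable error"
    elif reason == 1:
        text = "ERR_NONFATAL"
    elif reason == 2:
        text = "ERR_FATAL"
    elif ext_reason == 0:
        text = "RP PIO error"
    elif ext_reason == 1:
        text = "software trigger"
    else:
        text = "reserved error"
    return {
        "triggered": bool(status & DPC_STATUS_TRIGGER),
        "reason": text,
        "rp_pio_first_error_ptr": (status & DPC_RP_PIO_FEP) >> 8,
    }


# Status word from "DPC: containment event, status:0x1f01" in the trace above.
print(decode_dpc_status(0x1F01))
# {'triggered': True, 'reason': 'unmasked uncorrectable error', 'rp_pio_first_error_ptr': 31}
```

The decoded reason matches the "unmasked uncorrectable error detected" line the kernel printed next.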

Yes, it looks like the config space is returning all 0xFFFF. I've seen this happen when the link drops. Even if the link recovers, the config space doesn't seem to.
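To check from userspace whether a function's config space has gone to all ones, one option is to read the first four bytes of its sysfs `config` file and look for vendor ID 0xFFFF. This is a sketch, assuming the standard Linux sysfs layout (`/sys/bus/pci/devices/<BDF>/config`); the `sysfs_pci` parameter exists only so it can be pointed at a test directory. A responding NiteFury should report 10ee:7024, as in the rescan log above.

```python
import struct
from pathlib import Path
from typing import Tuple

ALL_ONES_VENDOR = 0xFFFF  # vendor ID read back when nothing answers the config request


def parse_ids(config: bytes) -> Tuple[int, int]:
    """Return (vendor_id, device_id) from the first 4 bytes of config space (little-endian)."""
    vendor, device = struct.unpack_from("<HH", config, 0)
    return vendor, device


def config_space_alive(bdf: str, sysfs_pci: Path = Path("/sys/bus/pci")) -> bool:
    """False if the function's config space reads back as all ones."""
    with open(sysfs_pci / "devices" / bdf / "config", "rb") as f:
        header = f.read(4)  # vendor ID + device ID; readable without root
    vendor, _ = parse_ids(header)
    return vendor != ALL_ONES_VENDOR
```

Usage: `config_space_alive("0000:4b:00.0")` right after a failed probe, and again after a remove/rescan, shows whether the function ever starts answering config reads again.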

It would be interesting to try the adapter with an SSD. I've never had a problem linking up, with or without an adapter, when the design was built with 2018 or 2019. I haven't had a chance yet to figure out what's going on when using 2020. I've tried a LiteFury built with 2020 in about 8 different motherboards, and all of them worked fine except for one.

Is the config space in the programmable logic? Why doesn't it get reset when the FPGA is reprogrammed over JTAG?