linux-surface / surface-pro-x

Tracking and meta repository for Surface Pro X support.

Latest kernel causes issues with the ath10k driver (SQ1/8/512 - potential regression?)

Deinsti opened this issue · comments

Greetings!

I have been testing Arch Linux on my SQ1 Surface Pro X for a while now and generally experimented with a variety of areas such as GPU, CPU behaviour, etc.
Recently, however, the latest kernel seems to have a regression that causes the kernel to panic repeatedly after connecting to a Wi-Fi network. The panic is not instant: it happens after a random delay, anywhere from one minute to 20 minutes.

SError Interrupt on CPU0, code 0x00000000be000011
...
pc : ath10k_snoc_napi_poll+0x84/0x150 [ath10k_snoc]

This suggests that the Wi-Fi driver in the kernel is misbehaving, but I wonder why?

As far as I know, this issue was not present in the previous kernel versions... Any ideas as to the cause? I can add extra details if you wish, just ask!

[System Specs] - Surface Pro X
CPU: SQ1
RAM: 8GB
SSD: 512GB
Kernel Ver: linux-surface 6.0.3-1
Boot method: USB with DTB

Thanks for reporting this. I already had some sporadic Wi-Fi issues in 6.0.1. In particular, it froze the desktop, but I assume it's the same problem, since I was able to confirm your log by dropping to a TTY in 6.0.3 (which also has these issues). Unfortunately, I haven't had the time yet to dig into this more.

Okay, so unfortunately I can't get ./scripts/faddr2line to print the line where it fails:

$ ./scripts/faddr2line drivers/net/wireless/ath/ath10k/ath10k_snoc.ko ath10k_snoc_napi_poll+0x84
ath10k_snoc_napi_poll+0x84/0x150:
ath10k_snoc_napi_poll at snoc.c:?

Looks like we'll have to debug it the old-fashioned way.
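
For reference, the `symbol+offset` argument passed to faddr2line above comes straight from the `pc` line of the panic log. A minimal sketch of pulling it out of a log line follows. The helper name `parse_pc_line` and the regular expression are illustrative assumptions, not part of any kernel tooling, and the pattern only covers the `symbol+0xOFF/0xSIZE [module]` form seen in this log.

```python
import re

# Hypothetical helper: matches lines such as
#   "pc : ath10k_snoc_napi_poll+0x84/0x150 [ath10k_snoc]"
# The pattern is an assumption based on the one log line in this issue.
PC_LINE = re.compile(
    r"pc\s*:\s*(?P<symbol>[\w.]+)"
    r"\+(?P<offset>0x[0-9a-fA-F]+)"
    r"/(?P<size>0x[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

def parse_pc_line(line):
    """Return symbol, offset, function size and module from a 'pc :' line, or None."""
    match = PC_LINE.search(line)
    if match is None:
        return None
    return {
        "symbol": match.group("symbol"),
        "offset": int(match.group("offset"), 16),
        "size": int(match.group("size"), 16),
        "module": match.group("module"),  # None for built-in code
    }

def faddr2line_arg(parsed):
    """Build the 'symbol+0xoffset' string that faddr2line takes as an argument."""
    return f"{parsed['symbol']}+{parsed['offset']:#x}"

parsed = parse_pc_line("pc : ath10k_snoc_napi_poll+0x84/0x150 [ath10k_snoc]")
print(faddr2line_arg(parsed), "in module", parsed["module"])
```

The module name in brackets indicates which `.ko` file to hand to faddr2line alongside the `symbol+offset` string.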

I'm having some difficulties debugging this... Since the SError is asynchronous, it seems we can't rely on the program counter or stack trace to point us to the specific place where it's failing. I do get some variation in the PC, but ath10k_snoc_napi_poll is always in the backtrace, so I'm assuming it has something to do with that.
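
As a rough illustration of why the PC is unreliable here, the code reported in the log (0x00000000be000011) can be split into its syndrome fields. This is only a sketch: the field positions below follow my reading of the ESR_ELx layout in the Arm architecture manual (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and for SError the IDS bit at 24 and DFSC in bits 5:0), and the function name `decode_esr` is made up for this example.

```python
# Assumed ESR_ELx layout (from the Arm architecture manual, not from this thread):
#   EC  = bits 31:26, exception class (0x2F = SError interrupt)
#   IL  = bit 25
#   ISS = bits 24:0; for SError: IDS = bit 24, DFSC = bits 5:0
ESR_EC_SERROR = 0x2F
DFSC_ASYNC_SERROR = 0x11  # 0b010001, "asynchronous SError interrupt"

def decode_esr(esr):
    """Split an ESR value into the fields relevant to an SError report."""
    ec = (esr >> 26) & 0x3F
    iss = esr & 0x1FFFFFF
    fields = {
        "ec": ec,
        "il": (esr >> 25) & 0x1,
        "iss": iss,
        "is_serror": ec == ESR_EC_SERROR,
    }
    if fields["is_serror"]:
        fields["ids"] = (iss >> 24) & 0x1   # 0 means the architected ISS encoding is used
        fields["dfsc"] = iss & 0x3F
        fields["async_serror"] = (
            fields["ids"] == 0 and fields["dfsc"] == DFSC_ASYNC_SERROR
        )
    return fields

info = decode_esr(0x00000000BE000011)
print(f"EC={info['ec']:#x} DFSC={info['dfsc']:#x} async={info['async_serror']}")
```

Under those assumptions, the value decodes to an SError exception class with the asynchronous fault status code, which is consistent with the faulting access having happened some time before the reported PC.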

@Deinsti Can you still reproduce this on the latest kernel?

@qzed I currently do not have an Arch install ready to test, but I'll re-install it either tomorrow or over the weekend and give it a go.