morrownr / 8812au-20210629

Linux Driver for USB WiFi Adapters that are based on the RTL8812AU Chipset - v5.13.6

Kernel crash

elouet opened this issue · comments

Running the driver over USB in AP mode on Debian 12 with kernel 6.1.0-18-amd64, I'm getting random kernel panics every couple of hours.

Apr 17 18:23:29 diomede kernel: ------------[ cut here ]------------
Apr 17 18:23:29 diomede kernel: WARNING: CPU: 2 PID: 1714 at /root/8812au-20210820/core/rtw_mlme.c:3460 rtw_sta_mstatus_disc_rpt+0xb9/0xec [8812au]
Apr 17 18:23:29 diomede kernel: Modules linked in: 8812au(OE) nft_chain_nat xt_MASQUERADE nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp xt_mac nft_compat nf_tables libcrc32c nfnetlink cfg80211 bridge stp llc qrtr binfmt_misc nls_ascii nls_cp437 vfat fat intel_rapl_msr intel_rapl_common snd_hda_codec_hdmi intel_powerclamp coretemp snd_hda_codec_realtek hci_uart snd_hda_codec_generic ledtrig_audio kvm_intel btqca btrtl btbcm snd_hda_intel btintel mei_hdcp kvm bluetooth irqbypass snd_intel_dspcfg ghash_clmulni_intel sha256_ssse3 sha1_ssse3 snd_intel_sdw_acpi jitterentropy_rng snd_hda_codec sha512_ssse3 snd_hda_core snd_hwdep sha512_generic snd_pcm iTCO_wdt intel_pmc_bxt iTCO_vendor_support ctr snd_timer aesni_intel pcspkr crypto_simd snd mei_txe drbg cryptd watchdog soundcore at24 ansi_cprng intel_cstate mei intel_xhci_usb_role_switch roles ecdh_generic rfkill ecc pwm_lpss_platform intel_int0002_vgpio pwm_lpss evdev sg parport_pc ppdev lp parport loop fuse dm_mod efi_pstore configfs
Apr 17 18:23:29 diomede kernel:  efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic uas usb_storage i915 sd_mod drm_buddy i2c_algo_bit t10_pi drm_display_helper cec crc64_rocksoft crc64 crc_t10dif xhci_pci crct10dif_generic rc_core xhci_hcd ttm crct10dif_pclmul ahci drm_kms_helper r8169 crct10dif_common libahci crc32_pclmul realtek libata crc32c_intel mdio_devres usbcore drm i2c_i801 libphy i2c_smbus lpc_ich scsi_mod scsi_common usb_common fan video i2c_hid_acpi i2c_hid wmi hid button
Apr 17 18:23:29 diomede kernel: CPU: 2 PID: 1714 Comm: RTW_CMD_THREAD Tainted: G           OE      6.1.0-18-amd64 #1  Debian 6.1.76-1
Apr 17 18:23:29 diomede kernel: Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.11 08/22/2015
Apr 17 18:23:29 diomede kernel: RIP: 0010:rtw_sta_mstatus_disc_rpt+0xb9/0xec [8812au]
Apr 17 18:23:29 diomede kernel: Code: 59 c1 48 c7 c2 4d 18 5b c1 48 89 d6 48 c7 c7 28 b9 5b c1 e8 30 a6 73 db 48 83 c4 08 e9 76 ff ff ff 83 3d 41 c5 1b 00 00 75 09 <0f> 0b 5b 5d c3 cc cc cc cc 44 0f b6 ce 4c 8b 87 a8 44 00 00 48 c7
Apr 17 18:23:29 diomede kernel: RSP: 0018:ffffb6638294be58 EFLAGS: 00010246
Apr 17 18:23:29 diomede kernel: RAX: ffff9a6f4a0be000 RBX: 00000000000000ff RCX: 0000000000000000
Apr 17 18:23:29 diomede kernel: RDX: 0000000000000010 RSI: 00000000000000ff RDI: ffffb66380327000
Apr 17 18:23:29 diomede kernel: RBP: ffffb66380327000 R08: 000000000000000b R09: 00000000802a0026
Apr 17 18:23:29 diomede kernel: R10: 0000000000000002 R11: 0000000000000001 R12: ffff9a6f4a12ea04
Apr 17 18:23:29 diomede kernel: R13: ffffb66380328170 R14: ffff9a6f62143080 R15: ffffb66380328140
Apr 17 18:23:29 diomede kernel: FS:  0000000000000000(0000) GS:ffff9a70b7d00000(0000) knlGS:0000000000000000
Apr 17 18:23:29 diomede kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 17 18:23:29 diomede kernel: CR2: 00007f820dbf8d10 CR3: 000000012de10000 CR4: 00000000001006e0
Apr 17 18:23:29 diomede kernel: Call Trace:
Apr 17 18:23:29 diomede kernel:  <TASK>
Apr 17 18:23:29 diomede kernel:  ? __warn+0x7d/0xc0
Apr 17 18:23:29 diomede kernel:  ? rtw_sta_mstatus_disc_rpt+0xb9/0xec [8812au]
Apr 17 18:23:29 diomede kernel:  ? report_bug+0xe2/0x150
Apr 17 18:23:29 diomede kernel:  ? handle_bug+0x41/0x70
Apr 17 18:23:29 diomede kernel:  ? exc_invalid_op+0x13/0x60
Apr 17 18:23:29 diomede kernel:  ? asm_exc_invalid_op+0x16/0x20
Apr 17 18:23:29 diomede kernel:  ? rtw_sta_mstatus_disc_rpt+0xb9/0xec [8812au]
Apr 17 18:23:29 diomede kernel:  rtw_stadel_event_callback+0x3d/0x33b [8812au]
Apr 17 18:23:29 diomede kernel:  mlme_evt_hdl+0x62/0x81 [8812au]
Apr 17 18:23:29 diomede kernel:  rtw_cmd_thread+0x479/0x644 [8812au]
Apr 17 18:23:29 diomede kernel:  ? set_tx_beacon_cmd+0x17c/0x17c [8812au]
Apr 17 18:23:29 diomede kernel:  ? rtw_stop_cmd_thread+0x41/0x41 [8812au]
Apr 17 18:23:29 diomede kernel:  kthread+0xda/0x100
Apr 17 18:23:29 diomede kernel:  ? kthread_complete_and_exit+0x20/0x20
Apr 17 18:23:29 diomede kernel:  ret_from_fork+0x22/0x30
Apr 17 18:23:29 diomede kernel:  </TASK>
Apr 17 18:23:29 diomede kernel: ---[ end trace 0000000000000000 ]---
commented

Hi @elouet

Run remove-driver.sh to uninstall this driver and then try the new version:

https://github.com/morrownr/8812au-20210820

I'm planning to retire this driver soon, so I would prefer not to start working on bugs in it.

Thanks for your feedback. Actually, the bad news is that I posted this issue in the wrong GitHub project. I'm already running the version you linked.

I have moved to USB3 and set the configuration accordingly. I'll see if the problem persists.

It's only a warning, not a panic. Use the module parameter rtw_drv_log_level=1 to see the reason for the warning.

My guess is you sometimes reach the maximum number of supported clients (32?) and the driver has a classic off-by-one mistake somewhere, allowing you to connect one more client than it should.
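
To illustrate the kind of mistake meant here, a minimal sketch in C follows. It is not taken from the driver source: `MAX_STA`, `struct sta_table`, and all function names are hypothetical, and the limit of 32 is only the guess from the comment above. It shows how a `<=` where a `<` belongs lets one extra client through once the table is full.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_STA 32  /* hypothetical limit, taken from the "(32?)" guess */

/* Hypothetical station table; not the driver's real data structure. */
struct sta_table {
    int    slots[MAX_STA]; /* one entry per associated client */
    size_t count;          /* number of clients admitted so far */
};

/* Buggy check: "<=" still admits a client when the table is already full. */
static bool can_admit_buggy(const struct sta_table *t)
{
    return t->count <= MAX_STA;
}

/* Correct check: admit only while a free slot remains. */
static bool can_admit_fixed(const struct sta_table *t)
{
    return t->count < MAX_STA;
}

/* Keep admitting clients until the check refuses; return how many got in. */
static size_t admit_until_refused(bool (*can_admit)(const struct sta_table *))
{
    struct sta_table t = { .count = 0 };
    size_t admitted = 0;

    for (int client = 0; client < MAX_STA + 8; client++) {
        if (!can_admit(&t))
            break;
        if (t.count < MAX_STA)
            t.slots[t.count] = client; /* guard keeps this sketch memory-safe */
        t.count++;
        admitted++;
    }
    return admitted;
}
```

With the buggy check, `admit_until_refused(can_admit_buggy)` returns 33, one more than the table can hold, while the fixed check stops at 32. In real code without the guard on the array write, that extra client would index past the end of the table.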

commented

@dubhater

My guess is you sometimes reach the maximum number of supported clients (32?)...

I actually added a patch to up the number of supported clients in AP mode to 64 in the new driver, which is where this issue should be posted. Could it be that there is a problem with what I did? Yes. I have no way to test even 32 clients so what I did could be causing some problem to show up. I tried to up the number of clients because this driver is really good at AP mode and these days, a lot of folks building their own APs have a lot of IoT clients and need greater capacity.

What we really need is the OP to repost this issue in the correct repo. We could then test some things. Realtek does not do extensive testing on these drivers so there are little land mines all over the place ready to explode.

All, thanks for the feedback. I really doubt that the problem is caused by having 32+ stations on the AP. I have turned on more logging, but since then the problem seems to have gone away, which is a bit embarrassing. I'll close this issue and make sure to reopen it in the right project if I have more information. Again, thanks a lot for your support.