adapter.events() stops sending updates after a while

Question

zdenek-crha opened this issue 9 months ago

Describe the bug

I run a BLE scan in a long-running service that provides a list of seen devices over a REST API. It only watches discovery events and never connects to a device.

I start a scan
I'm getting CentralEvent events and can handle them
After a while, all events stop coming

Expected behavior

Events continue coming until process is terminated.

Actual behavior

Events arrive for some time, then stop. The time before that happens ranges from half an hour to several hours. Once events stop, they do not resume (I have seen several days with no events), and a process restart is needed.

Additional context

There are many BLE devices around (dozens to hundreds), and they come online and disappear all the time. It is unlikely that the lack of events is caused by all BLE devices disappearing at the same time.

I have checked for the errors noted in #150. The DBUS limits did not seem to be exceeded at any time. As far as I can see, no suspicious activity is logged in any system log.

I have observed the same behavior when using the bluer crate. Since bluer uses DBUS too, I suspect something at that level. Unfortunately, I don't have the know-how to track it down further.

Versions

btleplug: v0.11.0
bluez: version: 5.66-1
os: GNU/Linux Debian (bookworm)

Andrew Walbran · Answer 1 · Tue Aug 08 2023 20:42:35 GMT+0800 (China Standard Time)

With BlueZ, scanning being enabled is a system-wide property. Is it possible some other process on the system is turning off Bluetooth scanning after some time? Does it make a difference if you have your program periodically enable scanning, rather than just once when it starts?

Zdenek Crha · Answer 2 · Tue Aug 08 2023 23:43:48 GMT+0800 (China Standard Time)

@qwandor The process runs as a service on a server where nothing else should be touching Bluetooth. But I can't rule out that something is.

We used btleplug 0.5.x until recently and there were no problems with it. It was old, of course, so I upgraded to 0.11 and the issue started to manifest. Version 0.5 did not use DBUS yet, which is why I'm suspicious of it :-).

I have two tests planned for next few days:

revert to 0.5 and see how it behaves
use your suggestion and re-enable scanning periodically

I will report how it goes

Zdenek Crha · Answer 3 · Mon Aug 14 2023 17:05:06 GMT+0800 (China Standard Time)

TL;DR: I've run more tests and the results are not that promising:

discovery scan does not seem to be stopped
the v0.5.x version does not have the issue
the v0.5.x version does have better performance

Long version: I monitored 4 versions/implementations, each for about a single day. I set up a check on the number of BLE devices seen by the service: if it reported 0 for 3 consecutive checks (a span of 30s), the service was restarted automatically. I ran 2 instances at all times.
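
The restart check described above can be sketched roughly as follows. This is only an illustration, not the code used in these tests: `ScanWatchdog` and `observe` are hypothetical names, and the caller is assumed to run the check on a fixed interval (about every 10s if 3 checks span 30s).

```rust
/// Counts consecutive health checks that saw zero BLE devices.
/// Hypothetical helper; not part of btleplug.
#[derive(Debug)]
pub struct ScanWatchdog {
    threshold: u32,
    consecutive_empty: u32,
}

impl ScanWatchdog {
    /// `threshold` is the number of empty checks in a row that triggers a restart.
    pub fn new(threshold: u32) -> Self {
        Self { threshold, consecutive_empty: 0 }
    }

    /// Record one check. Returns true when the service should be restarted.
    pub fn observe(&mut self, devices_seen: usize) -> bool {
        if devices_seen == 0 {
            self.consecutive_empty += 1;
        } else {
            // Any non-empty check breaks the streak.
            self.consecutive_empty = 0;
        }
        if self.consecutive_empty >= self.threshold {
            // Reset so the next process starts with a clean streak.
            self.consecutive_empty = 0;
            true
        } else {
            false
        }
    }
}
```

With `ScanWatchdog::new(3)`, the third empty check in a row returns true; a single non-empty check in between resets the count.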

The graph below tracks CPU usage in ticks, and each colored block in the graph is a new process. You can see that the version in the first block, using v0.11, was restarted fairly often. The version in the second block, using 0.5, was never restarted.

The possibility that some outside process is turning the scan off no longer seems likely to me. First, v0.5 works without repeated scan restart calls. Second, calling just start_scan() in v0.11 (3rd block) returns Operation already in progress even when no events are coming in:

# process started here
10:06:41.92  INFO collect{dev="hci0 (usb:v1D6Bp0246d0542)"}: restart discovery scan

# last events received from DBUS
13:49:57.43 DEBUG collect{dev="hci0"}:update: name      => <DEVICE 1>
13:49:58.90 DEBUG collect{dev="hci0"}:update: name      => <DEVICE 2>

# no events are coming, we just run periodic cleanup of internal store to
# remove devices that did not report for a long time. This went on for a while
13:50:48.95 DEBUG cleanup: running state record cleanup (interval=60s)
... snip more lines ...
13:54:48.95 DEBUG cleanup: running state record cleanup (interval=60s)

# Periodic attempt to restart the scanning process is not permitted. But from
# the log above we know that no events are coming
13:54:48.95  WARN collect{dev="hci0"}: restart discovery scan: Operation already in progress
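
The gap in the log above (last event at 13:49:58, still nothing at 13:54:48) is the kind of condition a small stall detector could flag. A minimal sketch, assuming timestamps are passed in as offsets from process start; `StallDetector` and its methods are hypothetical names, and real code would more likely use `std::time::Instant`:

```rust
use std::time::Duration;

/// Flags a scan as stalled when no event has arrived within `timeout`.
/// Hypothetical helper; timestamps are offsets from process start.
pub struct StallDetector {
    timeout: Duration,
    last_event: Duration,
}

impl StallDetector {
    pub fn new(timeout: Duration, started_at: Duration) -> Self {
        Self { timeout, last_event: started_at }
    }

    /// Call for every event received from the adapter.
    pub fn record_event(&mut self, now: Duration) {
        self.last_event = now;
    }

    /// True when the gap since the last event is longer than the timeout.
    pub fn is_stalled(&self, now: Duration) -> bool {
        now.saturating_sub(self.last_event) > self.timeout
    }
}
```

The timeout has to be chosen well above the normal gap between events, otherwise a quiet period would be mistaken for a stall.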

The workaround seems to be to periodically call stop_scan(); start_scan();, as done in the 4th block. You can see that CPU usage occasionally dips as events stop coming and stop being processed, then goes up again once the scan is shut down and started again.
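
The stop-then-start workaround can be sketched as below. The `Scanner` trait is a hypothetical, synchronous stand-in so the example stays self-contained; in btleplug the corresponding `Central::start_scan` (which takes a `ScanFilter`) and `Central::stop_scan` are async. `FakeAdapter` is a test double that only mimics the "Operation already in progress" response seen in the log.

```rust
/// Hypothetical stand-in for the scanning half of a BLE adapter API.
pub trait Scanner {
    fn start_scan(&mut self) -> Result<(), String>;
    fn stop_scan(&mut self) -> Result<(), String>;
}

/// Stop discovery, then start it again, so that the start call is not
/// rejected because a scan is still considered to be in progress.
pub fn restart_scan<S: Scanner>(scanner: &mut S) -> Result<(), String> {
    // A failed stop is ignored here: the scan may already be off.
    let _ = scanner.stop_scan();
    scanner.start_scan()
}

/// Test double: rejects start_scan while it believes a scan is running.
#[derive(Default)]
pub struct FakeAdapter {
    pub scanning: bool,
    pub starts: u32,
}

impl Scanner for FakeAdapter {
    fn start_scan(&mut self) -> Result<(), String> {
        if self.scanning {
            return Err("Operation already in progress".to_string());
        }
        self.scanning = true;
        self.starts += 1;
        Ok(())
    }

    fn stop_scan(&mut self) -> Result<(), String> {
        self.scanning = false;
        Ok(())
    }
}
```

Calling `start_scan()` twice on the fake fails the second time, while `restart_scan()` succeeds. In a real service this would run on a timer, or only when a stall is detected.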

Note on performance: a side result of running and monitoring all these versions points to a performance regression between v0.5 and v0.11. The newer version takes 3-4 times more CPU for the same work.