deviceplug / btleplug

Rust Cross-Platform Host-Side Bluetooth LE Access Library



adapter.events() stops sending updates after a while

zdenek-crha opened this issue

Describe the bug

I run a BLE scan in a long-running service that provides a list of seen devices over a REST API by watching service discovery (no connections are made to the devices).

  • I start a scan
  • I receive CentralEvent events and can handle them
  • After a while, all events stop coming

Expected behavior

Events keep arriving until the process is terminated.

Actual behavior

Events arrive for some time, then stop. The time before this happens ranges from half an hour to several hours. Once events stop, they do not resume (I have seen several days pass with no events), and the process has to be restarted.

Additional context

There are many BLE devices around (dozens to hundreds), and they come online and disappear all the time. It is unlikely that the lack of events is caused by all BLE devices disappearing at the same time.

I have checked for the errors noted in #150. It did not appear that DBUS limits were exceeded at the time. As far as I can see, nothing suspicious is logged in any system log.

I have observed the same behavior with the bluer crate. Since bluer also uses DBUS, I suspect a problem at that level. Unfortunately, I don't have the know-how to track it down further.

Versions

btleplug: v0.11.0
bluez: version: 5.66-1
os: GNU/Linux Debian (bookworm)

With BlueZ, scanning being enabled is a system-wide property. Is it possible some other process on the system is turning off Bluetooth scanning after some time? Does it make a difference if you have your program periodically enable scanning, rather than just once when it starts?
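The suggestion above (re-enabling scanning periodically rather than once at startup) can be sketched as follows. This is a minimal illustration, assuming a hypothetical synchronous `ScanControl` trait and `ScanError` type in place of btleplug's real async `Central::start_scan`; `PeriodicScanEnabler` and `tick` are invented names. Because discovery is a system-wide BlueZ property, an "already in progress" reply is treated here as meaning discovery is still active, not as a failure.

```rust
use std::time::{Duration, Instant};

/// Hypothetical error type; only the "already in progress" case matters here.
#[derive(Debug, PartialEq)]
enum ScanError {
    AlreadyInProgress,
    Other(String),
}

/// Hypothetical synchronous stand-in for an adapter's scan control.
/// (btleplug's real `start_scan` is async and takes a `ScanFilter`.)
trait ScanControl {
    fn start_scan(&mut self) -> Result<(), ScanError>;
}

/// Re-issues `start_scan` once `interval` has elapsed since the last attempt.
struct PeriodicScanEnabler {
    interval: Duration,
    last_attempt: Option<Instant>,
}

impl PeriodicScanEnabler {
    fn new(interval: Duration) -> Self {
        Self { interval, last_attempt: None }
    }

    /// Call from the service's main loop. Returns Ok(true) if start_scan
    /// was attempted on this call, Ok(false) if it was not yet due.
    fn tick<S: ScanControl>(&mut self, scanner: &mut S, now: Instant) -> Result<bool, ScanError> {
        let due = match self.last_attempt {
            None => true,
            Some(t) => now.saturating_duration_since(t) >= self.interval,
        };
        if !due {
            return Ok(false);
        }
        self.last_attempt = Some(now);
        match scanner.start_scan() {
            // "Already in progress" means BlueZ still reports discovery as on.
            Ok(()) | Err(ScanError::AlreadyInProgress) => Ok(true),
            Err(e) => Err(e),
        }
    }
}
```

Note that this only re-asserts scanning; it cannot restart a discovery session that BlueZ still considers active.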

@qwandor The process runs as a service on a server, where nothing else should be touching Bluetooth. But I can't rule out that something is.

We used btleplug 0.5.x until recently, and there were no problems with it. It was old, of course, so I upgraded to 0.11, and the issue started to manifest. 0.5 did not use DBUS yet; that's why I'm suspicious of it :-).

I have two tests planned for the next few days:

  • revert to 0.5 and see how it behaves
  • follow your suggestion and re-enable scanning periodically

I will report back on how it goes.

TL;DR: I've run more tests, and the results are not very promising:

  • the discovery scan does not seem to be stopped
  • v0.5.x does not have the issue
  • v0.5.x also has better performance

Long version: I monitored 4 versions/implementations, each for about a day. I set up a check on the number of BLE devices seen by the service: if it reported 0 for 3 consecutive checks (a span of 30s), the service was restarted automatically. I ran 2 instances at all times.
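The automatic-restart rule can be expressed as a small counter. This is a hypothetical illustration, not the reporter's code: `ZeroCountWatchdog` is an invented name, and the sampling interval (presumably about 10s, given 3 checks spanning 30s) is left to the caller.

```rust
/// Hypothetical watchdog: requests a restart after `threshold`
/// consecutive samples in which the service saw zero devices.
struct ZeroCountWatchdog {
    threshold: u32,
    consecutive_zero: u32,
}

impl ZeroCountWatchdog {
    fn new(threshold: u32) -> Self {
        Self { threshold, consecutive_zero: 0 }
    }

    /// Feed one device-count sample; returns true when a restart is due.
    fn check(&mut self, device_count: usize) -> bool {
        if device_count == 0 {
            self.consecutive_zero += 1;
        } else {
            // Any non-empty reading breaks the run of empty checks.
            self.consecutive_zero = 0;
        }
        if self.consecutive_zero >= self.threshold {
            // Reset so the next restart needs another full run of zeros.
            self.consecutive_zero = 0;
            true
        } else {
            false
        }
    }
}
```

With a threshold of 3, only an uninterrupted run of three empty checks triggers a restart; a single empty reading in a busy environment does not.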

The graph below tracks CPU usage in ticks, and each colored block in the graph is a new process. You can see that the version in the first block, using v0.11, was restarted fairly often. The version in the second block, using 0.5, was never restarted:

[Graph: btleplug_restart_graph (CPU usage in ticks; each colored block is a separate process)]

The possibility that some outside process is turning the scan off no longer seems likely to me. First, v0.5 works without repeated scan restart calls. Second, calling only start_scan() in v0.11 (3rd block) returns "Operation already in progress" even when no events are coming in:

# process started here
10:06:41.92  INFO collect{dev="hci0 (usb:v1D6Bp0246d0542)"}: restart discovery scan

# last events received from DBUS
13:49:57.43 DEBUG collect{dev="hci0"}:update: name      => <DEVICE 1>
13:49:58.90 DEBUG collect{dev="hci0"}:update: name      => <DEVICE 2>

# No events are coming; we just run the periodic cleanup of the internal store
# to remove devices that have not reported for a long time. This went on for a while.
13:50:48.95 DEBUG cleanup: running state record cleanup (interval=60s)
... snip more lines ...
13:54:48.95 DEBUG cleanup: running state record cleanup (interval=60s)

# A periodic attempt to restart the scanning process is rejected, even though
# the log above shows that no events are coming in
13:54:48.95  WARN collect{dev="hci0"}: restart discovery scan: Operation already in progress
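The periodic cleanup mentioned in the log comments might look roughly like the sketch below. `DeviceStore`, the address keys and the `max_age` cutoff are all hypothetical: the log only gives the 60s cleanup interval, not how old a record must be before it is removed.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical in-memory store: device address -> time of last report.
struct DeviceStore {
    last_seen: HashMap<String, Instant>,
}

impl DeviceStore {
    fn new() -> Self {
        Self { last_seen: HashMap::new() }
    }

    /// Record a report for a device (called for each incoming event).
    fn update(&mut self, address: &str, now: Instant) {
        self.last_seen.insert(address.to_string(), now);
    }

    /// Remove devices not heard from for longer than `max_age`.
    /// Returns the number of records removed.
    fn cleanup(&mut self, now: Instant, max_age: Duration) -> usize {
        let before = self.last_seen.len();
        self.last_seen
            .retain(|_, seen| now.saturating_duration_since(*seen) <= max_age);
        before - self.last_seen.len()
    }

    fn len(&self) -> usize {
        self.last_seen.len()
    }
}
```

If events stop arriving, every record eventually ages past `max_age`, so a store like this drains to zero devices, which is the condition the restart check watches for.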

The workaround seems to be to periodically call stop_scan(); start_scan(); as done in the 4th block. You can see CPU usage occasionally dip as events stop coming and stop being processed, then rise again as the scan is shut down and restarted.
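A sketch of the stop-then-start workaround, again using a hypothetical synchronous `ScanControl` trait in place of btleplug's async `stop_scan`/`start_scan`. `restart_scan` and `PeriodicRestart` are illustrative names, and the restart cadence (every N ticks of the service loop) is an assumption, since the thread does not state one.

```rust
/// Hypothetical synchronous stand-in for an adapter's scan control.
/// (btleplug's real methods are async, and `start_scan` takes a `ScanFilter`.)
trait ScanControl {
    fn start_scan(&mut self) -> Result<(), String>;
    fn stop_scan(&mut self) -> Result<(), String>;
}

/// Stop discovery first, then start it again, so start_scan is not rejected
/// with "Operation already in progress". A failed stop is ignored (discovery
/// may already be off); a failed start is returned to the caller.
fn restart_scan<S: ScanControl>(scanner: &mut S) -> Result<(), String> {
    let _ = scanner.stop_scan();
    scanner.start_scan()
}

/// Runs `restart_scan` on every `every`-th tick of the service loop.
struct PeriodicRestart {
    every: u32,
    ticks: u32,
}

impl PeriodicRestart {
    fn new(every: u32) -> Self {
        assert!(every > 0, "restart period must be at least one tick");
        Self { every, ticks: 0 }
    }

    /// Returns Some(result) on ticks where a restart was attempted.
    fn tick<S: ScanControl>(&mut self, scanner: &mut S) -> Option<Result<(), String>> {
        self.ticks += 1;
        if self.ticks % self.every == 0 {
            Some(restart_scan(scanner))
        } else {
            None
        }
    }
}
```

Stopping first matters because, as the log shows, a bare start_scan() is rejected while BlueZ still reports discovery as active.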

Note on performance: as a side result, running and monitoring all these versions points to performance regressions between v0.5 and v0.11. The newer version uses 3-4 times as much CPU for the same work.