e3prom / rVRRPd

A lightweight, fast, and highly secure VRRP daemon.

100% CPU usage on FreeBSD

e3prom opened this issue · comments

rVRRPd consistently consumes 100% CPU on FreeBSD:

# uname -a
FreeBSD lab-fbsd-01 12.0-RELEASE FreeBSD 12.0-RELEASE r341666 GENERIC  amd64

truss reports time-consuming read() syscalls:

# truss -c -H -p 67459
syscall                     seconds   calls  errors
sched_yield             0.001360020      24       0
write                   0.004256053     112       0
read                    2.994538834      26      10
compat11.kevent         6.007358899      33       0
_umtx_op                2.993348184      42       0
                      ------------- ------- -------
                       12.000861990     237      10

along with returned errors:
101074: read(10,0x7fffdf7f7b40,128) ERR#35 'Resource temporarily unavailable'

By tracing the read() system call with file descriptor 10, we can determine that the calls come from the Rust tokio library.

gdb backtrace with break read if $rdi == 10:

Thread 6 "tokio-runtime-worke" hit Breakpoint 1, 0x000000080159eaa4 in read () from /lib/libc.so.7
(gdb) bt
#0  0x000000080159eaa4 in read () from /lib/libc.so.7
#1  0x000000000145dfcb in <&std::fs::File as std::io::Read>::read ()
#2  0x00000000013ec6d8 in <&mio::sys::unix::io::Io as std::io::Read>::read (self=0x7fffdf7f7bd8, dst=...)
    at /home/eprom/.cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.6.19/src/sys/unix/io.rs:85
#3  0x00000000013dfdc0 in mio::sys::unix::awakener::pipe::Awakener::cleanup (self=0x802432238)
    at /home/eprom/.cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.6.19/src/sys/unix/awakener.rs:49
#4  0x00000000013d4a44 in mio::poll::Poll::poll2 (self=0x80245c0b0, events=0x802458268, timeout=..., interruptible=false)
    at /home/eprom/.cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.6.19/src/poll.rs:1182
#5  0x00000000013d4707 in mio::poll::Poll::poll1 (self=0x80245c0b0, events=0x802458268, timeout=..., interruptible=false)
    at /home/eprom/.cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.6.19/src/poll.rs:1139
#6  0x00000000013d4032 in mio::poll::Poll::poll (self=0x80245c0b0, events=0x802458268, timeout=...)
    at /home/eprom/.cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.6.19/src/poll.rs:1010
#7  0x0000000001341407 in tokio_reactor::Reactor::poll (self=0x802458268, max_wait=...)
    at /home/eprom/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-reactor-0.1.10/src/lib.rs:360
#8  0x0000000001341254 in tokio_reactor::Reactor::turn (self=0x802458268, max_wait=...)
    at /home/eprom/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-reactor-0.1.10/src/lib.rs:335
#9  0x0000000001342524 in <tokio_reactor::Reactor as tokio_executor::park::Park>::park_timeout (self=0x802458268, duration=...)
    at /home/eprom/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-reactor-0.1.10/src/lib.rs:464
#10 0x00000000013287a9 in <tokio_timer::timer::Timer<T,N> as tokio_executor::park::Park>::park (self=0x802458240)
    at /home/eprom/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.11/src/timer/mod.rs:369
#11 0x000000000132bb0d in <tokio_threadpool::park::boxed::BoxedPark<T> as tokio_executor::park::Park>::park (self=0x802458240)
    at /home/eprom/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-threadpool-0.1.16/src/park/boxed.rs:29
#12 0x000000000138d6bf in tokio_threadpool::worker::entry::WorkerEntry::park (self=0x80249a300)
    at /home/eprom/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-threadpool-0.1.16/src/worker/entry.rs:220
#13 0x000000000139bef8 in tokio_threadpool::worker::Worker::sleep (self=0x7fffdf7f9a88)

[EDITED FOR BREVITY]

    at /usr/ports/lang/rust/work/rustc-1.38.0-src/src/libstd/thread/local.rs:239
#40 0x0000000001399855 in tokio_threadpool::worker::Worker::do_run (self=0x7fffdf7f9a88)
    at /home/eprom/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-threadpool-0.1.16/src/worker/mod.rs:116
#41 0x000000000139f4a1 in tokio_threadpool::pool::Pool::spawn_thread::{{closure}} ()
    at /home/eprom/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-threadpool-0.1.16/src/pool/mod.rs:345
#42 0x000000000136b750 in std::sys_common::backtrace::__rust_begin_short_backtrace (f=...)
    at /usr/ports/lang/rust/work/rustc-1.38.0-src/src/libstd/sys_common/backtrace.rs:77
#43 0x000000000138f3b1 in std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}} ()
    at /usr/ports/lang/rust/work/rustc-1.38.0-src/src/libstd/thread/mod.rs:470
#44 0x000000000137eaa1 in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=..., _args=())
    at /usr/ports/lang/rust/work/rustc-1.38.0-src/src/libstd/panic.rs:315
#45 0x0000000001375b77 in std::panicking::try::do_call (data=0x7fffdf7f9d28 "\000")
    at /usr/ports/lang/rust/work/rustc-1.38.0-src/src/libstd/panicking.rs:296
#46 0x000000000146d3df in __rust_maybe_catch_panic ()
#47 0x0000000001375849 in std::panicking::try (f=...) at /usr/ports/lang/rust/work/rustc-1.38.0-src/src/libstd/panicking.rs:275
#48 0x000000000137f781 in std::panic::catch_unwind (f=...) at /usr/ports/lang/rust/work/rustc-1.38.0-src/src/libstd/panic.rs:394
#49 0x000000000138f1e2 in std::thread::Builder::spawn_unchecked::{{closure}} ()
    at /usr/ports/lang/rust/work/rustc-1.38.0-src/src/libstd/thread/mod.rs:469
#50 0x0000000001394b56 in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
    at /usr/ports/lang/rust/work/rustc-1.38.0-src/src/libcore/ops/function.rs:235
#51 0x000000000146a90f in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once ()
#52 0x000000000145ff31 in std::sys_common::thread::start_thread ()
#53 0x000000000146ca69 in std::sys::unix::thread::Thread::new::thread_start ()
#54 0x00000008014e0776 in ?? () from /lib/libthr.so.3
#55 0x0000000000000000 in ?? ()

After further investigation, it looks like the tokio-related panic is in fact not responsible for the 100% CPU usage. The loop at src/lib.rs:887 appears to be the cause.

Adding a call to sleep() works around the problem, and the daemon still responds promptly to signals.

Issue fixed in 7a7e9c0