openwrt / docker

Docker containers of the ImageBuilder and SDK

Home Page: https://gitlab.com/openwrt/docker

unstable: high cpu usage by /sbin/urngd

lePereT opened this issue

Hi all, I'm getting a lot of instability on macOS Mojave, running Docker version 19.03.8 and docker-machine version 0.16.2.

If I just run the command from the README:

docker run --rm -it openwrtorg/rootfs

I get a number of error messages during launch:

rich$ docker run --rm -it openwrtorg/rootfs
Failed to resize receive buffer: Operation not permitted
ip: RTNETLINK answers: Operation not permitted
Press the [f] key and hit [enter] to enter failsafe mode
Press the [1], [2], [3] or [4] key and hit [enter] to select the debug level
ip: can't send flush request: Operation not permitted
ip: SIOCSIFFLAGS: Operation not permitted
Please press Enter to activate this console.

Once in the shell, it's sluggish, and I notice that one core of my CPU is pegged at 100%. Running top inside the container reveals the following:

Mem: 433964K used, 579256K free, 290552K shrd, 9536K buff, 323160K cached
CPU:  99% usr   0% sys   0% nic   0% idle   0% io   0% irq   0% sirq
Load average: 0.99 0.58 0.24 2/163 817
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
   92     1 root     R      780   0% 100% /sbin/urngd
  279     1 root     S     1300   0%   0% /sbin/rpcd -s /var/run/ubus.sock -t 30
  434     1 root     S     1196   0%   0% /sbin/netifd
    1     0 root     S     1116   0%   0% /sbin/procd
   76     1 root     S     1084   0%   0% /bin/ash --login

Am I doing something wrong?

Thanks for the report, I've never touched urngd but maybe @ynezz has a clue...

So, running killall /sbin/urngd once terminal access is gained appears to make urngd behave. Not ideal. Also, what are the following error messages about:

Failed to resize receive buffer: Operation not permitted
ip: RTNETLINK answers: Operation not permitted
...
ip: can't send flush request: Operation not permitted
ip: SIOCSIFFLAGS: Operation not permitted

Just to confirm: the problem persists with an Ubuntu 18.04 VM as the host.

Mem: 865520K used, 143284K free, 984K shrd, 34440K buff, 579772K cached
CPU:  99% usr   0% sys   0% nic   0% idle   0% io   0% irq   0% sirq
Load average: 0.39 0.11 0.04 4/154 711
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
   91     1 root     R      776   0%  99% /sbin/urngd
  444     1 root     S     1208   0%   0% /sbin/netifd
    1     0 root     S     1176   0%   0% /sbin/procd

I can't reproduce the error. Did you try to reproduce it on other machines?

I can't reproduce that error, even on Ubuntu 18.04 (but with a 5.6.7 kernel). It would help to get strace output from urngd while it's in this state. It should be as easy as running

opkg update; opkg install strace; strace --no-abbrev --attach $(pidof urngd)

inside a container spawned with

docker run --cap-add SYS_PTRACE --rm -it openwrtorg/rootfs

I'll attempt to do this in the next week or so. I'll close the issue for now to prevent noise :) Thanks to you both for your responses.

I would like to reopen this issue.

I am running into the same bug when OpenWrt runs in a Docker container that does not allow the RNDADDENTROPY ioctl on /dev/random.

This causes a busy loop with high CPU usage: the write poll event on /dev/random keeps triggering but can never be satisfied, because the entropy the kernel is asking for can never be added.

Should I provide a possible fix? I would simply stop the polling for a certain amount of time in case RNDADDENTROPY fails.
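
To illustrate the mechanism, here is a minimal, self-contained C sketch of what urngd effectively does (this is not the urngd source; the entropy payload and counts are dummies for illustration): poll() reports /dev/random as writable whenever the kernel's entropy estimate is low, and without CAP_SYS_ADMIN the RNDADDENTROPY ioctl fails with EPERM, so the pool never refills and the write event fires again immediately.

#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/random.h>

int main(void)
{
    /* Dummy entropy payload; the real urngd fills this from its
       jitter-entropy collector. */
    union {
        struct rand_pool_info info;
        unsigned char storage[sizeof(struct rand_pool_info) + 32];
    } e;

    memset(e.storage, 0x42, sizeof(e.storage)); /* fake "entropy" bytes */
    e.info.entropy_count = 64;                  /* bits we claim to add */
    e.info.buf_size = 32;                       /* bytes of data in e.info.buf */

    int fd = open("/dev/random", O_RDWR);
    if (fd < 0)
        return 1;

    struct pollfd pfd = { .fd = fd, .events = POLLOUT };

    while (poll(&pfd, 1, -1) > 0) {             /* pool low -> writable */
        if (ioctl(fd, RNDADDENTROPY, &e.info) < 0) {
            /* Without CAP_SYS_ADMIN this fails with EPERM; the pool never
               refills, POLLOUT fires again at once, and the loop spins at
               100% CPU, which matches the behaviour reported above. */
            fprintf(stderr, "RNDADDENTROPY: %s\n", strerror(errno));
        }
    }
    return 0;
}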

I have the same issue with an Ubuntu 18.04 VM and OpenWrt (19.07.2) in the Docker container.

@thg2k please provide a fix

@aparcar I did, but it was refused by the maintainer.

http://lists.openwrt.org/pipermail/openwrt-devel/2021-January/033587.html

It is indeed a very bad workaround, but it solves the problem without causing any regressions and it's easy to audit. A better fix would be to use uloop timers and improve logging, but I have no interest in spending more time on this. It is still a fix, and I recommend merging it.
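
For reference, a rough sketch of what that uloop-timer approach could look like, using libubox's uloop API. The names feed_entropy, rnd_cb, retry_cb and RETRY_MS are made up here, and this is not the patch from the mailing list:

#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/random.h>
#include <libubox/uloop.h>

#define RETRY_MS (30 * 1000)                   /* arbitrary backoff period */

static struct uloop_fd rnd_fd;                 /* watches /dev/random for POLLOUT */
static struct uloop_timeout retry_timer;

/* Hypothetical stand-in for urngd's entropy feeding: push (dummy) data into
   the kernel pool via RNDADDENTROPY; returns the ioctl result. */
static int feed_entropy(int fd)
{
    static union {
        struct rand_pool_info info;
        unsigned char storage[sizeof(struct rand_pool_info) + 32];
    } e;

    memset(e.storage, 0x42, sizeof(e.storage));
    e.info.entropy_count = 64;
    e.info.buf_size = 32;
    return ioctl(fd, RNDADDENTROPY, &e.info);
}

static void retry_cb(struct uloop_timeout *t)
{
    /* Backoff elapsed: start watching /dev/random again. */
    uloop_fd_add(&rnd_fd, ULOOP_WRITE);
}

static void rnd_cb(struct uloop_fd *u, unsigned int events)
{
    if (feed_entropy(u->fd) < 0) {
        /* The kernel refused the ioctl (e.g. no CAP_SYS_ADMIN in the
           container): stop polling and retry later instead of spinning on
           an always-ready write event. */
        uloop_fd_delete(u);
        uloop_timeout_set(&retry_timer, RETRY_MS);
    }
}

int main(void)
{
    uloop_init();
    rnd_fd.fd = open("/dev/random", O_RDWR);
    rnd_fd.cb = rnd_cb;
    retry_timer.cb = retry_cb;
    uloop_fd_add(&rnd_fd, ULOOP_WRITE);
    uloop_run();
    uloop_done();
    return 0;
}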

I got this problem on my MT7621 router too; maybe there is something wrong in the source code.

I ran into this same problem when using PVE (Proxmox VE) to run OpenWrt in a Linux container (LXC). According to the random(4) Linux manual page, the CAP_SYS_ADMIN capability is required for almost all of the related ioctl requests.

I had included the default OpenWrt config file (same as this lxc-template), which contains lxc.cap.drop = sys_admin. I removed this line and /sbin/urngd no longer pegs my CPU.

I think there is also a way to grant the SYS_ADMIN capability to a Docker container, but that capability is heavily overloaded, so the decision is yours.
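
For what it's worth, on the Docker side that would presumably just be adding --cap-add SYS_ADMIN to the README command quoted earlier, e.g.

docker run --cap-add SYS_ADMIN --rm -it openwrtorg/rootfs

(untested here, and as noted, SYS_ADMIN grants far more than the random-pool ioctls).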

Moreover, it seems that just uninstalling the urngd package could also solve this problem, but I'm not sure about the side effects.
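
(For reference, that would presumably be opkg remove urngd, or stopping the service with /etc/init.d/urngd stop && /etc/init.d/urngd disable if the package ships the usual init script; I haven't verified either.)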

I ran into this problem today on a Linksys WRT1900ACS with an uptime of 248 days, running:

~# cat /etc/openwrt_release 
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='21.02.0'
DISTRIB_REVISION='r16279-5cc0535800'
DISTRIB_TARGET='mvebu/cortexa9'
DISTRIB_ARCH='arm_cortex-a9_vfpv3-d16'
DISTRIB_DESCRIPTION='OpenWrt 21.02.0 r16279-5cc0535800'
DISTRIB_TAINTS=''

Suddenly at around 1am my load jumped.
[screenshot: Screenshot_2022-06-27_11-45-53]

Killing urngd helped. But restarting it brought the load back up again. So, now I've killed urngd without restarting it. I will keep the system up to see if there are any impacts of having urngd stopped.

What, by the way, could be using urngd? Maybe those processes just need a restart. Perhaps dnsmasq? Anything else? Does OLSRd or babeld use urngd?

It looks like I am also seeing this on a TP-Link Archer C7 v2.

root@foobar:~# cat /etc/openwrt_release
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='19.07.2'
DISTRIB_REVISION='r10947-65030d81f3'
DISTRIB_TARGET='ar71xx/generic'
DISTRIB_ARCH='mips_24kc'
DISTRIB_DESCRIPTION='OpenWrt 19.07.2 r10947-65030d81f3'
DISTRIB_TAINTS=''
