rbonghi / jetson_stats

📊 Simple package for monitoring and control your NVIDIA Jetson [Orin, Xavier, Nano, TX] series

Home Page: https://rnext.it/jetson_stats

jtop closes unexpectedly when used with Docker (error 127)

SolaZn opened this issue

Hello ! Hope that you are good!

Describe the bug

jtop on the host appears to shut down for no apparent reason, and the failure then propagates to any Docker container that is listening on /run/jtop.sock.
When this happens, the container exits abruptly with one of the following statuses and this error:

Exited (127) or Exited (0)
"Error": "failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/run/jtop.sock" to rootfs at "/run/jtop.sock": mount /run/jtop.sock:/run/jtop.sock (via /proc/self/fd/6), flags: 0x5000: not a directory: unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type"

On the host side there is almost no trace of it happening, other than jtop becoming unavailable and reporting:
"jtop is not running on this board"

To Reproduce

A Docker container must be listening on jtop.sock, with /run/jtop.sock mounted into it as a volume
(as described in the guidelines for Docker use).
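
The mount error above complains that /run/jtop.sock is not the expected type. One possible explanation, not confirmed in this thread, is that the socket was missing on the host when the container started: Docker's -v bind-mount syntax creates a directory at a host path that does not exist, and a directory left at /run/jtop.sock would then clash with the socket file. A minimal Python sketch for checking what is actually at that path on the host (the function name classify_path is hypothetical and is not part of jetson-stats):

```python
import os
import stat


def classify_path(path):
    """Return 'missing', 'socket', 'directory', or 'other' for a filesystem path."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISSOCK(mode):
        return "socket"      # what a running jtop service is expected to provide
    if stat.S_ISDIR(mode):
        return "directory"   # would explain the "not a directory" mount mismatch
    return "other"
```

Calling classify_path("/run/jtop.sock") on the host right after a failure would show whether the socket is gone, or has been replaced by a directory.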

Board

Output from jetson_release -v:

Software part of jetson-stats 4.2.1 - (c) 2023, Raffaello Bonghi
Model: NVIDIA Jetson Nano Developer Kit - Jetpack 4.6 [L4T 32.6.1]
NV Power Mode[0]: MAXN
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:
  • 699-level Part Number: (**)
  • P-Number: p3448-0002
  • BoardIDs: p3448
  • Module: NVIDIA Jetson Nano module (16Gb eMMC)
  • SoC: tegra210
  • CUDA Arch BIN: 5.3
  • Codename: Porg
Platform:
  • Machine: aarch64
  • System: Linux
  • Distribution: Ubuntu 18.04 Bionic Beaver
  • Release: 4.9.253-tegra
  • Python: 3.6.9
jtop:
  • Version: 4.2.1
  • Service: Active
Libraries:
  • CUDA: Not installed
  • cuDNN: 8.2.1.32
  • TensorRT: Not installed
  • VPI: Not installed
  • OpenCV: Not installed

Output when I tried "fix all" on jtop --health:

Traceback (most recent call last):
  File "/usr/local/bin/jtop", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/jtop/__main__.py", line 134, in main
    jtop_config()
  File "/usr/local/lib/python3.6/dist-packages/jtop/jetson_config.py", line 227, in jtop_config
    curses.wrapper(JTOPCONFIG, JTOP_MENU)
  File "/usr/lib/python3.6/curses/__init__.py", line 94, in wrapper
    return func(stdscr, *args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/jtop/gui/jtopguiconfig.py", line 74, in __init__
    self.loop()
  File "/usr/local/lib/python3.6/dist-packages/jtop/gui/jtopguiconfig.py", line 154, in loop
    while not self.events():
  File "/usr/local/lib/python3.6/dist-packages/jtop/gui/jtopguiconfig.py", line 167, in events
    status_keyboard = self.keyboard(event)
  File "/usr/local/lib/python3.6/dist-packages/jtop/gui/jtopguiconfig.py", line 201, in keyboard
    output = cmd()
  File "/usr/local/lib/python3.6/dist-packages/jtop/jetson_config.py", line 79, in fix_jtop_all
    fix_service()
  File "/usr/local/lib/python3.6/dist-packages/jtop/jetson_config.py", line 69, in fix_service
    install_service(folder, copy=copy)
  File "/usr/local/lib/python3.6/dist-packages/jtop/service.py", line 124, in install_service
    copyfile(service_package_path, service_install_path)
  File "/usr/lib/python3.6/shutil.py", line 120, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.6/dist-packages/services/jtop.service'

Log from jtop.service

jetson@jetson:~$ sudo journalctl -u jtop.service -n 100 --no-pager
-- Logs begin at Tue 2023-05-30 18:32:26 UTC, end at Thu 2023-06-01 16:14:57 UTC. --
Jun 01 13:02:06 jetson-486 jtop[6630]: [INFO] jtop.core.gpu - GPU "gpu" frq in /sys/devices/57000000.gpu/devfreq/57000000.gpu
Jun 01 13:02:06 jetson-486 jtop[6630]: [INFO] jtop.core.processes - Process service started
Jun 01 13:02:06 jetson-486 jtop[6630]: [INFO] jtop.core.memory - Found EMC!
Jun 01 13:02:06 jetson-486 jtop[6630]: [INFO] jtop.core.memory - Memory service started
Jun 01 13:02:06 jetson-486 jtop[6630]: [INFO] jtop.core.engine - Engines found: [APE NVDEC NVENC NVJPG SE VIC]
Jun 01 13:02:06 jetson-486 jtop[6630]: [INFO] jtop.core.temperature - Found thermal "PLL" in thermal_zone3
Jun 01 13:02:06 jetson-486 jtop[6630]: [INFO] jtop.core.temperature - Found thermal "CPU" in thermal_zone1
Jun 01 13:02:06 jetson-486 jtop[6630]: [WARNING] jtop.core.temperature - Skipped PMIC
Jun 01 13:02:06 jetson-486 jtop[6630]: [INFO] jtop.core.temperature - Found thermal "GPU" in thermal_zone2
Jun 01 13:02:06 jetson-486 jtop[6630]: [INFO] jtop.core.temperature - Found thermal "AO" in thermal_zone0
Jun 01 13:02:06 jetson-486 jtop[6630]: [INFO] jtop.core.temperature - Found thermal "thermal" in thermal_zone5
Jun 01 13:02:06 jetson-486 jtop[6630]: [INFO] jtop.core.power - Found I2C power monitor
Jun 01 13:02:06 jetson-486 jtop[6630]: [INFO] jtop.core.fan - Fan tegra_pwmfan(1) found in /sys/class/hwmon/hwmon1
Jun 01 13:02:06 jetson-486 jtop[6630]: [INFO] jtop.core.fan - RPM tegra_pwmfan(1) found in /sys/class/hwmon/hwmon1
Jun 01 13:02:06 jetson-486 jtop[6630]: [INFO] jtop.core.fan - Fan temp controller tegra_pwmfan found in /sys/class/hwmon/hwmon1/temp_control
Jun 01 13:02:06 jetson-486 jtop[6630]: [INFO] jtop.core.jetson_clocks - jetson_clocks found in /usr/bin/jetson_clocks
Jun 01 13:02:06 jetson-486 jtop[6630]: [INFO] jtop.core.nvpmodel - nvpmodel running in [0]MAXN - Default: 0
Jun 01 13:02:06 jetson-486 jtop[6630]: [INFO] jtop.service - Initialization service
Jun 01 13:02:07 jetson-486 jtop[6630]: [INFO] jtop.service - service ready
-- Reboot --
Jun 01 13:33:24 jetson-486 systemd[1]: Started jtop service.
Jun 01 13:33:25 jetson-486 jtop[6632]: [INFO] jtop.service - jetson_stats 4.2.1 - server loaded
Jun 01 13:33:25 jetson-486 jtop[6632]: [INFO] jtop.core.hardware - Hardware detected aarch64
Jun 01 13:33:35 jetson-486 jtop[6632]: [INFO] jtop.core.hardware - NVIDIA Jetson detected L4T=32.6.1
Jun 01 13:33:35 jetson-486 jtop[6632]: [INFO] jtop.service - Running on Python: 3.6.9
Jun 01 13:33:35 jetson-486 jtop[6632]: [INFO] jtop.core.cpu - Found 4 CPU
Jun 01 13:33:35 jetson-486 jtop[6632]: [INFO] jtop.core.gpu - GPU "gpu" status in /sys/devices/57000000.gpu
Jun 01 13:33:35 jetson-486 jtop[6632]: [INFO] jtop.core.gpu - GPU "gpu" frq in /sys/devices/57000000.gpu/devfreq/57000000.gpu
Jun 01 13:33:35 jetson-486 jtop[6632]: [INFO] jtop.core.processes - Process service started
Jun 01 13:33:35 jetson-486 jtop[6632]: [INFO] jtop.core.memory - Found EMC!
Jun 01 13:33:35 jetson-486 jtop[6632]: [INFO] jtop.core.memory - Memory service started
Jun 01 13:33:36 jetson-486 jtop[6632]: [INFO] jtop.core.engine - Engines found: [APE NVDEC NVENC NVJPG SE VIC]
Jun 01 13:33:36 jetson-486 jtop[6632]: [INFO] jtop.core.temperature - Found thermal "PLL" in thermal_zone3
Jun 01 13:33:36 jetson-486 jtop[6632]: [INFO] jtop.core.temperature - Found thermal "CPU" in thermal_zone1
Jun 01 13:33:36 jetson-486 jtop[6632]: [WARNING] jtop.core.temperature - Skipped PMIC
Jun 01 13:33:36 jetson-486 jtop[6632]: [INFO] jtop.core.temperature - Found thermal "GPU" in thermal_zone2
Jun 01 13:33:36 jetson-486 jtop[6632]: [INFO] jtop.core.temperature - Found thermal "AO" in thermal_zone0
Jun 01 13:33:36 jetson-486 jtop[6632]: [INFO] jtop.core.temperature - Found thermal "thermal" in thermal_zone5
Jun 01 13:33:36 jetson-486 jtop[6632]: [INFO] jtop.core.power - Found I2C power monitor
Jun 01 13:33:36 jetson-486 jtop[6632]: [INFO] jtop.core.fan - Fan tegra_pwmfan(1) found in /sys/class/hwmon/hwmon1
Jun 01 13:33:36 jetson-486 jtop[6632]: [INFO] jtop.core.fan - RPM tegra_pwmfan(1) found in /sys/class/hwmon/hwmon1
Jun 01 13:33:36 jetson-486 jtop[6632]: [INFO] jtop.core.fan - Fan temp controller tegra_pwmfan found in /sys/class/hwmon/hwmon1/temp_control
Jun 01 13:33:36 jetson-486 jtop[6632]: [INFO] jtop.core.jetson_clocks - jetson_clocks found in /usr/bin/jetson_clocks
Jun 01 13:33:36 jetson-486 jtop[6632]: [INFO] jtop.core.nvpmodel - nvpmodel running in [0]MAXN - Default: 0
Jun 01 13:33:36 jetson-486 jtop[6632]: [INFO] jtop.service - Initialization service
Jun 01 13:33:36 jetson-486 jtop[6632]: [INFO] jtop.service - service ready
-- Reboot --
Jun 01 13:57:17 jetson-486 systemd[1]: Started jtop service.
Jun 01 13:57:18 jetson-486 jtop[6638]: [INFO] jtop.service - jetson_stats 4.2.1 - server loaded
Jun 01 13:57:18 jetson-486 jtop[6638]: [INFO] jtop.core.hardware - Hardware detected aarch64
Jun 01 13:57:28 jetson-486 jtop[6638]: [INFO] jtop.core.hardware - NVIDIA Jetson detected L4T=32.6.1
Jun 01 13:57:28 jetson-486 jtop[6638]: [INFO] jtop.service - Running on Python: 3.6.9
Jun 01 13:57:28 jetson-486 jtop[6638]: [INFO] jtop.core.cpu - Found 4 CPU
Jun 01 13:57:28 jetson-486 jtop[6638]: [INFO] jtop.core.gpu - GPU "gpu" status in /sys/devices/57000000.gpu
Jun 01 13:57:28 jetson-486 jtop[6638]: [INFO] jtop.core.gpu - GPU "gpu" frq in /sys/devices/57000000.gpu/devfreq/57000000.gpu
Jun 01 13:57:28 jetson-486 jtop[6638]: [INFO] jtop.core.processes - Process service started
Jun 01 13:57:28 jetson-486 jtop[6638]: [INFO] jtop.core.memory - Found EMC!
Jun 01 13:57:28 jetson-486 jtop[6638]: [INFO] jtop.core.memory - Memory service started
Jun 01 13:57:28 jetson-486 jtop[6638]: [INFO] jtop.core.engine - Engines found: [APE NVDEC NVENC NVJPG SE VIC]
Jun 01 13:57:28 jetson-486 jtop[6638]: [INFO] jtop.core.temperature - Found thermal "PLL" in thermal_zone3
Jun 01 13:57:28 jetson-486 jtop[6638]: [INFO] jtop.core.temperature - Found thermal "CPU" in thermal_zone1
Jun 01 13:57:28 jetson-486 jtop[6638]: [WARNING] jtop.core.temperature - Skipped PMIC
Jun 01 13:57:28 jetson-486 jtop[6638]: [INFO] jtop.core.temperature - Found thermal "GPU" in thermal_zone2
Jun 01 13:57:28 jetson-486 jtop[6638]: [INFO] jtop.core.temperature - Found thermal "AO" in thermal_zone0
Jun 01 13:57:28 jetson-486 jtop[6638]: [INFO] jtop.core.temperature - Found thermal "thermal" in thermal_zone5
Jun 01 13:57:28 jetson-486 jtop[6638]: [INFO] jtop.core.power - Found I2C power monitor
Jun 01 13:57:28 jetson-486 jtop[6638]: [INFO] jtop.core.fan - Fan tegra_pwmfan(1) found in /sys/class/hwmon/hwmon1
Jun 01 13:57:28 jetson-486 jtop[6638]: [INFO] jtop.core.fan - RPM tegra_pwmfan(1) found in /sys/class/hwmon/hwmon1
Jun 01 13:57:28 jetson-486 jtop[6638]: [INFO] jtop.core.fan - Fan temp controller tegra_pwmfan found in /sys/class/hwmon/hwmon1/temp_control
Jun 01 13:57:28 jetson-486 jtop[6638]: [INFO] jtop.core.jetson_clocks - jetson_clocks found in /usr/bin/jetson_clocks
Jun 01 13:57:28 jetson-486 jtop[6638]: [INFO] jtop.core.nvpmodel - nvpmodel running in [0]MAXN - Default: 0
Jun 01 13:57:28 jetson-486 jtop[6638]: [INFO] jtop.service - Initialization service
Jun 01 13:57:29 jetson-486 jtop[6638]: [INFO] jtop.service - service ready
-- Reboot --
Jun 01 15:47:58 jetson-486 systemd[1]: Started jtop service.
Jun 01 15:47:58 jetson-486 jtop[6673]: [INFO] jtop.service - jetson_stats 4.2.1 - server loaded
Jun 01 15:47:58 jetson-486 jtop[6673]: [INFO] jtop.core.hardware - Hardware detected aarch64
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.core.hardware - NVIDIA Jetson detected L4T=32.6.1
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.service - Running on Python: 3.6.9
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.core.cpu - Found 4 CPU
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.core.gpu - GPU "gpu" status in /sys/devices/57000000.gpu
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.core.gpu - GPU "gpu" frq in /sys/devices/57000000.gpu/devfreq/57000000.gpu
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.core.processes - Process service started
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.core.memory - Found EMC!
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.core.memory - Memory service started
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.core.engine - Engines found: [APE NVDEC NVENC NVJPG SE VIC]
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.core.temperature - Found thermal "PLL" in thermal_zone3
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.core.temperature - Found thermal "CPU" in thermal_zone1
Jun 01 15:48:09 jetson-486 jtop[6673]: [WARNING] jtop.core.temperature - Skipped PMIC
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.core.temperature - Found thermal "GPU" in thermal_zone2
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.core.temperature - Found thermal "AO" in thermal_zone0
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.core.temperature - Found thermal "thermal" in thermal_zone5
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.core.power - Found I2C power monitor
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.core.fan - Fan tegra_pwmfan(1) found in /sys/class/hwmon/hwmon1
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.core.fan - RPM tegra_pwmfan(1) found in /sys/class/hwmon/hwmon1
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.core.fan - Fan temp controller tegra_pwmfan found in /sys/class/hwmon/hwmon1/temp_control
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.core.jetson_clocks - jetson_clocks found in /usr/bin/jetson_clocks
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.core.nvpmodel - nvpmodel running in [0]MAXN - Default: 0
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.service - Initialization service
Jun 01 15:48:09 jetson-486 jtop[6673]: [INFO] jtop.service - service ready
Jun 01 15:56:41 jetson-486 jtop[6673]: [INFO] jtop.service - jtop timer thread started 1000ms
Jun 01 16:10:42 jetson-486 jtop[6673]: [INFO] jtop.service - jtop timer thread close
Jun 01 16:10:50 jetson-486 jtop[6673]: [INFO] jtop.service - jtop timer thread started 1000ms

I also ran into a very odd problem where jtop.service would gradually stop surviving power cycles: the first few reboots were fine, but around the third reboot I would get this message from systemd:

warning: the unit file, source configuration file or drop-ins of jtop.service changed on disk. run 'systemctl daemon-reload' to reload units.

When this happens, the jtop.service file has disappeared, so the service does not load at the next startup (because the file is missing).

Will provide journalctl logs for when it happens.

Thank you !

I set --restart=always in my docker run arguments, and when I reboot Ubuntu the container cannot restart for the same reason.

@hanyuImg I have the same issue. Were you able to find a workaround for it?
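
Nobody in the thread confirmed a fix. For the reboot case with --restart=always, one idea would be to delay starting the container until the host socket exists, assuming the failure is a startup race between Docker and jtop.service (an assumption, not something the logs above prove). A minimal Python sketch of such a wait; the function name wait_for_socket and its default timings are hypothetical:

```python
import os
import stat
import time


def wait_for_socket(path, timeout=30.0, interval=0.5):
    """Poll until `path` exists and is a Unix socket; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if stat.S_ISSOCK(os.stat(path).st_mode):
                return True
        except FileNotFoundError:
            pass  # socket not created yet, keep polling
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A host-side launcher script could call wait_for_socket("/run/jtop.sock") and only run docker start once it returns True. This does not help if jtop itself stops later while the container is running.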