htop-dev / htop

htop - an interactive process viewer

Home Page: https://htop.dev/

RISC-V crash on launch

jdek opened this issue · comments

Info

Not sure if this is my environment or htop itself, but submitting it anyway since maybe it'll be useful.

Device: CanMV K230 RISC-V 64bit SBC
uname: Linux riscv 5.10.4+ #1 SMP Fri Mar 1 21:48:00 CET 2024 riscv64 GNU/Linux
os: Debian Unstable

Reproduce: run htop; it crashes on launch most of the time.
htop.objdump.tar.gz

Crash:

Error information:
------------------
A signal 6 (Aborted) was received.

Setting information:
--------------------
htop_version=3.3.0;config_reader_min_version=3;fields=0 48 17 18 38 39 40 2 46 47 49 1;hide_kernel_threads=1;hide_userland_threads=0;hide_running_in_container=0;shadow_other_users=0;show_thread_names=0;show_program_path=1;highlight_base_name=0;highlight_deleted_exe=1;shadow_distribution_path_prefix=0;highlight_megabytes=1;highlight_threads=1;highlight_changes=0;highlight_changes_delay_secs=5;find_comm_in_cmdline=1;strip_exe_from_cmdline=1;show_merged_command=0;header_margin=1;screen_tabs=1;detailed_cpu_time=0;cpu_count_from_one=0;show_cpu_usage=1;show_cpu_frequency=0;show_cpu_temperature=0;degree_fahrenheit=0;update_process_names=0;account_guest_in_cpu_meter=0;color_scheme=0;enable_mouse=1;delay=15;hide_function_bar=0;header_layout=two_50_50;column_meters_0=AllCPUs Memory Swap;column_meter_modes_0=1 1 1;column_meters_1=Tasks LoadAverage Uptime;column_meter_modes_1=2 2 2;tree_view=0;sort_key=46;tree_sort_key=0;sort_direction=-1;tree_sort_direction=1;tree_view_always_by_pid=0;all_branches_collapsed=0;screen:Main=PID USER PRIORITY NICE M_VIRT M_RESIDENT M_SHARE STATE PERCENT_CPU PERCENT_MEM TIME Command;.sort_key=PERCENT_CPU;.tree_sort_key=PID;.tree_view_always_by_pid=0;.tree_view=0;.sort_direction=-1;.tree_sort_direction=1;.all_branches_collapsed=0;screen:I/O=PID USER IO_PRIORITY IO_RATE IO_READ_RATE IO_WRITE_RATE PERCENT_SWAP_DELAY PERCENT_IO_DELAY Command;.sort_key=IO_RATE;.tree_sort_key=PID;.tree_view_always_by_pid=0;.tree_view=0;.sort_direction=-1;.tree_sort_direction=1;.all_branches_collapsed=0;

Backtrace information:
----------------------
htop(+0x17474)[0x2ae38df474]
htop(CRT_handleSIGSEGV+0xba)[0x2ae38df6b6]
linux-vdso.so.1(__vdso_rt_sigreturn+0x0)[0x3fd81b2800]
/lib/riscv64-linux-gnu/libc.so.6(+0x6ab9c)[0x3fd7fefb9c]
/lib/riscv64-linux-gnu/libc.so.6(gsignal+0x12)[0x3fd7fbab0e]
/lib/riscv64-linux-gnu/libc.so.6(abort+0xb0)[0x3fd7fab048]

This stack trace seems incomplete, which makes it hard to pin down the exact location of the crash. Can you try to start htop inside gdb and see if you can get a more complete trace for the crash?

Have you checked whether this problem also exists when htop is built from the main branch of this repo?

With CFLAGS=-Og ./configure --enable-debug it no longer crashes. With a plain ./configure it fails the check at https://github.com/htop-dev/htop/blob/main/XUtils.c#L239, but only sometimes. Honestly, this seems more like a bug in the compiler, kernel, etc. than in htop.

If that check fails, it means there's a call that causes snprintf to produce output larger than the provided buffer. The full stack trace would be nice. ;)
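For readers unfamiliar with this kind of check, here is a minimal sketch of a length-checked snprintf wrapper. It is not htop's xSnprintf: the name try_snprintf is hypothetical, and it returns -1 where a wrapper like the one discussed here would abort, so that the condition can be observed without killing the process.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not htop's xSnprintf): formats into buf and reports
 * whether the whole output, including the terminating NUL, fit into len bytes. */
int try_snprintf(char* buf, size_t len, const char* fmt, ...) {
   va_list vl;
   va_start(vl, fmt);
   int n = vsnprintf(buf, len, fmt, vl);
   va_end(vl);

   /* vsnprintf returns the length the complete output would have had,
    * so n >= len means the result was truncated. */
   if (n < 0 || (size_t)n >= len)
      return -1;

   return n;
}
```

With an 8-byte buffer, a 7-character string fits (7 characters plus the NUL), while an 8-character string is reported as a failure.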

Sure, apologies.

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=<optimized out>, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:43
43      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=<optimized out>, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:43
#1  0x0000003ff7e3fbd6 in __pthread_kill_internal (signo=<optimized out>, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x0000003ff7e0ab0e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x0000003ff7dfb048 in __GI_abort () at ./stdlib/abort.c:79
#4  0x0000002aaaad1110 in fail () at XUtils.c:28
#5  0x0000002aaaad15de in xSnprintf (buf=<optimized out>, len=255, fmt=<optimized out>) at XUtils.c:240
#6  0x0000002aaaac94e8 in Process_writeField (this=this@entry=0x2aaacac520, str=str@entry=0x3fffffc8c8, field=<optimized out>) at Process.c:666
#7  0x0000002aaaad3f78 in LinuxProcess_rowWriteField (super=0x2aaacac520, str=0x3fffffc8c8, field=<optimized out>) at linux/LinuxProcess.c:333
#8  0x0000002aaaaca308 in Row_display (cast=0x2aaacac520, out=0x3fffffc8c8) at Row.c:68
#9  0x0000002aaaac74da in Panel_draw (this=this@entry=0x2aaaca02c0, force_redraw=force_redraw@entry=true, focus=true, highlightSelected=true, hideFunctionBar=false) at Panel.c:284
#10 0x0000002aaaacc17a in ScreenManager_drawPanels (force_redraw=true, focus=0, this=0x2aaaaf69c0) at ScreenManager.c:215
#11 ScreenManager_run (this=this@entry=0x2aaaaf69c0, lastFocus=lastFocus@entry=0x0, lastKey=lastKey@entry=0x0, name=name@entry=0x0) at ScreenManager.c:251
#12 0x0000002aaaabf7c4 in CommandLine_run (argc=<optimized out>, argv=<optimized out>) at CommandLine.c:407
#13 0x0000003ff7dfb1d4 in __libc_start_call_main (main=main@entry=0x2aaaabbfa0 <main>, argc=argc@entry=1, argv=argv@entry=0x3ffffff4f8) at ../sysdeps/nptl/libc_start_call_main.h:58
#14 0x0000003ff7dfb27c in __libc_start_main_impl (main=0x2aaaabbfa0 <main>, argc=<optimized out>, argv=0x3ffffff4f8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=<optimized out>) at ./csu/libc-start.c:360
#15 0x0000002aaaabbfc4 in _start ()
(gdb)

Updated disassembly htop.objdump.gz

Okay, now this is really strange …

The interesting lines are:

#5  0x0000002aaaad15de in xSnprintf (buf=<optimized out>, len=255, fmt=<optimized out>) at XUtils.c:240
#6  0x0000002aaaac94e8 in Process_writeField (this=this@entry=0x2aaacac520, str=str@entry=0x3fffffc8c8, field=<optimized out>) at Process.c:666
#7  0x0000002aaaad3f78 in LinuxProcess_rowWriteField (super=0x2aaacac520, str=0x3fffffc8c8, field=<optimized out>) at linux/LinuxProcess.c:333

which translates to this location in Process.c:

   case PERCENT_NORM_CPU: {
      float cpuPercentage = this->percent_cpu / host->activeCPUs;
      Row_printPercentage(cpuPercentage, buffer, n, Row_fieldWidths[PERCENT_CPU], &attr); //<<< HERE
      break;
   }

buffer is 256 bytes and n is 255 as shown in the frame from xSnprintf.

The actual call for xSnprintf is in Row_printPercentage:

int Row_printPercentage(float val, char* buffer, size_t n, uint8_t width, int* attr) {
   if (isNonnegative(val)) {
      if (val < 0.05F)
         *attr = CRT_colors[PROCESS_SHADOW];
      else if (val >= 99.9F)
         *attr = CRT_colors[PROCESS_MEGABYTES];

      int precision = 1;

      // Display "val" as "100" for columns like "MEM%".
      if (width == 4 && val > 99.9F) {
         precision = 0;
         val = 100.0F;
      }

      return xSnprintf(buffer, n, "%*.*f ", width, precision, val); // <<< PROBABLY HERE
   }

   *attr = CRT_colors[PROCESS_SHADOW];
   return xSnprintf(buffer, n, "%*.*s ", width, width, "N/A"); // <<< OR HERE (unlikely)
}

Can you check the values of width, precision and val before entering that xSnprintf (most likely the first one)?

val=nan(0x400000), buffer=buffer@entry=0x3fffff9f30 "S ", n=n@entry=255, width=255 '\377', attr=attr@entry=0x3fffff9f2c

It seems to hit the second xSnprintf. Are you handling NaN correctly? (I don't know why it's NaN in the first place; I'll look into it further a bit later.)
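The reported values are enough to explain the abort, and the arithmetic can be checked with a small sketch. It assumes isNonnegative() behaves like a plain val >= 0 comparison (its definition is not shown in this thread); is_nonnegative and na_bytes_needed are hypothetical names used only for illustration.

```c
#include <math.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed stand-in for isNonnegative(): NaN fails every ordered comparison,
 * so a NaN value is routed to the "N/A" fallback branch. */
bool is_nonnegative(float v) {
   return v >= 0.0F;
}

/* Bytes needed (including the terminating NUL) to format the "N/A" fallback
 * with the given field width. Calling snprintf with a NULL buffer and size 0
 * only measures the output. */
int na_bytes_needed(int width) {
   int chars = snprintf(NULL, 0, "%*.*s ", width, width, "N/A");
   return chars + 1;
}

/* Usage: is_nonnegative(NAN) is false; na_bytes_needed(255) is 257
 * (255 padded characters, one trailing space, one NUL). */
```

With width=255 the fallback needs 257 bytes, which is more than the n=255 passed in, so a length-checked snprintf would reject it. With a normal width such as 4, only 6 bytes are needed.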

Fixes pending in #1421; thanks for the report.

I have one question: How can a CPU percentage field ever show a NaN? Is it a lack of permissions, or something else that prevents the percentages from being computed?

It has to be something affecting the whole column. And given we are talking about RISC-V here, it's quite likely that there's something that causes CPU stats to be unavailable (perms, different fields, time base incorrect, etc.) …
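As a general illustration only, and not a diagnosis of this report: one common way for a percentage to become NaN in IEEE 754 floating point is a 0/0 division, for example when both the measured time and the total period come out as zero. The helpers below are hypothetical and do not reproduce htop's actual computation.

```c
#include <math.h>

/* Hypothetical percentage helper; with part == 0 and total == 0 the
 * division 0/0 yields NaN under IEEE 754 arithmetic. */
float percent_of(float part, float total) {
   return part / total * 100.0F;
}

/* Guarded variant: report 0 when there is no usable total.
 * The negated comparison also catches a NaN total. */
float percent_of_guarded(float part, float total) {
   if (!(total > 0.0F))
      return 0.0F;
   return part / total * 100.0F;
}

/* Usage: isnan(percent_of(0.0F, 0.0F)) is true,
 * while percent_of_guarded(0.0F, 0.0F) is 0. */
```

Whether anything like this happens on the affected board is an open question in this thread; the sketch only shows that missing or zeroed statistics are enough to produce a NaN for a whole column.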