iovisor / bcc

BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more

execsnoop: long timestamps unintentionally merge columns

martinvonwittich opened this issue

I was trying to debug an issue by running execsnoop-bpfcc -t | tee execsnoop.log, and later on using awk to follow the process chains (i.e. search for a specific process, get its PPID, search for the parent, get its PPID, ...). This works as long as the timestamps are below 1000s, because then there is whitespace between the first two columns:

971.495 sh               3220534 3220523   0 /bin/sh -c curl -fSs http://localhost:8008/health || exit 1

Unfortunately, as soon as the timestamps reach 1000s, the separating whitespace is gone:

1001.659sh               3220634 3220624   0 /bin/sh -c curl -fSs http://localhost:8008/health || exit 1

This makes it hard to parse the log with tools like awk, because the first two columns merge into a single field and the positions of all further columns shift.
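The effect can be reproduced outside of execsnoop. The following is a minimal sketch that uses the same %-8.3f and %-16s conversions as the tool; the helper name timestamp_column is made up for illustration and is not part of bcc:

```python
# Reproduce the first two columns in isolation, using the same format
# strings as execsnoop.py. timestamp_column is a hypothetical helper.
def timestamp_column(elapsed_seconds):
    return b"%-8.3f" % elapsed_seconds


short = timestamp_column(971.495) + b"%-16s" % b"sh"
long = timestamp_column(1001.659) + b"%-16s" % b"sh"

print(short)  # timestamp is padded to 8 characters, so a space follows it
print(long)   # timestamp already fills all 8 characters, no space is left

# Whitespace splitting (what awk does by default) sees a different
# number of fields in the two lines.
print(len(short.split()), len(long.split()))
```

The first line splits into two fields, the second into one, which is why awk field numbers stop lining up once the timestamp has four integer digits.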

From what I can tell from the code, all values are printed with a fixed width, without explicit whitespace between the columns:

bcc/tools/execsnoop.py

Lines 298 to 309 in f798668

    if not skip:
        if args.time:
            printb(b"%-9s" % strftime("%H:%M:%S").encode('ascii'), nl="")
        if args.timestamp:
            printb(b"%-8.3f" % (time.time() - start_ts), nl="")
        if args.print_uid:
            printb(b"%-6d" % event.uid, nl="")
        ppid = event.ppid if event.ppid > 0 else get_ppid(event.pid)
        ppid = b"%d" % ppid if ppid > 0 else b"?"
        argv_text = b' '.join(argv[event.pid]).replace(b'\n', b'\\n')
        printb(b"%-16s %-7d %-7s %3d %s" % (event.comm, event.pid,
                                            ppid, event.retval, argv_text))

The expectation is apparently that the columns will never take up the complete width, so that the remaining whitespace separates the columns.

I think someone might have misunderstood how the width works for the printf %f type in the %-8.3f format string here: maybe the expectation was that 8 characters are reserved for the integer part, plus an additional 4 characters for the fractional part. In fact, the 8 characters cover the whole number and the whitespace between the columns, so only 8 - 4 (decimal point and fractional digits) - 1 (whitespace) = 3 characters remain for the integer part.
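The arithmetic can be checked with a short sketch. Both helper names below are hypothetical and only illustrate how the field width is shared between the integer digits, the decimal point, the fractional digits and the padding:

```python
def first_unpadded_value(width, precision=3):
    """First whole number of seconds that fills the entire field.

    Hypothetical helper: the field has to hold the integer digits, the
    decimal point, `precision` fractional digits and one padding space.
    """
    integer_digits = width - (precision + 1) - 1
    return 10 ** integer_digits


def render(value, width, precision=3):
    # Same conversion as "%-8.3f", with width and precision passed in.
    return "%-*.*f" % (width, precision, value)


limit = first_unpadded_value(8)
print(limit)                        # 1000
print(repr(render(limit - 1, 8)))   # '999.000 '
print(repr(render(limit, 8)))       # '1000.000'
```

Values just below the limit that round up at three decimals (for example 999.9996) lose the padding slightly earlier, but the boundary is still at 1000s for practical purposes.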

I would suggest:

  • either add explicit whitespace between the columns, regardless of the column width. If the columns weren't fixed width and were just separated by \t, users could pipe the output through column -t to get fixed-width formatting.
  • or increase the %f width to 8 + 4 + 1 = 13. That way, the problem would only occur once execsnoop had been running continuously for 10^8 seconds, i.e. more than 3 years.
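Both suggestions could look roughly like the sketch below. This is not a patch against execsnoop.py: format_row is a hypothetical helper, and the column widths are simply copied from the snippet quoted above.

```python
def format_row(elapsed, comm, pid, ppid, retval, argv_text, sep=b" "):
    """Hypothetical helper: fixed-width columns joined by an explicit separator."""
    columns = [
        b"%-8.3f" % elapsed,
        b"%-16s" % comm,
        b"%-7d" % pid,
        b"%-7s" % ppid,
        b"%3d" % retval,
        argv_text,
    ]
    return sep.join(columns)


# Option 1: the separator is always present, even when a column overflows.
row = format_row(
    1001.659, b"sh", 3220634, b"3220624", 0,
    b"/bin/sh -c curl -fSs http://localhost:8008/health || exit 1",
)
fields = row.split(None, 5)  # split into 6 fields, keep argv intact
print(fields[0], fields[1])

# Option 2: widen the timestamp column to 13 characters instead.
wide = b"%-13.3f" % 1001.659
print(wide)

# With width 13 the padding runs out at 10**8 seconds.
years = 10 ** 8 / (365.25 * 24 * 3600)
print(round(years, 2))
```

With the explicit separator the field count stays the same no matter how long the tool runs; widening the column only moves the point at which the columns merge.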