mozilla-services / hindsight

Hindsight - lightweight data processing skeleton

Tail not following after reaching EOF

deric opened this issue

I have a simple, low-traffic Hindsight test setup on Debian 10. The tail input should follow rotated nginx logs:

filename = "tail.lua"
...
follow = 'name'
input_filename = '/var/log/nginx/rapi_access.log'
ticker_interval = 10
shutdown_on_terminate = true
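
For context, following a log by name usually means noticing when the path starts pointing at a different file than the one currently open. The C sketch below shows one common way to detect that, by comparing device and inode numbers. It assumes `follow = 'name'` has the same meaning as in `tail --follow=name`; `is_rotated` is a hypothetical helper for illustration and is not taken from tail.lua.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not part of tail.lua).
 * Returns 1 if `path` now refers to a different file than the open
 * handle `fh` (the log was rotated), 0 if it is still the same file,
 * and -1 if either stat call fails.
 */
int is_rotated(const char *path, FILE *fh) {
    struct stat by_name, by_handle;

    assert(path != NULL && fh != NULL);

    /* The name can be briefly missing in the middle of a rotation. */
    if (stat(path, &by_name) != 0) return -1;
    if (fstat(fileno(fh), &by_handle) != 0) return -1;

    /* Same device and inode means the name still points at our file. */
    return by_name.st_dev != by_handle.st_dev ||
           by_name.st_ino != by_handle.st_ino;
}
```

A reader that gets 1 back would typically finish reading the old handle, close it, and reopen the path from offset 0. In the case reported here the inode did not change, so rotation detection alone would not explain the stall.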

Hindsight reads the nginx log file to the end and then stops:

$ cat cache/hindsight.cp | grep acc
_G['input.rapi_access'] = 28443992

while the file keeps growing:

$ du -b /var/log/nginx/rapi_access.log
28619351        /var/log/nginx/rapi_access.log

The strace output shows that hindsight.cp, utilization.tsv and plugins.tsv are being updated, but Hindsight does not notice any changes to the log file.

The filesystem is ext4:

/dev/md2 on / type ext4 (rw,relatime)

Timestamps are being updated as well:

  File: /var/log/nginx/rapi_access.log
  Size: 33732698        Blocks: 65896      IO Block: 4096   regular file
Device: 902h/2306d      Inode: 2639823     Links: 1
Access: (0644/-rw-r--r--)  Uid: (   33/www-data)   Gid: (    4/     adm)
Access: 2020-07-27 13:14:05.972009720 +0000
Modify: 2020-07-27 13:14:05.972009720 +0000
Change: 2020-07-27 13:14:05.972009720 +0000
 Birth: -

Any idea what's wrong?

Package versions:

  • libc6 2.28-10
  • hindsight 0.16.0
  • luasandbox-lfs 1.6.8

My best guess is that clearerr() is not being called under the covers and it is stuck at the initial EOF. I will take a look shortly (I should probably just stop using fh:lines and fix it once and for all).
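
To illustrate the guess above, here is a minimal C sketch of reading past an earlier EOF on a stdio stream. The glibc 2.28 release notes describe end-of-file as a sticky condition that must be cleared with clearerr() (or fseek()/rewind()) before data appended later can be read, which may be relevant given the libc6 2.28 package listed in the report. The function names and the file path passed in are hypothetical; this is not the code used by tail.lua.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append one line to `path`, as a log writer would. Returns 0 on success. */
static int append_line(const char *path, const char *line) {
    FILE *w = fopen(path, "a");
    if (w == NULL) return -1;
    int rc = (fputs(line, w) == EOF) ? -1 : 0;
    if (fclose(w) != 0) rc = -1;
    return rc;
}

/* Read lines until fgets() fails; keep the last one in `last`. Returns the count. */
static int drain_lines(FILE *fh, char *last, size_t size) {
    char buf[256];
    int n = 0;
    while (fgets(buf, sizeof buf, fh) != NULL) {
        snprintf(last, size, "%s", buf);
        n++;
    }
    return n;
}

/*
 * Hypothetical demo: read a file to EOF, let it grow, then keep reading
 * from the same handle. Returns the total number of lines read, or -1 on error.
 */
int tail_after_eof(const char *path, char *last, size_t size) {
    assert(size > 0);
    remove(path);                                  /* start from a clean file */
    if (append_line(path, "first\n") != 0) return -1;

    FILE *fh = fopen(path, "r");
    if (fh == NULL) return -1;

    int total = drain_lines(fh, last, size);       /* reads "first", then hits EOF */

    if (append_line(path, "second\n") != 0) {      /* the file grows after EOF */
        fclose(fh);
        return -1;
    }

    clearerr(fh);                                  /* reset the EOF indicator */
    total += drain_lines(fh, last, size);          /* now sees "second" */

    fclose(fh);
    remove(path);
    return total;
}
```

With the clearerr() call in place, the second drain_lines() call picks up the appended line. Removing that call on a libc with sticky EOF would be expected to leave the reader stuck at the first EOF, which matches the symptom in this issue.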

Please try this out: mozilla-services/lua_sandbox_extensions#525
It now requires the gzfile module, since I added the tail function there, but it allows tail to work with both compressed and uncompressed logs.

@deric

@trink Thanks!

I'm getting this error:

2020-07-29T09:33:41     process_message() /usr/share/luasandbox/sandboxes/heka/input/tail.lua:165: bad argument #2 to 'seek' (number expected, got string)

Sorry, that is because the tests don't cover the one-time migration code for old checkpoints. Fixed.

Testing and cleanup are finished; merged.

I've been testing this for a few days and no issues have appeared so far. Thanks!