Problem with file output plugin after upgrade to 4.4.0
piotr-janek opened this issue · comments
td-agent 4.4.0 ships fluentd 1.15.1, which changes the file output plugin: the plugin now creates /tmp/fluentd-lock-* directories. When td-agent runs without the --daemon flag, everything works: the temp directory is created and the app behaves as expected. When the --daemon option is set, however, the temp directory is created and then removed, and only after that is its name passed to the child processes. That prevents them from functioning properly and makes them throw many "No such file or directory" errors.
Related strace output
06:31:37.550299 mkdir("/tmp/fluentd-lock-20220818-5506-14yftxz", 0700) = 0
06:31:37.579237 lstat("/tmp/fluentd-lock-20220818-5506-14yftxz", {st_dev=makedev(259, 3), st_ino=25167781, st_mode=S_IFDIR|0700, st_nlink=2, st_uid=995, st_gid=992, st_blksize=4096, st_blocks=0, st_size=6, st_atime=1660804297 /* 2022-08-18T06:31:37.549858004+0000 */, st_atime_nsec=549858004, st_mtime=1660804297 /* 2022-08-18T06:31:37.549858004+0000 */, st_mtime_nsec=549858004, st_ctime=1660804297 /* 2022-08-18T06:31:37.549858004+0000 */, st_ctime_nsec=549858004}) = 0
06:31:37.580017 openat(AT_FDCWD, "/tmp/fluentd-lock-20220818-5506-14yftxz", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 10
06:31:37.582363 rmdir("/tmp/fluentd-lock-20220818-5506-14yftxz") = 0
06:31:37.600303 execve("/opt/td-agent/bin/ruby", ["/opt/td-agent/bin/ruby", "-Eascii-8bit:ascii-8bit", "/opt/td-agent/bin/fluentd", "--log", "/var/log/td-agent/td-agent.log", "--daemon", "/var/run/td-agent/td-agent.pid", "--under-supervisor"], ["FLUENT_PLUGIN=/etc/td-agent/plugin", "GEM_PATH=/opt/td-agent/lib/ruby/gems/2.7.0/", "GEM_HOME=/opt/td-agent/lib/ruby/gems/2.7.0/", "TD_AGENT_LOG_FILE=/var/log/td-agent/td-agent.log", "FLUENT_SOCKET=/var/run/td-agent/td-agent.sock", "LD_PRELOAD=/opt/td-agent/lib/libjemalloc.so", "FLUENT_CONF=/etc/td-agent/td-agent.conf", "XDG_SESSION_ID=c1593", "HOSTNAME=<redacted>", "SHELL=/bin/bash", "TERM=xterm-256color", "HISTSIZE=1000", "USER=td-agent", "LS_COLORS=rs=0:di=38;5;27:ln=38;5;51:mh=44;38;5;15:pi=40;38;5;11:so=38;5;13:do=38;5;5:bd=48;5;232;38;5;11:cd=48;5;232;38;5;3:or=48;5;232;38;5;9:mi=05;48;5;232;38;5;15:su=48;5;196;38;5;15:sg=48;5;11;38;5;16:ca=48;5;196;38;5;226:tw=48;5;10;38;5;16:ow=48;5;10;38;5;21:st=48;5;21;38;5;15:ex=38;5;34:*.tar=38;5;9:*.tgz=38;5;9:*.arc=38;5;9:*.arj=38;5;9:*.taz=38;5;9:*.lha=38;5;9:*.lz4=38;5;9:*.lzh=38;5;9:*.lzma=38;5;9:*.tlz=38;5;9:*.txz=38;5;9:*.tzo=38;5;9:*.t7z=38;5;9:*.zip=38;5;9:*.z=38;5;9:*.Z=38;5;9:*.dz=38;5;9:"..., "MAIL=/var/spool/mail/td-agent", "PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin", "PWD=/var/lib/td-agent", "LANG=en_US.UTF-8", "HISTCONTROL=ignoredups", "SHLVL=1", "HOME=/var/lib/td-agent", "LOGNAME=td-agent", "LESSOPEN=||/usr/bin/lesspipe.sh %s", "XDG_RUNTIME_DIR=/run/user/995", "_=/opt/td-agent/bin/fluentd", "FLUENTD_LOCK_DIR=/tmp/fluentd-lock-20220818-5506-14yftxz", "SERVERENGINE_SOCKETMANAGER_PATH=/tmp/SERVERENGINE_SOCKETMANAGER_2022-08-18T06:31:37Z_5695", "SERVERENGINE_WORKER_ID=0", "SERVERENGINE_SOCKETMANAGER_INTERNAL_TOKEN=7b50c9427382eebbf950bffb9d6b1809"]) = 0
06:33:06.155735 open("/tmp/fluentd-lock-20220818-5506-14yftxz/fluentd-_var_log_<redacted>_log.lock", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = -1 ENOENT (No such file or directory)
06:33:06.157334 write(5, "2022-08-18 06:33:06 +0000 [warn]: #3 failed to flush the buffer. retry_times=0 next_retry_time=2022-08-18 06:33:07 +0000 chunk=\"5e67e2765ec0c69c43f4717523cd22b1\" error_class=Errno::ENOENT error=\"No such file or directory @ rb_sysopen - /tmp/fluentd-lock-20220818-5506-14yftxz/fluentd-_var_log_<redacted>_log.lock\"\n", 383) = 383
The problem does not occur in earlier td-agent versions, as they use a version of fluentd that does not create this directory.
What kind of information should I attach to make it easier for you to find the solution?
@fujimotos @daipom Could you take a look at this?
Sure! I will.
This is my bug. I implemented the tempdir creation as follows:
https://github.com/fluent/fluentd/blob/master/lib/fluent/supervisor.rb#L874-L877
    Dir.mktmpdir("fluentd-lock-") do |lock_dir|
      ENV['FLUENTD_LOCK_DIR'] = lock_dir
      se.run
    end
Evidently se.run will exit early with --daemon, so the block ends and Dir.mktmpdir removes the directory while the workers still need it.
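The failure mode can be reproduced in isolation. The following is a minimal sketch, not Fluentd code: `run_and_return_early` is a hypothetical stand-in for `se.run` returning immediately under --daemon, and `example.lock` is an illustrative file name.

```ruby
require "tmpdir"

# Hypothetical stand-in for se.run under --daemon: it returns right away
# instead of blocking for the lifetime of the workers.
def run_and_return_early
  :daemonized
end

Dir.mktmpdir("fluentd-lock-") do |lock_dir|
  ENV["FLUENTD_LOCK_DIR"] = lock_dir
  run_and_return_early
end

# The block has returned, so the block form of Dir.mktmpdir has already
# removed the directory. The environment variable still names it.
dir_exists = Dir.exist?(ENV["FLUENTD_LOCK_DIR"])

# Anything that later tries to create a lock file under the stale path fails.
error_class = begin
  File.open(File.join(ENV["FLUENTD_LOCK_DIR"], "example.lock"), "w") { }
  nil
rescue SystemCallError => e
  e.class
end

puts "exists=#{dir_exists} error=#{error_class}"
```

This mirrors the strace sequence above: mkdir, rmdir, and then a later open on the lock file that fails with ENOENT.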
So what we probably need is to revert fluent/fluentd@75ef92f,
which introduced the automatic cleanup based on the PR feedback.
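For illustration, here is what dropping the automatic cleanup could look like. This is a sketch under assumptions, not necessarily what the actual fix does: it uses the non-block form of `Dir.mktmpdir`, which creates the directory and leaves removal to the caller. The `example.lock` name and the final cleanup step exist only for this demo.

```ruby
require "tmpdir"
require "fileutils"

# Non-block form: the directory is created and NOT removed automatically.
lock_dir = Dir.mktmpdir("fluentd-lock-")
ENV["FLUENTD_LOCK_DIR"] = lock_dir

# (Here the supervisor would start its workers and, with --daemon,
# return early. Nothing removes lock_dir at this point.)
survives = Dir.exist?(ENV["FLUENTD_LOCK_DIR"])

# A worker can now create and lock a file under the directory.
lock_path = File.join(ENV["FLUENTD_LOCK_DIR"], "example.lock")
File.open(lock_path, "w") do |f|
  f.flock(File::LOCK_EX)
end
created = File.exist?(lock_path)

puts "survives=#{survives} created=#{created}"

# Cleanup for this demo only; who removes the directory in a real
# deployment is a separate design question.
FileUtils.remove_entry(lock_dir)
```

The trade-off is that the directory outlives the process that created it unless something removes it explicitly.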
And this is the fix: fluent/fluentd#3864
I confirmed it now works with --daemon with the following config:
<system>
  workers 3
</system>
<source>
  @type dummy
  tag test.log
</source>
<match test.**>
  @type file
  path test.log
  append true
  <buffer>
    @type memory
    flush_interval 3s
    flush_mode interval
  </buffer>
</match>
and by running Fluentd as follows:
$ fluentd --daemon test.pid --log test.log -c test.conf
@ashie Can we go ahead and release td-agent v4.4.1? Since --daemon is included in the default config, I think we should make a point release for it.
Yea, we should release it ASAP.
In addition, I want to include fluent-plugin-kafka's fix: fluent/fluent-plugin-kafka#466
Fixed by fluent/fluentd#3864. Scheduled to be released early next week.
Thanks, I did not expect this to happen so fast. You are awesome.