fluent / fluent-package-builder

td-agent (Fluentd) Building and Packaging System

Problem with file output plugin after upgrade to 4.4.0

piotr-janek opened this issue

td-agent 4.4.0 ships fluentd 1.15.1, which changes the file output plugin: it now creates /tmp/fluentd-lock-* directories. When td-agent is run without the --daemon flag, everything works: the temp directory is created and the application behaves as expected. When --daemon is set, however, the temp directory is created and then removed, and only afterwards is its name passed to the child processes. The children therefore cannot function properly and throw many "No such file or directory" errors.

Related strace output

06:31:37.550299 mkdir("/tmp/fluentd-lock-20220818-5506-14yftxz", 0700) = 0
06:31:37.579237 lstat("/tmp/fluentd-lock-20220818-5506-14yftxz", {st_dev=makedev(259, 3), st_ino=25167781, st_mode=S_IFDIR|0700, st_nlink=2, st_uid=995, st_gid=992, st_blksize=4096, st_blocks=0, st_size=6, st_atime=1660804297 /* 2022-08-18T06:31:37.549858004+0000 */, st_atime_nsec=549858004, st_mtime=1660804297 /* 2022-08-18T06:31:37.549858004+0000 */, st_mtime_nsec=549858004, st_ctime=1660804297 /* 2022-08-18T06:31:37.549858004+0000 */, st_ctime_nsec=549858004}) = 0
06:31:37.580017 openat(AT_FDCWD, "/tmp/fluentd-lock-20220818-5506-14yftxz", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 10
06:31:37.582363 rmdir("/tmp/fluentd-lock-20220818-5506-14yftxz") = 0
06:31:37.600303 execve("/opt/td-agent/bin/ruby", ["/opt/td-agent/bin/ruby", "-Eascii-8bit:ascii-8bit", "/opt/td-agent/bin/fluentd", "--log", "/var/log/td-agent/td-agent.log", "--daemon", "/var/run/td-agent/td-agent.pid", "--under-supervisor"], ["FLUENT_PLUGIN=/etc/td-agent/plugin", "GEM_PATH=/opt/td-agent/lib/ruby/gems/2.7.0/", "GEM_HOME=/opt/td-agent/lib/ruby/gems/2.7.0/", "TD_AGENT_LOG_FILE=/var/log/td-agent/td-agent.log", "FLUENT_SOCKET=/var/run/td-agent/td-agent.sock", "LD_PRELOAD=/opt/td-agent/lib/libjemalloc.so", "FLUENT_CONF=/etc/td-agent/td-agent.conf", "XDG_SESSION_ID=c1593", "HOSTNAME=<redacted>", "SHELL=/bin/bash", "TERM=xterm-256color", "HISTSIZE=1000", "USER=td-agent", "LS_COLORS=rs=0:di=38;5;27:ln=38;5;51:mh=44;38;5;15:pi=40;38;5;11:so=38;5;13:do=38;5;5:bd=48;5;232;38;5;11:cd=48;5;232;38;5;3:or=48;5;232;38;5;9:mi=05;48;5;232;38;5;15:su=48;5;196;38;5;15:sg=48;5;11;38;5;16:ca=48;5;196;38;5;226:tw=48;5;10;38;5;16:ow=48;5;10;38;5;21:st=48;5;21;38;5;15:ex=38;5;34:*.tar=38;5;9:*.tgz=38;5;9:*.arc=38;5;9:*.arj=38;5;9:*.taz=38;5;9:*.lha=38;5;9:*.lz4=38;5;9:*.lzh=38;5;9:*.lzma=38;5;9:*.tlz=38;5;9:*.txz=38;5;9:*.tzo=38;5;9:*.t7z=38;5;9:*.zip=38;5;9:*.z=38;5;9:*.Z=38;5;9:*.dz=38;5;9:"..., "MAIL=/var/spool/mail/td-agent", "PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin", "PWD=/var/lib/td-agent", "LANG=en_US.UTF-8", "HISTCONTROL=ignoredups", "SHLVL=1", "HOME=/var/lib/td-agent", "LOGNAME=td-agent", "LESSOPEN=||/usr/bin/lesspipe.sh %s", "XDG_RUNTIME_DIR=/run/user/995", "_=/opt/td-agent/bin/fluentd", "FLUENTD_LOCK_DIR=/tmp/fluentd-lock-20220818-5506-14yftxz", "SERVERENGINE_SOCKETMANAGER_PATH=/tmp/SERVERENGINE_SOCKETMANAGER_2022-08-18T06:31:37Z_5695", "SERVERENGINE_WORKER_ID=0", "SERVERENGINE_SOCKETMANAGER_INTERNAL_TOKEN=7b50c9427382eebbf950bffb9d6b1809"]) = 0

06:33:06.155735 open("/tmp/fluentd-lock-20220818-5506-14yftxz/fluentd-_var_log_<redacted>_log.lock", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = -1 ENOENT (No such file or directory)
06:33:06.157334 write(5, "2022-08-18 06:33:06 +0000 [warn]: #3 failed to flush the buffer. retry_times=0 next_retry_time=2022-08-18 06:33:07 +0000 chunk=\"5e67e2765ec0c69c43f4717523cd22b1\" error_class=Errno::ENOENT error=\"No such file or directory @ rb_sysopen - /tmp/fluentd-lock-20220818-5506-14yftxz/fluentd-_var_log_<redacted>_log.lock\"\n", 383) = 383
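
To illustrate the worker-side failure in the trace above, here is a minimal Ruby sketch of a worker taking a file lock under the directory named by FLUENTD_LOCK_DIR. The helper names (lock_path_for, with_worker_lock) and the file-name mangling are assumptions made for illustration, not Fluentd's actual implementation; only the open mode and the resulting Errno::ENOENT are taken from the strace output.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: build a lock-file path under the lock directory.
# The name mangling is an assumption chosen to resemble the traced path.
def lock_path_for(output_path, lock_dir = ENV["FLUENTD_LOCK_DIR"])
  mangled = output_path.gsub(/[^a-zA-Z0-9]/, "_")
  File.join(lock_dir, "fluentd-#{mangled}.lock")
end

# Hypothetical helper: run the block while holding an exclusive flock.
def with_worker_lock(output_path, lock_dir = ENV["FLUENTD_LOCK_DIR"])
  # Mode "w" opens with O_WRONLY|O_CREAT|O_TRUNC, as seen in the trace.
  File.open(lock_path_for(output_path, lock_dir), "w") do |f|
    f.flock(File::LOCK_EX)
    yield
  end
end

lock_dir = Dir.mktmpdir("fluentd-lock-")
ok_result = with_worker_lock("/var/log/example.log", lock_dir) { :flushed }

# Simulate the directory vanishing before the next flush.
FileUtils.remove_entry(lock_dir)
missing_result =
  begin
    with_worker_lock("/var/log/example.log", lock_dir) { :flushed }
  rescue Errno::ENOENT => e
    e.class
  end

puts ok_result.inspect
puts missing_result.inspect
```

Once the directory is gone, File.open cannot create the lock file, so every flush attempt fails before the lock is even taken.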

The problem does not occur in earlier td-agent versions, as they use a version of fluentd that does not create this directory.

What kind of information should I attach to make it easier for you to find the solution?

@fujimotos @daipom Could you take a look at this?

Sure! I will.

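This is my bug. I implemented the tempdir creation as shown in the snippet below.
Evidently se.run returns early with --daemon, so the block ends and Dir.mktmpdir removes the directory while the workers still need it.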

https://github.com/fluent/fluentd/blob/master/lib/fluent/supervisor.rb#L874-L877

      Dir.mktmpdir("fluentd-lock-") do |lock_dir|
        ENV['FLUENTD_LOCK_DIR'] = lock_dir
        se.run
      end
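
The effect can be reproduced in isolation. The following is a minimal sketch, assuming a hypothetical stand-in (run_returning_early) for a run call that returns immediately, as se.run apparently does with --daemon. It relies only on the documented behaviour of Dir.mktmpdir, which removes the directory when its block finishes.

```ruby
require "tmpdir"

# Hypothetical stand-in for se.run under --daemon: it returns immediately,
# as if the real work had been handed off to a detached process.
def run_returning_early
  :returned_early
end

# Mirrors the shape of the snippet above and reports what is left afterwards.
def lock_dir_after_block
  remembered = nil
  Dir.mktmpdir("fluentd-lock-") do |lock_dir|
    remembered = lock_dir # the path that would be exported via ENV
    run_returning_early
  end
  # Dir.mktmpdir removes the directory once the block has finished.
  [remembered, Dir.exist?(remembered)]
end

path, still_there = lock_dir_after_block
puts "#{path} exists after block: #{still_there}"
```

The path stays in the environment, but by the time anything tries to use it the directory no longer exists.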

So what we probably need is to revert fluent/fluentd@75ef92f,
which introduced the automatic cleanup based on the PR feedback.
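
One way to avoid the automatic cleanup is to call Dir.mktmpdir without a block and remove the directory explicitly later. The sketch below only illustrates that idea: the helper names (create_lock_dir, remove_lock_dir) are hypothetical, and this is not necessarily how the actual change in Fluentd is written.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: create the lock directory without a block, so nothing
# removes it automatically when the caller returns.
def create_lock_dir(env = ENV)
  lock_dir = Dir.mktmpdir("fluentd-lock-")
  env["FLUENTD_LOCK_DIR"] = lock_dir
  lock_dir
end

# Hypothetical helper: explicit cleanup, meant to be called by whichever
# process actually outlives the workers.
def remove_lock_dir(lock_dir)
  FileUtils.remove_entry(lock_dir) if Dir.exist?(lock_dir)
end

env = {} # a plain Hash stands in for ENV in this demonstration
lock_dir = create_lock_dir(env)
puts "exists after helper returned: #{Dir.exist?(lock_dir)}"
remove_lock_dir(lock_dir)
```

The trade-off is that cleanup becomes the caller's responsibility, so a crash can leave a stale directory behind in /tmp.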

And this is the fix: fluent/fluentd#3864
I confirmed it now works with --daemon with the following config:

<system>
  workers 3
</system>

<source>
  @type dummy
  tag test.log
</source>

<match test.**>
  @type file
  path test.log
  append true
  <buffer>
    @type memory
    flush_interval 3s
    flush_mode interval
  </buffer>
</match>

and by running Fluentd as follows:

$ fluentd --daemon test.pid --log test.log -c test.conf

@ashie Can we go ahead and release td-agent v4.4.1? Since --daemon is included
in the default service configuration (see the ExecStart line below), I think we should make a point release for it.

ExecStart=/opt/td-agent/bin/fluentd --log $TD_AGENT_LOG_FILE --daemon <%= Shellwords.shellescape("/var/run/#{project_name}/#{project_name}.pid") %> $TD_AGENT_OPTIONS

Yea, we should release it ASAP.
In addition, I want to include fluent-plugin-kafka's fix: fluent/fluent-plugin-kafka#466

Fixed by fluent/fluentd#3864. Scheduled to be released early next week.

Thanks, I did not expect this to happen so fast. You are awesome.