many CLOSE_WAITs in eventloop (HTTP input)

Question

leongshengmin opened this issue 2 months ago

Describe the bug

A sudden spike in CLOSE_WAIT sockets is driving CPU usage to ~100%, and event throughput drops significantly under the CPU load. The Fluentd instances sit behind Azure Application Gateway and should each receive roughly the same amount of traffic.
Even after restarting Fluentd, CPU usage remains high.

lsof | grep TCP | grep td-agent | grep CLOSE | wc -l
978

Compared to a host with normal CPU load:

lsof | grep TCP | grep td-agent | grep CLOSE | wc -l
0
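
The lsof counts above cover every td-agent socket. To see which TCP states pile up on the HTTP input port specifically, something like the following could be used. This is only a rough sketch, not part of Fluentd or td-agent: it assumes a Linux host where /proc/net/tcp is readable, the function name count_states is made up for illustration, and port 9880 is taken from the configuration in this report.

```python
from collections import Counter

# Kernel TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text, local_port):
    """Count sockets per TCP state for one local port.

    proc_net_tcp_text is the full text of /proc/net/tcp (or tcp6),
    including its header line. count_states is a hypothetical helper,
    not an existing API.
    """
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # local_address looks like "0100007F:2698"; the port is hex.
        port = int(fields[1].rsplit(":", 1)[1], 16)
        if port == local_port:
            state = fields[3].upper()
            counts[TCP_STATES.get(state, state)] += 1
    return counts


if __name__ == "__main__":
    # 9880 is the in_http port from the configuration in this issue.
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                text = f.read()
        except OSError:
            continue  # file not present (e.g. IPv6 disabled or non-Linux)
        print(path, dict(count_states(text, 9880)))
```

Running this on an affected host and on a healthy one should show whether the CLOSE_WAIT sockets are concentrated on the in_http port or spread across other connections such as the HTTP output.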

sigdump-87352.log

To Reproduce

Not sure how to reproduce.

Expected behavior

CPU usage and the number of CLOSE_WAIT sockets should be much lower.

Your Environment

- Fluentd version: 1.12.1
- TD Agent version: 4.1.0
- Operating system: Ubuntu 20.04.6 LTS
- Kernel version: 5.15.0-1064-azure

Your Configuration

<system>
  workers 2
  root_dir /mnt/fluentd
</system>
<source>
  @id http_source
  @type http
  bind 0.0.0.0
  port 9880
  body_size_limit 512kB
  keepalive_timeout 45s
  @log_level info
  <parse>
    time_key __time
  </parse>
</source>

<source>
  @type monitor_agent
  bind 0.0.0.0
  port 24220
  emit_interval 5
  include_config false
</source>

<filter xxx>
  @type record_transformer
  enable_ruby
  auto_typecast true
  <record>
    fluentd_timestamp ${ require 'time'; Time.now.utc.iso8601(6) }
  </record>
</filter>

<match xxx>
  @id xxx
  @type http
  @log_level info

  endpoint xxx
  json_array true

  # data type settings
  <format>
    @type json
  </format>

  # buffer
  <buffer>
    @type file
    total_limit_size 4GB
    flush_thread_count 12
    chunk_limit_size 1MB
    flush_interval 10s
    overflow_action drop_oldest_chunk
    retry_max_times 3
  </buffer>

</match>

Your Error Log

no error logs

Additional context

No response

Leong Shengmin · Answer 1 · Wed Jun 05 2024 14:24:21 GMT+0800 (China Standard Time)

Closing this issue, as it was caused by the load balancer (LB) settings.