tagomoris / fluent-plugin-secure-forward

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Problem with dead connection

doul59 opened this issue · comments

Hi,

I have a problem with the plugin :

Forwarder :

2015-04-07 17:37:25 +0200 [info]: starting fluentd-0.12.7
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-mixin-config-placeholders' version '0.3.0'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-mixin-plaintextformatter' version '0.2.6'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-mongo' version '0.7.8'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-parser' version '0.4.1'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-record-reformer' version '0.5.0'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '1.5.1'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '1.4.1'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-s3' version '0.5.7'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-scribe' version '0.10.14'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-secure-forward' version '0.2.6'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-td' version '0.10.26'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-td-monitoring' version '0.2.0'
2015-04-07 17:37:26 +0200 [info]: gem 'fluent-plugin-webhdfs' version '0.4.1'
2015-04-07 17:37:26 +0200 [info]: gem 'fluentd' version '0.12.7'
2015-04-07 17:37:26 +0200 [info]: using configuration file: <ROOT>
  <source>
    type tail
    path /var/log/syslog
    pos_file /var/log/td-agent/syslog.pos
    format none
    tag syslog
  </source>
  <match reformered.**>
    type secure_forward
    self_hostname ${hostname}
    shared_key hogeposxxx0
    keepalive 300
    buffer_type memory
    buffer_chunk_limit 256m
    buffer_queue_limit 128
    flush_interval 5s
    disable_retry_limit true
    retry_wait 30s
    <server>
      host aggregator.toto.lan
      port 282525
    </server>
  </match>
  <match **>
    type record_reformer
    hostname ${hostname}
    tag reformered.${tag}
  </match>
</ROOT>
2015-04-07 17:37:26 +0200 [info]: adding match pattern="reformered.**" type="secure_forward"
2015-04-07 17:37:11 +0200 [info]: adding match pattern="**" type="record_reformer"
2015-04-07 17:37:12 +0200 [info]: adding source type="tail"
2015-04-07 17:37:12 +0200 [info]: following tail of /var/log/syslog
2015-04-07 17:37:17 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:37:18 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2015-04-07 17:37:50 +0200 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3b111da080d4"
  2015-04-07 17:37:18 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.2.6/lib/fluent/plugin/out_secure_forward.rb:216:in `write_objects'
  2015-04-07 17:37:18 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:463:in `write'
  2015-04-07 17:37:18 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:325:in `write_chunk'
  2015-04-07 17:37:18 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:304:in `pop'
  2015-04-07 17:37:18 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:321:in `try_flush'
  2015-04-07 17:37:18 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:140:in `run
2015-04-07 17:37:32 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:37:47 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:37:51 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2015-04-07 17:38:53 +0200 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3b111da080d4"
  2015-04-07 17:37:51 +0200 [warn]: suppressed same stacktrace
2015-04-07 17:38:02 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:38:17 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:38:32 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:38:47 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:38:54 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2015-04-07 17:40:40 +0200 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3b111da080d4"
  2015-04-07 17:38:54 +0200 [warn]: suppressed same stacktrace
2015-04-07 17:39:02 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:39:17 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:39:32 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:39:47 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:40:02 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:40:17 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:40:32 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-07 17:40:41 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2015-04-07 17:44:55 +0200 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3b111da080d4"

Aggregator :

2015-04-07 17:33:11 +0200 [info]: starting fluentd-0.12.7
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-mixin-config-placeholders' version '0.3.0'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-mixin-plaintextformatter' version '0.2.6'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-mixin-rewrite-tag-name' version '0.1.0'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-elasticsearch' version '0.7.0'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-mongo' version '0.7.8'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-mysqlslowquerylog' version '0.0.2'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-parser' version '0.4.1'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-record-reformer' version '0.5.0'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '1.5.1'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '1.4.1'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-s3' version '0.5.7'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-scribe' version '0.10.14'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-secure-forward' version '0.2.6'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-td' version '0.10.26'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-td-monitoring' version '0.2.0'
2015-04-07 17:33:11 +0200 [info]: gem 'fluent-plugin-webhdfs' version '0.4.1'
2015-04-07 17:33:11 +0200 [info]: gem 'fluentd' version '0.12.7'
2015-04-07 17:33:11 +0200 [info]: using configuration file: <ROOT>
  <source>
    type secure_forward
    self_hostname ${hostname}
    shared_key hogeposxxx0
    cert_auto_generate yes
    port 282525
  </source>
  <match fluent.**>
    type record_reformer
    hostname ${hostname}
    tag reformered.${tag}
  </match>
  <match reformered.**>
    type record_reformer
    tag ${tag_suffix[1]}
  </match>
  <match **>
    type roundrobin
    <store>
      type elasticsearch
      host x.x.x.x
      port 9200
      logstash_format true
      logstash_prefix ${tag}
      include_tag_key true
      tag_key tag
      flush_interval 1s
    </store>
    <store>
      type elasticsearch
      host x.x.x.x
      port 9200
      logstash_format true
      logstash_prefix ${tag}
      include_tag_key true
      tag_key tag
      flush_interval 1s
    </store>
  </match>
</ROOT>

Can you help me ?

@doul59 Which td-agent version are you using?
Does this problem occurs always (now and in past)? Or from any point of time?

From your log, connection established to xxx is missing.
If plugin works correctly, above message should be logged:

log.info "connection established to #{@host}" if @first_session

Can your forwarder connect to aggregator?
Does this happen with only secure-forward plugin?

I have try 2.1.4 and upgrade this afternoon in 2.2.0
if i try without secure it's ok ...

@doul59 do you mean that everything works on td-agent 2.1.4?

no i have same problem with 2.1.4

Do you configure port 282525?
Under IP protocol, max number of available port is 65534 ...

yes it's a mistake on my copy/paste 👎
i use 28252 but if i don't specify port in config it's same result

on aggregator :
netstat -anope

tcp 517 0 ip_aggreagator:28252 ip_forwarder:58854 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggreagator:28252 ip_forwarder:58829 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggreagator:28252 ip_forwarder:58831 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggreagator:28252 ip_forwarder:58851 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggreagator:28252 ip_forwarder:58841 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggreagator:28252 ip_forwarder:58855 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggreagator:28252 ip_forwarder:58850 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggreagator:28252 ip_forwarder:58852 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggreagator:28252 ip_forwarder:58835 ESTABLISHED 0 0 - off (0.00/0/0)

no i have same problem with 2.1.4

It means 2.1.4 doesn't have this issue?
Could you see the differences between 2.1.4 and 2.2.0, like network state or something?

i had exactly same issue yesterday with 2.1.4, i update in 2.2.0 but nothing change ...
maybe version of ruby is a problem ?

dpkg -l | grep ruby
ii ruby 1:1.9.3 all Interpreter of object-oriented scripting language Ruby (default version)

2015-04-08 09:07:25 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2015-04-08 09:07:52 +0200 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:35f3a4dc7ecc"
2015-04-08 09:07:25 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.2.6/lib/fluent/plugin/out_secure_forward.rb:216:in write_objects' 2015-04-08 09:07:25 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:463:inwrite'
2015-04-08 09:07:25 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:325:in write_chunk' 2015-04-08 09:07:25 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:304:inpop'
2015-04-08 09:07:25 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:321:in try_flush' 2015-04-08 09:07:25 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:140:inrun'

Hmm.. secure-forward works with ruby 2.1.0 or later on some environment so
I hope ruby version is not related.

i try with an other server // same configuration
Without secure-forward : it's ok
With secure-forward same issue

'Without secure-forward' means you use normal forward plugin?
You seem to use Ubuntu so could you check secure-forward plugin works or not with td-agent 1?

http://docs.fluentd.org/articles/install-by-deb-v1

my servers is on debian wheezy

i have done a new test :
Stop td-agent on forwarder + Stop td-agent on aggregator
Start agent on aggregator
wait 2 min
Start agent on forwarder

forwarder send to aggregator few lines of log and after X minutes forwarder is broken

2015-04-08 17:20:29 +0200 [info]: following tail of /var/log/syslog
2015-04-08 17:20:30 +0200 [info]: connection established to aggregator.toto.lan
2015-04-08 17:21:31 +0200 [warn]: disconnected from aggregator.toto.lan
2015-04-08 17:21:34 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-08 17:21:39 +0200 [warn]: recovered connection to dead node: aggregator.toto.lan
2015-04-08 17:22:32 +0200 [warn]: disconnected from aggregator.toto.lan
2015-04-08 17:22:34 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-08 17:22:49 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-08 17:23:04 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-08 17:23:19 +0200 [warn]: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-08 17:23:32 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2015-04-08 17:23:32 +0200 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:35357e008488"
2015-04-08 17:23:32 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.2.6/lib/fluent/plugin/out_secure_forward.rb:216:in write_objects' 2015-04-08 17:23:32 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:463:inwrite'
2015-04-08 17:23:32 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:325:in write_chunk' 2015-04-08 17:23:32 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:304:inpop'
2015-04-08 17:23:32 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:321:in try_flush' 2015-04-08 17:23:32 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:140:inrun'
2015-04-08 17:23:33 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2015-04-08 17:23:34 +0200 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:35357e008488"

on the aggregator :

tcp 517 0 ip_aggregator:28252 ip_forwarder:60614 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 38 0 ip_aggregator:28252 ip_forwarder:60613 CLOSE_WAIT 110 28593 6067/ruby off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60622 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60626 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60630 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60628 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60627 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60635 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60617 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60629 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60633 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60620 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60634 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60631 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60632 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60619 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60615 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60625 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60621 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60618 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60616 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60624 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60623 ESTABLISHED 0 0 - off (0.00/0/0)
tcp 517 0 ip_aggregator:28252 ip_forwarder:60636 ESTABLISHED 0 0 - off (0.00/0/0)

in debug mode :

aggregator :
2015-04-08 17:37:09 +0200 [info]: fluent/agent.rb:123:add_match: adding match pattern="fluent." type="record_reformer"
2015-04-08 17:37:09 +0200 [trace]: fluent/plugin.rb:98:register_impl: registered output plugin 'record_reformer'
2015-04-08 17:37:09 +0200 [info]: fluent/agent.rb:123:add_match: adding match pattern="reformered.
" type="record_reformer"
2015-04-08 17:37:09 +0200 [info]: fluent/agent.rb:123:add_match: adding match pattern="**" type="roundrobin"
2015-04-08 17:37:09 +0200 [debug]: plugin/out_roundrobin.rb:44:block in configure: adding store type="elasticsearch", weight=1
2015-04-08 17:37:09 +0200 [debug]: plugin/out_roundrobin.rb:44:block in configure: adding store type="elasticsearch", weight=1
2015-04-08 17:37:09 +0200 [info]: fluent/root_agent.rb:142:add_source: adding source type="secure_forward"
2015-04-08 17:37:09 +0200 [trace]: fluent/plugin.rb:98:register_impl: registered input plugin 'secure_forward'
2015-04-08 17:37:09 +0200 [trace]: plugin/in_secure_forward.rb:179:run: setup for ssl sessions
2015-04-08 17:37:09 +0200 [trace]: plugin/in_secure_forward.rb:185:run: start to listen bind="0.0.0.0" port=28252
2015-04-08 17:37:09 +0200 [trace]: plugin/in_secure_forward.rb:187:run: starting SSL server bind="0.0.0.0" port=28252
2015-04-08 17:37:09 +0200 [trace]: plugin/in_secure_forward.rb:191:run: accepting sessions
2015-04-08 17:38:14 +0200 [trace]: plugin/in_secure_forward.rb:194:block in run: accept tcp connection (ssl session not established yet)
2015-04-08 17:38:14 +0200 [debug]: plugin/input_session.rb:148:start: starting server
2015-04-08 17:38:14 +0200 [trace]: plugin/in_secure_forward.rb:199:block in run: session instances: all=1 closed=0
2015-04-08 17:38:14 +0200 [trace]: plugin/input_session.rb:150:start: accepting ssl session
2015-04-08 17:38:14 +0200 [debug]: plugin/input_session.rb:63:generate_helo: generating helo
2015-04-08 17:38:15 +0200 [debug]: plugin/input_session.rb:122:on_read: on_read
2015-04-08 17:38:15 +0200 [debug]: plugin/input_session.rb:69:check_ping: checking ping
2015-04-08 17:38:15 +0200 [debug]: plugin/input_session.rb:105:generate_pong: generating pong
2015-04-08 17:38:15 +0200 [debug]: plugin/input_session.rb:137:on_read: connection established
2015-04-08 17:39:15 +0200 [debug]: plugin/input_session.rb:122:on_read: on_read
2015-04-08 17:39:16 +0200 [debug]: plugin/input_session.rb:122:on_read: on_read
2015-04-08 17:39:19 +0200 [trace]: plugin/in_secure_forward.rb:194:block in run: accept tcp connection (ssl session not established yet)
2015-04-08 17:39:19 +0200 [debug]: plugin/input_session.rb:148:start: starting server
2015-04-08 17:39:19 +0200 [trace]: plugin/in_secure_forward.rb:199:block in run: session instances: all=1 closed=0
2015-04-08 17:39:19 +0200 [trace]: plugin/input_session.rb:150:start: accepting ssl session
2015-04-08 17:39:19 +0200 [debug]: plugin/input_session.rb:63:generate_helo: generating helo
2015-04-08 17:39:20 +0200 [debug]: plugin/input_session.rb:122:on_read: on_read
2015-04-08 17:39:20 +0200 [debug]: plugin/input_session.rb:69:check_ping: checking ping
2015-04-08 17:39:20 +0200 [debug]: plugin/input_session.rb:105:generate_pong: generating pong
2015-04-08 17:39:20 +0200 [debug]: plugin/input_session.rb:137:on_read: connection established
2015-04-08 17:39:21 +0200 [info]: plugin/out_elasticsearch.rb:67:client: Connection opened to Elasticsearch cluster => {:host=>"ip-es1", :port=>9200, :scheme=>"http"}
2015-04-08 17:39:21 +0200 [info]: plugin/out_elasticsearch.rb:67:client: Connection opened to Elasticsearch cluster => {:host=>"ip-es2", :port=>9200, :scheme=>"http"}

forwarder :
2015-04-08 17:38:14 +0200 [info]: fluent/agent.rb:123:add_match: adding match pattern="reformered." type="secure_forward"
2015-04-08 17:38:14 +0200 [trace]: fluent/plugin.rb:98:register_impl: registered output plugin 'secure_forward'
2015-04-08 17:38:14 +0200 [info]: fluent/agent.rb:123:add_match: adding match pattern="
" type="record_reformer"
2015-04-08 17:38:14 +0200 [trace]: fluent/plugin.rb:98:register_impl: registered output plugin 'record_reformer'
2015-04-08 17:38:14 +0200 [info]: fluent/root_agent.rb:142:add_source: adding source type="tail"
2015-04-08 17:38:14 +0200 [debug]: plugin/out_secure_forward.rb:121:start: starting secure-forward
2015-04-08 17:38:14 +0200 [debug]: plugin/out_secure_forward.rb:123:start: start to connect target nodes
2015-04-08 17:38:14 +0200 [debug]: plugin/out_secure_forward.rb:125:block in start: connecting node host="aggregator.toto.lan" port=28252
2015-04-08 17:38:14 +0200 [info]: plugin/in_tail.rb:477:initialize: following tail of /var/log/syslog
2015-04-08 17:38:14 +0200 [debug]: plugin/output_node.rb:212:connect: starting client
2015-04-08 17:38:14 +0200 [debug]: plugin/output_node.rb:215:connect: create tcp socket to node host="aggregator.toto.lan" address="192.168.18.81" port=28252
2015-04-08 17:38:14 +0200 [trace]: plugin/output_node.rb:224:connect: changing socket options
2015-04-08 17:38:14 +0200 [trace]: plugin/output_node.rb:232:connect: initializing SSL contexts
2015-04-08 17:38:14 +0200 [debug]: plugin/output_node.rb:237:connect: trying to connect ssl session host="aggregator.toto.lan" address="192.168.18.81" port=28252
2015-04-08 17:38:14 +0200 [debug]: plugin/output_node.rb:249:connect: ssl session connected host="aggregator.toto.lan" port=28252
2015-04-08 17:38:14 +0200 [debug]: plugin/output_node.rb:269:connect: ssl sessison connected host="aggregator.toto.lan" port=28252
2015-04-08 17:38:14 +0200 [debug]: plugin/output_node.rb:181:on_read: on_read
2015-04-08 17:38:14 +0200 [debug]: plugin/output_node.rb:129:check_helo: checking helo
2015-04-08 17:38:14 +0200 [debug]: plugin/output_node.rb:141:generate_ping: generating ping
2015-04-08 17:38:15 +0200 [debug]: plugin/output_node.rb:181:on_read: on_read
2015-04-08 17:38:15 +0200 [debug]: plugin/output_node.rb:156:check_pong: checking pong
2015-04-08 17:38:15 +0200 [info]: plugin/output_node.rb:204:on_read: connection established to aggregator.toto.lan
2015-04-08 17:38:15 +0200 [debug]: plugin/output_node.rb:207:on_read: connection established host="aggregator.toto.lan" port=28252 expire=nil
2015-04-08 17:38:19 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher
2015-04-08 17:38:19 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan
2015-04-08 17:38:24 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher
2015-04-08 17:38:24 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan
2015-04-08 17:38:29 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher
2015-04-08 17:38:29 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan
2015-04-08 17:38:34 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher
2015-04-08 17:38:34 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan
2015-04-08 17:38:39 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher
2015-04-08 17:38:39 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan
2015-04-08 17:38:44 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher
2015-04-08 17:38:44 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan
2015-04-08 17:38:49 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher
2015-04-08 17:38:49 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan
2015-04-08 17:38:54 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher
2015-04-08 17:38:54 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan
2015-04-08 17:38:59 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher
2015-04-08 17:38:59 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan
2015-04-08 17:39:04 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher
2015-04-08 17:39:04 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan
2015-04-08 17:39:09 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher
2015-04-08 17:39:09 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan
2015-04-08 17:39:14 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher
2015-04-08 17:39:14 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan
2015-04-08 17:39:15 +0200 [trace]: plugin/out_secure_forward.rb:218:write_objects: selected node host="aggregator.toto.lan" port=28252 standby=false
2015-04-08 17:39:16 +0200 [trace]: plugin/out_secure_forward.rb:218:write_objects: selected node host="aggregator.toto.lan" port=28252 standby=false
2015-04-08 17:39:16 +0200 [warn]: plugin/output_node.rb:294:rescue in block in connect: disconnected from aggregator.toto.lan
2015-04-08 17:39:16 +0200 [debug]: plugin/output_node.rb:89:shutdown: shutting down node aggregator.toto.lan
2015-04-08 17:39:17 +0200 [warn]: fluent/output.rb:354:rescue in try_flush: temporarily failed to flush the buffer. next_retry=2015-04-08 17:39:18 +0200 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:335bbe846424"
2015-04-08 17:39:17 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.2.6/lib/fluent/plugin/out_secure_forward.rb:216:in write_objects' 2015-04-08 17:39:17 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:463:inwrite'
2015-04-08 17:39:17 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:325:in write_chunk' 2015-04-08 17:39:17 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/buffer.rb:304:inpop'
2015-04-08 17:39:17 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:321:in try_flush' 2015-04-08 17:39:17 +0200 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.7/lib/fluent/output.rb:140:inrun'
2015-04-08 17:39:19 +0200 [warn]: fluent/output.rb:354:rescue in try_flush: temporarily failed to flush the buffer. next_retry=2015-04-08 17:39:19 +0200 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:335bbe846424"
2015-04-08 17:39:19 +0200 [warn]: suppressed same stacktrace
2015-04-08 17:39:19 +0200 [trace]: plugin/out_secure_forward.rb:139:block in node_watcher: in node health watcher
2015-04-08 17:39:19 +0200 [trace]: plugin/out_secure_forward.rb:142:block (2 levels) in node_watcher: node health watcher for aggregator.toto.lan
2015-04-08 17:39:19 +0200 [warn]: plugin/out_secure_forward.rb:151:block (2 levels) in node_watcher: dead connection found: aggregator.toto.lan, reconnecting...
2015-04-08 17:39:19 +0200 [debug]: plugin/out_secure_forward.rb:156:block (2 levels) in node_watcher: reconnecting to node host="aggregator.toto.lan" port=28252 expire=nil expired=false detached=false
2015-04-08 17:39:19 +0200 [debug]: plugin/output_node.rb:212:connect: starting client
2015-04-08 17:39:19 +0200 [trace]: plugin/out_secure_forward.rb:168:block (2 levels) in node_watcher: checking reconnecting node aggregator.toto.lan
2015-04-08 17:39:19 +0200 [debug]: plugin/output_node.rb:215:connect: create tcp socket to node host="aggregator.toto.lan" address="192.168.18.81" port=28252
2015-04-08 17:39:19 +0200 [trace]: plugin/output_node.rb:224:connect: changing socket options
2015-04-08 17:39:19 +0200 [trace]: plugin/output_node.rb:232:connect: initializing SSL contexts
2015-04-08 17:39:19 +0200 [debug]: plugin/output_node.rb:237:connect: trying to connect ssl session host="aggregator.toto.lan" address="192.168.18.81" port=28252
2015-04-08 17:39:19 +0200 [debug]: plugin/output_node.rb:249:connect: ssl session connected host="aggregator.toto.lan" port=28252
2015-04-08 17:39:19 +0200 [debug]: plugin/output_node.rb:269:connect: ssl sessison connected host="aggregator.toto.lan" port=28252
2015-04-08 17:39:19 +0200 [debug]: plugin/output_node.rb:181:on_read: on_read
2015-04-08 17:39:19 +0200 [debug]: plugin/output_node.rb:129:check_helo: checking helo
2015-04-08 17:39:19 +0200 [debug]: plugin/output_node.rb:141:generate_ping: generating ping
2015-04-08 17:39:20 +0200 [debug]: plugin/output_node.rb:181:on_read: on_read
2015-04-08 17:39:20 +0200 [debug]: plugin/output_node.rb:156:check_pong: checking pong
2015-04-08 17:39:20 +0200 [debug]: plugin/output_node.rb:207:on_read: connection established host="aggregator.toto.lan" port=28252 expire=nil
2015-04-08 17:39:20 +0200 [warn]: fluent/output.rb:354:rescue in try_flush: temporarily failed to flush the buffer. next_retry=2015-04-08 17:39:24 +0200 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:335bbe846424"

How about to configure forwarder without keepalive 300? (it means keep connection alive forever.)

Hmm...
I tried Debian 7 with td-agent 2.2.0 but there is no problem in localhost test.

tcp        0      0 127.0.0.1:28252         127.0.0.1:50370         ESTABLISHED 0          27790       -                off (0.00/0/0)
tcp        0      0 127.0.0.1:50370         127.0.0.1:28252         ESTABLISHED 0          27789       -                off (0.00/0/0)

From your netstat, there are lots of 517 Recv-Q and 0 inode connections.
This is weird...

tcp 38 0 ip_aggregator:28252 ip_forwarder:60613 CLOSE_WAIT 110 28593 6067/ruby off (0.00/0/0)

What is this program? 6067/ruby is your test program?

without keepalive : same issue

in this exemple : 6067/ruby is td-agent process

i had reinstall all plateform // i have same issue without secure-forward :/

Same issue without secure-forward?
It's weird, and it seems not about secure-forward, but about network quality of your system or so...

(i used fluent 2.1.4 since 2 months without issue)
i had upgrade in 2.2.0 and some issue :/

i try reinstall in 2.1.4 ;)

same result with 2.1.4
i don't understand ... i have try with new VM and reinstall OS + fluentd and i have always this issue
forwarder can't stay connected with aggregator
(not a network problem ... same network tcpdump seems ok)

commented

I have the same issue. How do we fix this or is there an work around?

I released experimental version of this plugin, v0.3.5pre1. It may improve your problems if you're specifying keepalive timeout option.
With this version, secure-forward output plugin will disconnect stuck connections forcedly after keepalive timeout (in default, connection_hard_timeout is 1.2x of keepalive).

If you have troubles on stuck connections with keepalive, please try it, with just few nodes at first :P

Hi

I get a similar issue to this. A connection fails to be established even though the logs say the retry succeeded nothing makes it to the aggregator. Any insight as to what could be causing this issue?

2017-07-19 11:10:45 +0000 [info]: connection established to logger.contacta.io
2017-07-19 11:11:45 +0000 [warn]: disconnected host="logger.contacta.io" port=24284
2017-07-19 11:11:45 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-07-19 11:11:46 +0000 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3faae7565348"
2017-07-19 11:11:45 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-secure-forward-0.4.5/lib/fluent/plugin/out_secure_forward.rb:259:in write_objects' 2017-07-19 11:11:45 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.35/lib/fluent/output.rb:490:in write'
2017-07-19 11:11:45 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.35/lib/fluent/buffer.rb:354:in write_chunk' 2017-07-19 11:11:45 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.35/lib/fluent/buffer.rb:333:in pop'
2017-07-19 11:11:45 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.35/lib/fluent/output.rb:342:in try_flush' 2017-07-19 11:11:45 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.35/lib/fluent/output.rb:149:in run'
2017-07-19 11:11:46 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-07-19 11:11:49 +0000 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3faae7565348"
2017-07-19 11:11:46 +0000 [warn]: suppressed same stacktrace
2017-07-19 11:11:49 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-07-19 11:11:52 +0000 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3faae7565348"
2017-07-19 11:11:49 +0000 [warn]: suppressed same stacktrace
2017-07-19 11:11:49 +0000 [warn]: dead connection found: logger.contacta.io, reconnecting...
2017-07-19 11:11:52 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-07-19 11:12:01 +0000 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3faae7565348"
2017-07-19 11:11:52 +0000 [warn]: suppressed same stacktrace
2017-07-19 11:11:54 +0000 [warn]: recovered connection to dead node: logger.contacta.io
2017-07-19 11:12:01 +0000 [warn]: retry succeeded. plugin_id="object:3faae7565348"
2017-07-19 11:12:02 +0000 [warn]: disconnected host="logger.contacta.io" port=24284
2017-07-19 11:12:02 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-07-19 11:12:03 +0000 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3faae7565348"
2017-07-19 11:12:02 +0000 [warn]: suppressed same stacktrace
2017-07-19 11:12:03 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-07-19 11:12:05 +0000 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3faae7565348"
2017-07-19 11:12:03 +0000 [warn]: suppressed same stacktrace
2017-07-19 11:12:04 +0000 [warn]: dead connection found: logger.contacta.io, reconnecting...
2017-07-19 11:12:05 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-07-19 11:12:09 +0000 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3faae7565348"
2017-07-19 11:12:05 +0000 [warn]: suppressed same stacktrace
2017-07-19 11:12:09 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-07-19 11:12:17 +0000 error_class="RuntimeError" error="no one nodes with valid ssl session" plugin_id="object:3faae7565348"
2017-07-19 11:12:09 +0000 [warn]: suppressed same stacktrace

  • Alex