levinas / assembly

Services to assemble genomes and metagenomes with the user's choice of assembly algorithm. Currently supports single microbial genome assembly using velvet and/or kiki. More to come...

solve socket error (pika/rabbitmq)

levinas opened this issue:

ERROR:pika.adapters.base_connection:Socket Error on fd 12: 104
WARNING:pika.adapters.base_connection:Socket closed when connection was open
Process [Worker 5]::
Traceback (most recent call last):
  File "/vol/kbase/runtime/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/vol/kbase/runtime/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/disks/arast/fangfang/assembly/lib/assembly/consume.py", line 376, in start
    self.fetch_job()
  File "/disks/arast/fangfang/assembly/lib/assembly/consume.py", line 350, in fetch_job
    channel.start_consuming()
  File "/vol/kbase/runtime/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 955, in start_consuming
    self.connection.process_data_events()
  File "/vol/kbase/runtime/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 240, in process_data_events
    if self._handle_read():
  File "/vol/kbase/runtime/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 348, in _handle_read
    super(BlockingConnection, self)._handle_read()
  File "/vol/kbase/runtime/lib/python2.7/site-packages/pika/adapters/base_connection.py", line 351, in _handle_read
    self._on_data_available(data)
  File "/vol/kbase/runtime/lib/python2.7/site-packages/pika/connection.py", line 1285, in _on_data_available
    self._process_frame(frame_value)
  File "/vol/kbase/runtime/lib/python2.7/site-packages/pika/connection.py", line 1365, in _process_frame
    self._deliver_frame_to_channel(frame_value)
  File "/vol/kbase/runtime/lib/python2.7/site-packages/pika/connection.py", line 976, in _deliver_frame_to_channel
    return self._channels[value.channel_number]._handle_content_frame(value)
  File "/vol/kbase/runtime/lib/python2.7/site-packages/pika/channel.py", line 792, in _handle_content_frame
    self._on_deliver(*response)
  File "/vol/kbase/runtime/lib/python2.7/site-packages/pika/channel.py", line 886, in _on_deliver
    body)
  File "/disks/arast/fangfang/assembly/lib/assembly/consume.py", line 372, in callback
    ch.basic_ack(delivery_tag=method.delivery_tag)
  File "/vol/kbase/runtime/lib/python2.7/site-packages/pika/channel.py", line 147, in basic_ack
    return self._send_method(spec.Basic.Ack(delivery_tag, multiple))
  File "/vol/kbase/runtime/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 1159, in _send_method
    self.connection.send_method(self.channel_number, method_frame, content)
  File "/vol/kbase/runtime/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 274, in send_method
    self._send_method(channel_number, method_frame, content)
  File "/vol/kbase/runtime/lib/python2.7/site-packages/pika/connection.py", line 1503, in _send_method
    self._send_frame(frame.Method(channel_number, method_frame))
  File "/vol/kbase/runtime/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 417, in _send_frame
    super(BlockingConnection, self)._send_frame(frame_value)
  File "/vol/kbase/runtime/lib/python2.7/site-packages/pika/connection.py", line 1490, in _send_frame
    self._flush_outbound()
  File "/vol/kbase/runtime/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 377, in _flush_outbound
    if self._handle_write():
  File "/vol/kbase/runtime/lib/python2.7/site-packages/pika/adapters/base_connection.py", line 365, in _handle_write
    return self._handle_error(error)
  File "/vol/kbase/runtime/lib/python2.7/site-packages/pika/adapters/base_connection.py", line 302, in _handle_error
    self._handle_disconnect()
  File "/vol/kbase/runtime/lib/python2.7/site-packages/pika/adapters/base_connection.py", line 248, in _handle_disconnect
    self._adapter_disconnect()
  File "/vol/kbase/runtime/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 318, in _adapter_disconnect
    self._check_state_on_disconnect()
  File "/vol/kbase/runtime/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 371, in _check_state_on_disconnect
    raise exceptions.ConnectionClosed()
ConnectionClosed

See discussion in pika/pika#266

gmr:

It appears that you are blocking in your consumer for longer than Pika has to respond to timeouts. The telling line is:

=ERROR REPORT==== 30-Jan-2013::04:32:21 ===
closing AMQP connection <0.515.0> (10.169.2.187:35624 -> 10.169.2.187:5672):
{heartbeat_timeout,running}

You can either turn heartbeats off or make the interval much longer. In pika, heartbeat_interval=0 turns them off and heartbeat_interval={seconds} sets the interval in seconds. My guess is your consumer is blocking in Python processing for a fair amount of time if this is happening. ... Use BlockingConnection.sleep() instead of time.sleep().
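One way to act on this advice without disabling heartbeats is to move the long-running work off the consumer thread and keep calling the connection's sleep() while it runs. The sketch below is only an illustration under stated assumptions: `run_while_servicing` and `poll_seconds` are hypothetical names that do not appear in consume.py, and the only thing assumed about the connection is the sleep() method mentioned above. Whether sleep() may be called from inside a consumer callback depends on the pika version, so this should be checked against the installed release.

```python
import threading


def run_while_servicing(connection, job, poll_seconds=1.0):
    """Run job() in a background thread while the caller keeps the
    connection serviced.

    `connection` only needs a sleep(seconds) method (assumed here to be
    pika's BlockingConnection.sleep). The worker thread never touches the
    connection itself, so all AMQP traffic stays on the calling thread.
    """
    outcome = {}

    def target():
        # Capture the result or the exception so the caller can see it.
        try:
            outcome["value"] = job()
        except Exception as exc:
            outcome["error"] = exc

    worker = threading.Thread(target=target)
    worker.daemon = True
    worker.start()

    # Instead of blocking in time.sleep() or in the job itself, give the
    # connection regular chances to process frames such as heartbeats.
    while worker.is_alive():
        connection.sleep(poll_seconds)
    worker.join()

    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("value")
```

In a consumer callback, the assembly job would be passed as `job` together with the BlockingConnection that owns the channel, and ch.basic_ack(...) would be called only after `run_while_servicing` returns, still on the consumer thread.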

The heartbeat_timeout error has not been seen on the production control server. The most recent occurrence on the edge server was in January 2014:

=ERROR REPORT==== 14-Jan-2014::23:44:03 ===
closing AMQP connection <0.31416.6> (10.0.28.9:56995 -> 10.0.28.15:5672):
{heartbeat_timeout,running}

RabbitMQ log on elm:

=INFO REPORT==== 30-Jul-2014::15:42:13 ===
accepting AMQP connection <0.2007.0> (127.0.0.1:45650 -> 127.0.0.1:5672)

=WARNING REPORT==== 30-Jul-2014::15:42:14 ===
closing AMQP connection <0.2007.0> (127.0.0.1:45650 -> 127.0.0.1:5672):
connection_closed_abruptly

=ERROR REPORT==== 30-Jul-2014::16:06:40 ===
closing AMQP connection <0.1872.0> (127.0.0.1:45582 -> 127.0.0.1:5672):
{heartbeat_timeout,running}

=ERROR REPORT==== 30-Jul-2014::16:06:40 ===
closing AMQP connection <0.1883.0> (127.0.0.1:45584 -> 127.0.0.1:5672):
{heartbeat_timeout,running}

=ERROR REPORT==== 30-Jul-2014::16:06:40 ===
closing AMQP connection <0.1894.0> (127.0.0.1:45586 -> 127.0.0.1:5672):
{heartbeat_timeout,running}

=ERROR REPORT==== 30-Jul-2014::16:06:40 ===
closing AMQP connection <0.1905.0> (127.0.0.1:45588 -> 127.0.0.1:5672):
{heartbeat_timeout,running}

=WARNING REPORT==== 30-Jul-2014::19:35:35 ===
closing AMQP connection <0.1861.0> (127.0.0.1:45580 -> 127.0.0.1:5672):
connection_closed_abruptly
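To compare hosts, it can help to tally the close reasons in a log like the one above. The following is a rough sketch that assumes every entry follows the three-line layout shown in this excerpt (report header, "closing AMQP connection" line, reason line). The names `closed_connections` and `summarize` are made up for illustration, and `SAMPLE` is copied from the elm log above.

```python
import re
from collections import Counter

# Patterns match the layout of the excerpt above; other RabbitMQ versions
# may format their logs differently.
HEADER = re.compile(r"^=(\w+) REPORT==== (\S+) ===$")
CLOSING = re.compile(r"^closing AMQP connection <[^>]+> \((\S+) -> (\S+)\):$")


def closed_connections(log_text):
    """Yield (timestamp, client_address, reason) for each closed connection."""
    timestamp = None
    client = None
    for raw in log_text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            # A new report starts; forget any half-read entry.
            timestamp, client = header.group(2), None
            continue
        closing = CLOSING.match(line)
        if closing:
            client = closing.group(1)
            continue
        if client is not None and line:
            # The first non-empty line after the "closing" line is the reason.
            yield timestamp, client, line
            client = None


def summarize(log_text):
    """Count how often each close reason occurs."""
    return Counter(reason for _, _, reason in closed_connections(log_text))


SAMPLE = """
=INFO REPORT==== 30-Jul-2014::15:42:13 ===
accepting AMQP connection <0.2007.0> (127.0.0.1:45650 -> 127.0.0.1:5672)

=WARNING REPORT==== 30-Jul-2014::15:42:14 ===
closing AMQP connection <0.2007.0> (127.0.0.1:45650 -> 127.0.0.1:5672):
connection_closed_abruptly

=ERROR REPORT==== 30-Jul-2014::16:06:40 ===
closing AMQP connection <0.1872.0> (127.0.0.1:45582 -> 127.0.0.1:5672):
{heartbeat_timeout,running}

=ERROR REPORT==== 30-Jul-2014::16:06:40 ===
closing AMQP connection <0.1883.0> (127.0.0.1:45584 -> 127.0.0.1:5672):
{heartbeat_timeout,running}
"""

if __name__ == "__main__":
    for reason, count in summarize(SAMPLE).items():
        print("%s: %d" % (reason, count))
```

Grouping by timestamp instead of reason would show whether several workers time out in the same second, as the four 16:06:40 entries above do.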

On the magellan control servers, the heartbeat check is disabled in the RabbitMQ config:

$ cat /etc/rabbitmq/rabbitmq.config

[
  {rabbit, [
    %% Disable heartbeat check
    {heartbeat, 0}
  ]}
].

Bob applied this setting on elm. Testing now.

$ date
Thu Jul 31 13:26:14 CDT 2014

$ ar-stat
|  331   |   209   |       Stage 4/9: spades       | 0:13:04  |    b93.slow   |
|  332   |   209   |       Stage 3/7: spades       | 0:13:04  |    b93.auto   |
|  333   |   209   |         Stage 2/3: a6         | 0:13:04  | b93.rast_fast |
|  334   |   209   |       Stage 2/4: spades       | 0:13:04  |    b93.rast   |