kquick / Thespian

Python Actor concurrency library

Message going AWOL

andatt opened this issue

Hi Kevin

Sorry I haven't got back to you on the previous logging ticket yet; I just haven't had time. Another issue has come up in the meantime. I am sending batches of 100 messages between actors. For some batches, the first message of the batch is sent by the first actor (as confirmed by logs) but is never received by the second actor (again confirmed by logs). All the rest of the messages in the batch are sent and received successfully.

I am pretty stumped right now: I have tried setting the THESPLOG_THRESHOLD env var to "Debug" and "Info", but there is nothing in the /tmp/ directory (i.e. thespian.log does not exist). I have written some simplified code to try to reproduce what happens in my main system, but so far the simplified code works fine.

Again, this is running inside a Docker container. My simplified code is below but, as I said, it runs fine without dropping any messages. Is there anything you can suggest to help me get more information, for example from the Thespian internal logging?
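For readers following along, here is a minimal sketch of one way to set the internal-logging variables from Python. It assumes the variables have to be in the environment of the process that starts the actor system (inside the container, in the Docker case) before thespian is imported. THESPLOG_FILE is the companion variable for the log path as I understand the Thespian documentation, and the helper name is hypothetical.

```python
import os


def configure_thespian_internal_log(path="/tmp/thespian.log", threshold="Debug"):
    """Hypothetical helper: export the settings read by Thespian's internal logger."""
    os.environ["THESPLOG_FILE"] = path            # assumed companion variable for the file path
    os.environ["THESPLOG_THRESHOLD"] = threshold  # the variable named in this thread
    return {name: os.environ[name]
            for name in ("THESPLOG_FILE", "THESPLOG_THRESHOLD")}


settings = configure_thespian_internal_log()
# Assumption: import thespian and create the ActorSystem only after this point,
# so that every actor process inherits the same environment.
```

If an actor system is already running from an earlier session, it will most likely need to be shut down and restarted before new values take effect.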

Thanks

Andrew

from thespian.troupe import troupe
from thespian.actors import ActorTypeDispatcher
from thespian.actors import ActorSystem
from thespian.actors import WakeupMessage
import logging


class ActorLogFilter(logging.Filter):
    def filter(self, logrecord):
        return 'actorAddress' in logrecord.__dict__


class NotActorLogFilter(logging.Filter):
    def filter(self, logrecord):
        return 'actorAddress' not in logrecord.__dict__


def log_config(log_file_path_1, log_file_path_2):
    return {
        'version': 1,
        'formatters': {
            'normal': {'format': '%(levelname)-8s %(message)s'},
            'actor': {'format': '%(levelname)-8s %(actorAddress)s => %(message)s'}},
        'filters': {'isActorLog': {'()': ActorLogFilter},
                    'notActorLog': {'()': NotActorLogFilter}},
        'handlers': {'h1': {'class': 'logging.FileHandler',
                            'filename': log_file_path_1,
                            'formatter': 'normal',
                            'filters': ['notActorLog'],
                            'level': logging.INFO},
                     'h2': {'class': 'logging.FileHandler',
                            'filename': log_file_path_2,
                            'formatter': 'actor',
                            'filters': ['isActorLog'],
                            'level': logging.INFO}, },
        'loggers': {'': {'handlers': ['h1', 'h2'], 'level': logging.DEBUG}}
    }


class PrimaryActor(ActorTypeDispatcher):

    def receiveMsg_WakeupMessage(self, msg, sender):

        if not hasattr(self, "batch_number"):
            self.submission_actor_pool_size = 2
            self.batch_number = 0
            self.last_secondary_actor_used = -1
            self.secondary_actors = self.create_secondary_actor_pool(
                SecondaryActor,
                self.submission_actor_pool_size
            )
        next_submission_actor_to_use = self.last_secondary_actor_used + 1
        for x in range(0, 100):
            message = {"number": x, "batch": self.batch_number}
            logging.info("Sending message number {0} {1}".format(self.batch_number, x))

            self.send(self.secondary_actors[next_submission_actor_to_use], message)

        self.batch_number += 1

        if self.batch_number >= 5:
            return

        if next_submission_actor_to_use > self.submission_actor_pool_size - 1:
            next_submission_actor_to_use = -1

        self.last_submission_actor_used = next_submission_actor_to_use
        self.wakeupAfter(1)

    def create_secondary_actor_pool(self, actor_code, pool_size):
        submission_actor_pool = []
        for x in range(0, pool_size):
            submission_actor_pool.append(self.createActor(actor_code))
        return submission_actor_pool


@troupe(max_count=4000, idle_count=1)
class SecondaryActor(ActorTypeDispatcher):

    def receiveMsg_dict(self, msg, sender):

        logging.info("Received message number {0} {1}".format(msg["batch"], msg["number"]))

thespian_system = ActorSystem(
    "multiprocTCPBase",
    {},
    logDefs=log_config("bug_check_1.log", "bug_check_2.log")
)

primary_actor = thespian_system.createActor(PrimaryActor)

thespian_system.tell(primary_actor, WakeupMessage(delayPeriod=1))

Hi @andatt,

I've run your tests locally (directly, and not in a Docker container) and I'm unable to reproduce your issue.

I did need to move the "next_submission_actor_to_use" limit/reset up to between the increment and the "for x" loop, but that was the only change.
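The reordering can be sketched without Thespian. This is one assumed reading of the change: the index is advanced, then wrapped before it is used to pick an actor, and only then stored. The class and attribute names are hypothetical stand-ins, not the code that was actually run.

```python
class RoundRobin:
    """Hypothetical stand-in for the pool-selection logic in PrimaryActor."""

    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.last_used = -1

    def next_index(self):
        candidate = self.last_used + 1
        # Wrap *before* the index is used, so it can never run past the pool.
        if candidate > self.pool_size - 1:
            candidate = 0
        self.last_used = candidate
        return candidate


picker = RoundRobin(pool_size=2)
order = [picker.next_index() for _ in range(5)]  # one pick per batch
```

With a pool of two, successive batches alternate between index 0 and index 1.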

For my testing, I ensure that the log files are deleted, and that there is no actor system running. I then run your code above (with the modification described) and check the results:

$ for dir in Sending Received ; do for batch in 0 1 2 3 4 ; do \
     echo $dir $batch:::; grep "$dir message number $batch" *.log | wc; \
     done; done

All outputs indicate 100 messages were logged in each direction for each batch.

Can you confirm that you are still seeing the lost messages using the process I describe here?

Hi Kevin

I think I have got to the bottom of the problem. The message is sent and received, but it's not entered in the log (making it appear that the message never arrives) because the receiveMsg_WakeupMessage method never returns.

This can be caused in two different ways:

  1. A POST call inside this method (in the real code, not the example above). This POST successfully connects to an API and thus does not trigger a timeout. However, the API subsequently gets overloaded and hangs, leaving numerous actors waiting for a response. Thus the receiveMsg method does not return and no log entries are made. This makes it appear as if the secondary actor never received the message when in fact it did; it is just that the logging has not recorded this.

  2. The other cause is where there are so many messages sent to the actor system in a short period of time that the Thespian queue limit of 950 is reached. At this point the associated actor hangs and again no logging entries are made. The logging ceases at a point far before the actual failure occurs due to reasons similar to 1) I believe.

So I have resolved the issue with the API such that it is now able to handle load commensurate with that output by the actor system.

That leaves me with three questions:

  1. What can be done to prevent the queue building up to the critical 950 level? In the example code above I am trying to spread the load across different actors, but this doesn't seem to make as much difference as I thought it would. Is the queue a single 'admin' queue not related to a single actor?

  2. When the queue does build to over 950, why is the system not able to recover from that? It hangs without recovery even if nothing more is added to the queue. Is there anything that can be done to mitigate this?

  3. Is there any way (e.g. some parameter) to cause log lines to be written as they are invoked? The current way of flushing and writing them at some stage well after the log call takes place is very confusing when something goes wrong. It gives the appearance of something in Thespian being the cause when in actual fact this is not the case, as is demonstrated by the API issue above.

Thanks

Andrew

Hi @andatt,

That's good news: I'm glad you found the problem!

When an actor sends a message, the self.send() places it onto an outbound queue that is maintained as part of the Actor class internals. Once the receiveMessage() returns, the Actor will process sends on that queue, as well as wait for other receives. The Actor is not multithreaded, so no actual message sending or receiving is done while it is in the receiveMessage() method, thus your blocking POST call was preventing any other activity.

The logging is done by sending a log message to the logging Actor, which is placed on that same outbound queue, and therefore the POST was blocking the send of the logging message as well. Logging is done this way for two primary reasons: (1) the python logging library does not provide synchronization to the log outputs, so if each actor was writing directly to the log file they would overwrite each other's messages and some would be lost that way, and (2) because any logging from Actors on other systems will then be collected by the central system and logged together.
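To make the ordering concrete, here is a toy model of the behaviour described above. It is not Thespian's implementation; the class, its attributes, and the "logger" target are all hypothetical. It only shows that anything queued by send() or a log call during a handler leaves the actor after the handler returns.

```python
from collections import deque


class ToyActor:
    """Toy model only; not Thespian's implementation."""

    def __init__(self):
        self.outbound = deque()   # messages queued by send()
        self.delivered = []       # messages that have actually left the actor

    def send(self, target, message):
        # Queue only; nothing is transmitted while the handler is running.
        self.outbound.append((target, message))

    def log(self, text):
        # Logging is modelled as just another send, to a "logger" target.
        self.send("logger", text)

    def handle(self, handler, message):
        handler(self, message)
        # Only once the handler has returned is the outbound queue drained.
        while self.outbound:
            self.delivered.append(self.outbound.popleft())


def handler(actor, message):
    actor.log("Received {0}".format(message))
    # Snapshot taken while still inside the handler: nothing has gone out yet.
    actor.observed_during_handler = list(actor.delivered)


actor = ToyActor()
actor.handle(handler, {"batch": 0, "number": 0})
```

If the handler never returned (for example, because of a hung POST), the drain loop would never run and the log record would stay queued, which matches the symptom reported earlier in the thread.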

There are also some reasons why the Actor is single-threaded: it generally makes things simpler operationally, and it also allows for back-propagation of pressure to prevent overloads. This works in conjunction with the outbound queueing threshold of 950 to prevent a runaway actor from overloading the entire system. Slow-running Actors (e.g. those doing blocking POST calls) will process their inbound sockets much more slowly, so other Actors trying to send them messages will build up those messages on their internal outbound queue. This pressure will propagate backward throughout the system, and the best way to alleviate this is to use more actors in parallel (e.g. with the troupe).

Once the 950 outbound message threshold is reached, that Actor will block on a send() call, although it will still process the outbound queue as well as receive messages. This effectively prevents the Actor from generating any new send() messages until some of the existing messages have been sent, although it also prevents the Actor from responding to any of the received messages until the outbound queue drops below the threshold, whereupon the send() call completes and the receiveMessage continues. The overall system should still be able to continue once this point has been reached but it will be much more restricted by the thresholds until messages have been processed, and it is possible that a "deadlock" could occur (two actors sending messages to each other and over the threshold).
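The threshold behaviour can be pictured as a high-water/low-water mark on the outbound queue. The sketch below is a simplified model, not Thespian's code: the class and method names are hypothetical, and the low-water value of 780 is borrowed from the "drain-to 780" warning quoted later in this thread.

```python
from collections import deque


class ThrottledQueue:
    """Simplified model of an outbound queue with a send threshold."""

    def __init__(self, high_water=950, low_water=780):
        self.high_water = high_water
        self.low_water = low_water
        self.pending = deque()
        self.tx_only = False      # True while new sends are being held back

    def enqueue(self, message):
        self.pending.append(message)
        if len(self.pending) >= self.high_water:
            self.tx_only = True   # threshold reached: drain before accepting more

    def transmit_one(self):
        if self.pending:
            self.pending.popleft()
        if self.tx_only and len(self.pending) <= self.low_water:
            self.tx_only = False  # drained far enough; sends may resume


queue = ThrottledQueue()
for number in range(950):
    queue.enqueue(number)
blocked_at_threshold = queue.tx_only

for _ in range(169):              # 950 - 169 = 781 still pending
    queue.transmit_one()
still_blocked = queue.tx_only

queue.transmit_one()              # 780 pending: back at the low-water mark
released = not queue.tx_only
```

The model also shows why a stalled receiver matters: if transmit_one() can never make progress, the queue never falls to the low-water mark and the sender stays held.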

It sounds like you were able to adjust the POST target responsiveness to help, and I would recommend increasing the actor count as another way to help with this problem.

Let me know if this helps or if you would like more information.

Regards,
Kevin

Thanks very much for the detailed response Kevin. It makes sense but I think I am seeing different behaviour with regard to the draining of the queue. Using the example code I posted above I made the following changes:

  1. increase range of for loop to 500
  2. put time.sleep(5) into the secondary actor to simulate a blocking action
  3. remove self.wakeupAfter(1) in the primary actor.

I ran this code and then waited 40 minutes, monitoring the log files and processes. Both the actor logs and thespian.log are incomplete. The consistent actor log entries end at 345, after which there are only 3 more entries (366-368). thespian.log seems to be cleared during the processing, but you can see its contents before and after clearing here:

2019-02-07 09:24:28.316307 p21450 Warn Transmit attempt from ActorAddr-(T|:39025) to ActorAddr-(T|:37945) timed out, returning PoisonPacket
2019-02-07 09:24:28.317521 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 369}-quit_0:00:00) timed out
2019-02-07 09:24:28.318295 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 370}-quit_0:00:00) timed out
2019-02-07 09:24:28.319030 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 371}-quit_0:00:00) timed out
2019-02-07 09:24:28.319659 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 372}-quit_0:00:00) timed out
2019-02-07 09:24:28.319843 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 373}-quit_0:00:00) timed out
2019-02-07 09:24:28.320024 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 374}-quit_0:00:00) timed out
2019-02-07 09:24:28.320210 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 375}-quit_0:00:00) timed out
2019-02-07 09:24:28.320389 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 376}-quit_0:00:00) timed out
2019-02-07 09:24:28.320573 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 377}-quit_0:00:00) timed out
2019-02-07 09:24:28.320779 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 378}-quit_0:00:00) timed out
2019-02-07 09:24:28.320955 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 379}-quit_0:00:00) timed out
2019-02-07 09:24:28.321148 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 380}-quit_0:00:00) timed out
2019-02-07 09:24:28.321322 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 381}-quit_0:00:00) timed out
2019-02-07 09:24:28.321500 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 382}-quit_0:00:00) timed out
2019-02-07 09:24:28.321674 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 383}-quit_0:00:00) timed out
2019-02-07 09:24:28.321850 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 384}-quit_0:00:00) timed out
2019-02-07 09:24:28.322022 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 385}-quit_0:00:00) timed out
2019-02-07 09:24:28.322197 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 386}-quit_0:00:00) timed out
2019-02-07 09:24:28.322367 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 387}-quit_0:00:00) timed out
2019-02-07 09:24:28.322536 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 388}-quit_0:00:00) timed out
2019-02-07 09:24:28.322722 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 389}-quit_0:00:00) timed out
2019-02-07 09:24:28.322896 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 390}-quit_0:00:00) timed out
2019-02-07 09:24:28.323072 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 391}-quit_0:00:00) timed out
2019-02-07 09:24:28.323328 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 392}-quit_0:00:00) timed out
2019-02-07 09:24:28.323578 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 393}-quit_0:00:00) timed out
2019-02-07 09:24:28.323766 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 394}-quit_0:00:00) timed out
2019-02-07 09:24:28.323934 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 395}-quit_0:00:00) timed out
2019-02-07 09:24:28.324101 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 396}-quit_0:00:00) timed out
2019-02-07 09:24:28.324263 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 397}-quit_0:00:00) timed out
2019-02-07 09:24:28.324428 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 398}-quit_0:00:00) timed out
2019-02-07 09:24:28.324590 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 399}-quit_0:00:00) timed out
2019-02-07 09:24:28.324763 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 400}-quit_0:00:00) timed out
2019-02-07 09:24:28.324919 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 401}-quit_0:00:00) timed out
2019-02-07 09:24:28.325094 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 402}-quit_0:00:00) timed out
2019-02-07 09:24:28.325259 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 403}-quit_0:00:00) timed out
2019-02-07 09:24:28.325417 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 404}-quit_0:00:00) timed out
2019-02-07 09:24:28.325576 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 405}-quit_0:00:00) timed out
2019-02-07 09:24:28.325733 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 406}-quit_0:00:00) timed out
2019-02-07 09:24:28.325913 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 407}-quit_0:00:00) timed out
2019-02-07 09:24:28.326073 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 408}-quit_0:00:00) timed out
2019-02-07 09:24:28.326232 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 409}-quit_0:00:00) timed out
2019-02-07 09:24:28.326393 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 410}-quit_0:00:00) timed out
2019-02-07 09:24:28.326558 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 411}-quit_0:00:00) timed out
2019-02-07 09:24:28.326727 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 412}-quit_0:00:00) timed out
2019-02-07 09:24:28.326879 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 413}-quit_0:00:00) timed out
2019-02-07 09:24:28.327037 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 414}-quit_0:00:00) timed out
2019-02-07 09:24:28.327188 p21450 Warn TX intent ************* TransportIntent(ActorAddr-(T|:37945)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'batch': 0, 'number': 415}-quit_0:00:00) timed out
2019-02-07 09:24:34.723802 p21821 Warn Transmit attempt from ActorAddr-(T|:34185) to ActorAddr-(T|:37945) timed out, returning PoisonPacket
2019-02-07 09:24:34.744827 p21820 Warn Transmit attempt from ActorAddr-(T|:42553) to ActorAddr-(T|:37945) timed out, returning PoisonPacket
2019-02-07 09:24:34.876279 p21822 Warn Transmit attempt from ActorAddr-(T|:38799) to ActorAddr-(T|:37945) timed out, returning PoisonPacket
2019-02-07 09:24:34.979651 p21823 Warn Transmit attempt from ActorAddr-(T|:37631) to ActorAddr-(T|:37945) timed out, returning PoisonPacket
2019-02-07 09:24:35.044537 p21824 Warn Transmit attempt from ActorAddr-(T|:45737) to ActorAddr-(T|:37945) timed out, returning PoisonPacket
2019-02-07 09:24:35.097416 p21826 Warn Transmit attempt from ActorAddr-(T|:37281) to ActorAddr-(T|:37945) timed out, returning PoisonPacket
2019-02-07 09:24:35.114472 p21825 Warn Transmit attempt from ActorAddr-(T|:34583) to ActorAddr-(T|:37945) timed out, returning PoisonPacket
2019-02-07 09:24:35.207336 p21827 Warn Transmit attempt from ActorAddr-(T|:37933) to ActorAddr-(T|:37945) timed out, returning PoisonPacket
2019-02-07 09:24:35.219208 p21831 Warn Transmit attempt from ActorAddr-(T|:45499) to ActorAddr-(T|:37945) timed out, returning PoisonPacket
2019-02-07 09:24:35.250942 p21828 Warn Transmit attempt from ActorAddr-(T|:45691) to ActorAddr-(T|:37945) timed out, returning PoisonPacket
2019-02-07 09:24:35.342657 p21832 Warn Transmit attempt from ActorAddr-(T|:42201) to ActorAddr-(T|:37945) timed out, returning PoisonPacket
2019-02-07 09:24:35.374387 p21833 Warn Transmit attempt from ActorAddr-(T|:44375) to ActorAddr-(T|:37945) timed out, returning PoisonPacket
2019-02-07 09:24:35.399174 p21834 Warn Transmit attempt from ActorAddr-(T|:37951) to ActorAddr-(T|:37945) timed out, returning PoisonPacket
2019-02-07 09:24:35.431283 p21835 Warn Transmit attempt from ActorAddr-(T|:34923) to ActorAddr-(T|:37945) timed out, returning PoisonPacket
2019-02-07 09:24:35.462251 p21837 Warn Transmit attempt from ActorAddr-(T|:34035) to ActorAddr-(T|:37945) timed out, returning PoisonPacket
2019-02-07 09:24:35.465874 p21836 Warn Transmit attempt from ActorAddr-(T|:34281) to ActorAddr-(T|:37945) timed out, returning PoisonPacket
2019-02-07 09:24:35.501199 p21838 Warn Transmit attempt from ActorAddr-(T|:45757) to ActorAddr-(T|:37945) timed out, returning PoisonPacket
a@a:~/python/srframework/senseon-reasoning-framework$ cat /tmp/thespian.log 
2019-02-07 09:28:43.658240 p21451 Warn TX intent ************* TransportIntent(ActorAddr-(T|:38131)-pending-ExpiresIn_0:00:00-<class 'thespian.troupe._TroupeWork'>-_TroupeWork(from=ActorAddr-(T|:39025), msg={'batch': 0, 'number': 365})-quit_0:00:00) timed out

Before it was cleared, the log does not seem to contain all the failure messages I would expect, given the missing entries in the actor log.

Running ps aux during the first 10 minutes reveals a peak of 204 actors. This diminishes over the first 10 minutes to 100, and after 40 minutes it still remains at 100. I would expect this to return to 1 given the idle_count parameter.

The above gives me the impression the system or actors are hanging and not exiting gracefully but maybe that's not right?

Hi @andatt,

I believe the issues you are encountering are due to running into system limits, which are causing abnormal behavior that Thespian cannot control or readily react to. In particular, you have your @troupe max_count set to 4000, which would allow up to 4000 processes to be created. Unless your kernel configuration and process limits allow resource levels this high, the kernel will start aborting connection attempts and process creation, which is not something that is easily adapted to. You may want to check ulimit -u for the number of user processes you are allowed and ulimit -n for the number of file descriptors (the largest consumer will probably be the troupe leader, which will need 1-2 file descriptors for each troupe member), and also ulimit -m to ensure it is allowing enough space for these to run. These values might be more restricted in a Docker environment as well.
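The same limits can be read from inside the Python process (and therefore from inside the container) with the standard resource module. This is a minimal sketch for Linux; the function names are hypothetical, and the descriptor estimate simply takes the upper end of the 1-2 per troupe member figure mentioned above.

```python
import resource


def limits_for_troupe():
    """Return (soft, hard) limits relevant to spawning many actor processes."""
    names = {
        "max user processes (ulimit -u)": resource.RLIMIT_NPROC,
        "open file descriptors (ulimit -n)": resource.RLIMIT_NOFILE,
        "resident set size (ulimit -m)": resource.RLIMIT_RSS,
    }
    return {label: resource.getrlimit(which) for label, which in names.items()}


def estimated_leader_fds(max_count, per_member=2):
    """Rough upper estimate of descriptors the troupe leader may need."""
    return max_count * per_member


for label, (soft, hard) in limits_for_troupe().items():
    # A value of -1 (resource.RLIM_INFINITY) means "unlimited".
    print(label, "soft:", soft, "hard:", hard)

print("estimated leader descriptors for max_count=4000:",
      estimated_leader_fds(4000))
```

Comparing the estimate against the soft limit for open file descriptors gives a quick sanity check on a chosen max_count before running the system.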

On my system, not using Docker, I am able to run the altered tests you describe with troupe max_count values of up to 300; beyond that I start running into configured limits and I start seeing the aberrant behavior you reported above. As long as I stay below 300 the program above can handle the 500 messages just fine (taking roughly 5 * ((500 / max_count) + 1) seconds to complete).
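As a worked example of the rough estimate above, the formula can be written as a small function. The function and parameter names are hypothetical; the 5-second figure is the time.sleep(5) from the altered test.

```python
def estimated_seconds(messages, max_count, seconds_per_message=5):
    """Rough completion time: seconds_per_message * ((messages / max_count) + 1)."""
    return seconds_per_message * ((messages / max_count) + 1)


at_300 = estimated_seconds(500, 300)   # about 13.3 seconds
at_100 = estimated_seconds(500, 100)   # 30.0 seconds
```

By this estimate, a larger troupe shortens the run only as long as max_count stays within what the process and descriptor limits allow.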

-Kevin

Hi @andatt,

I did do some additional work with your example and identified some corner cases that weren't being handled as well as they could be. I've made some changes that seem to improve the original configuration you described (max_count of 4000); I need to run it through the rest of the test suite and if it still looks good I'll have an update for you to test out sometime later today or tomorrow.

-Kevin

Thanks Kevin that's brilliant!

ulimit -u outputs 63109
ulimit -n outputs 1024

So I just have one more question:

I changed the for loop so it now sends 1000 messages. I also reduced the troupe decorator's max_count to 500, as I believe that should be within the limits you describe given the ulimit output above.

When I run it, the system seems to lock up in a similar fashion to the earlier experiment. The actor log indicates a hang at received message 368. There appear to be only 6 actor processes spawned. I checked thespian.log and saw the 950 queue threshold warning. I would expect to see this, but then I would expect to see the queue drain as the actors process and finish their work.

So then I waited 20 minutes and everything was still locked up the same way: the queue had not drained and no further work was done. There were a couple more messages in thespian.log:

2019-02-09 13:59:21.931195 p5766 Warn Entering tx-only mode to drain excessive queue (950 > 950, drain-to 780)
2019-02-09 14:04:21.183626 p5767 Warn Transmit attempt from ActorAddr-(T|:44477) to ActorAddr-(T|:39413) timed out, returning PoisonPacket
2019-02-09 14:04:21.224696 p5768 Warn Transmit attempt from ActorAddr-(T|:40833) to ActorAddr-(T|:39413) timed out, returning PoisonPacket

but the actor log showed no movement from 368. What could be causing this? It seems like a burst of messages can cause a hang. Is there anything we can do to mitigate this? The troupe decorator doesn't seem to prevent it.

Thanks

Andrew

Hi @andatt,

I just released version 3.9.6 (with corresponding GitHub source updates) that will hopefully help with the above. Can you please try this new version and let me know what issues remain?

Thanks,
Kevin

I made a few more updates, now version 3.9.7 is available.

Hi Kevin

Thanks for doing that! I just tested with 1000 messages again and it works! However, when I increased the number to 5000, I got a stack overflow:

checking api log for post requests... 2977
checking api log for post requests... 3002
Fatal Python error: Cannot recover from stack overflow.

Current thread 0x00007fad4f267700 (most recent call first):
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/actors.py", line 107 in __eq__
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/addressManager.py", line 161 in compareAddressEq
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/actors.py", line 103 in __eq__
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/utilis.py", line 270 in find
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 75 in get_next
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 312 in _checkNextTransmit
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 150 in resultCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 254 in completionCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 262 in tx_done
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 239 in _complete_expired_intents
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 125 in _runQueued
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 312 in _schedulePreparedIntent
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 214 in scheduleTransmit
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 276 in _send_intent_to_transport
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 314 in _checkNextTransmit
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 150 in resultCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 254 in completionCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 262 in tx_done
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 239 in _complete_expired_intents
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 125 in _runQueued
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 312 in _schedulePreparedIntent
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 214 in scheduleTransmit
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 276 in _send_intent_to_transport
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 314 in _checkNextTransmit
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 150 in resultCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 254 in completionCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 262 in tx_done
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 239 in _complete_expired_intents
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 125 in _runQueued
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 312 in _schedulePreparedIntent
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 214 in scheduleTransmit
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 276 in _send_intent_to_transport
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 314 in _checkNextTransmit
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 150 in resultCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 254 in completionCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 262 in tx_done
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 239 in _complete_expired_intents
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 125 in _runQueued
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 312 in _schedulePreparedIntent
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 214 in scheduleTransmit
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 276 in _send_intent_to_transport
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 314 in _checkNextTransmit
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 150 in resultCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 254 in completionCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 262 in tx_done
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 239 in _complete_expired_intents
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 125 in _runQueued
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 312 in _schedulePreparedIntent
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 214 in scheduleTransmit
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 276 in _send_intent_to_transport
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 314 in _checkNextTransmit
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 150 in resultCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 254 in completionCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 262 in tx_done
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 239 in _complete_expired_intents
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 125 in _runQueued
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 312 in _schedulePreparedIntent
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 214 in scheduleTransmit
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 276 in _send_intent_to_transport
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 314 in _checkNextTransmit
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 150 in resultCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 254 in completionCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 262 in tx_done
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 239 in _complete_expired_intents
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 125 in _runQueued
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 312 in _schedulePreparedIntent
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 214 in scheduleTransmit
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 276 in _send_intent_to_transport
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 314 in _checkNextTransmit
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 150 in resultCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 254 in completionCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 262 in tx_done
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 239 in _complete_expired_intents
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 125 in _runQueued
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 312 in _schedulePreparedIntent
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 214 in scheduleTransmit
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 276 in _send_intent_to_transport
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 314 in _checkNextTransmit
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 150 in resultCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 254 in completionCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 262 in tx_done
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 239 in _complete_expired_intents
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 125 in _runQueued
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 312 in _schedulePreparedIntent
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 214 in scheduleTransmit
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 276 in _send_intent_to_transport
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 314 in _checkNextTransmit
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 150 in resultCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 254 in completionCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 262 in tx_done
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 239 in _complete_expired_intents
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 125 in _runQueued
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 312 in _schedulePreparedIntent
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 214 in scheduleTransmit
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 276 in _send_intent_to_transport
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/systemCommon.py", line 314 in _checkNextTransmit
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 150 in resultCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 254 in completionCallback
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/__init__.py", line 262 in tx_done
  File "/home/andrew/python/srframework/srf_framework_venv/lib/python3.5/site-packages/thespian/system/transport/asyncTransportBase.py", line 239 in _complete_expired_intents

I ran this twice and got the same error both times at around 3000 messages. Any ideas?
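
The dump above seems to repeat the same group of frames over and over, which would suggest the send-completion callbacks are re-entering the transport until the stack runs out. As a rough reading aid (not part of Thespian), the sketch below parses frames in this format and reports the shortest repeating cycle. The helper names `parse_frames` and `find_cycle` are hypothetical, and the sample paths are abbreviated versions of the ones in the dump.

```python
import re

# Matches frames in the format shown in the dump above, e.g.
#   File "/path/to/module.py", line 239 in _complete_expired_intents
FRAME_RE = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+) in (?P<func>\S+)')


def parse_frames(text):
    """Return a list of (file name, line number, function) tuples."""
    return [(m.group("path").rsplit("/", 1)[-1], int(m.group("line")), m.group("func"))
            for m in FRAME_RE.finditer(text)]


def find_cycle(frames):
    """Return the shortest prefix that repeats through the whole list, or None."""
    n = len(frames)
    for period in range(1, n // 2 + 1):
        if all(frames[i] == frames[i + period] for i in range(n - period)):
            return frames[:period]
    return None


# Sample input: the frame group from the dump, with abbreviated (assumed) paths,
# repeated three times.
CYCLE = [
    ("transport/asyncTransportBase.py", 239, "_complete_expired_intents"),
    ("transport/asyncTransportBase.py", 125, "_runQueued"),
    ("transport/asyncTransportBase.py", 312, "_schedulePreparedIntent"),
    ("transport/asyncTransportBase.py", 214, "scheduleTransmit"),
    ("systemCommon.py", 276, "_send_intent_to_transport"),
    ("systemCommon.py", 314, "_checkNextTransmit"),
    ("transport/__init__.py", 150, "resultCallback"),
    ("transport/__init__.py", 254, "completionCallback"),
    ("transport/__init__.py", 262, "tx_done"),
]
SAMPLE = "\n".join('  File "thespian/system/%s", line %d in %s' % f for f in CYCLE * 3)

cycle = find_cycle(parse_frames(SAMPLE))
for name, line, func in cycle:
    print("%s:%d %s" % (name, line, func))
```

Running it on the full dump instead of `SAMPLE` should show whether the whole trace is a single loop or whether some other entry point appears before the repetition starts.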

Hi @andatt,

Thank you for confirming the previous fix and for this additional report. I can reproduce this and it looks like there are actually a couple of things involved. I won't have a fix ready for you today, but hopefully by tomorrow.

Regards,
Kevin

@andatt, there is a new commit on master (36a60e1) that should help with the above. I am still looking into other issues this test raised, and I have not yet passed the new commit through the full test regimen, but this commit should help move things in a positive direction for you and I'll have more updates in the next day or so.

-Kevin

Hi Kevin

Thanks for all your efforts on this! I just tested again with the new commit. This time there is no stack overflow, which is great. It now gets to 3411 messages received. At this point the number of actors drops to zero and nothing further is processed. The thespian.log is:

2019-02-13 09:25:26.820843 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.822286 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.823892 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.825466 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.826949 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.828447 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.829946 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.831510 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.833084 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.932505 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.933868 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.935456 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.937016 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.938643 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.940263 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.941787 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.943306 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.944815 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.946361 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.947949 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.949484 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.950978 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.952524 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.954070 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.955635 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.957124 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.958652 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.960188 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:26.961781 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.057200 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.058658 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.060290 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.061861 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.063460 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.065035 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.066595 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.068145 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.069707 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.071269 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.072778 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.074344 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.075910 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.077405 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.078969 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.080527 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.082082 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.083656 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.085181 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.086768 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.187084 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.188524 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.190047 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.191630 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.193129 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.194758 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.196358 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.197864 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.199486 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.201056 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.202648 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.204280 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.205833 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.207365 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.208944 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.210513 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.212058 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.213616 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.215117 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.216696 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.312980 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.314427 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.316005 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.317556 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.319057 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.320655 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.322255 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.323844 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.325416 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.326990 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.328584 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.330187 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.331736 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.333241 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.334828 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.336395 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.337872 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.339465 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.341022 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.342524 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.434640 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.436030 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.437573 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.439090 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.440737 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.442346 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.443913 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.445465 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.447094 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.448603 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.450200 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.451766 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.453373 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.454907 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.456491 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.458063 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.459667 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.461211 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.462706 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.464258 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.560421 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.561849 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.563424 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.565021 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.566639 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.568202 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.570011 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.572435 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.574451 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.576121 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.577634 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.579163 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.580725 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.582393 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.583937 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.585514 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.587071 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.588662 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.590245 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.591717 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.683539 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.684990 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.689309 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.690662 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.692801 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.694627 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.696238 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.697845 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.699454 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.700942 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.702473 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.703979 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.705469 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.707039 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.708576 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.710117 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.711691 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.713370 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.715006 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.716664 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.811820 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.813192 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.814697 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.816268 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.817815 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.819282 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.820796 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.822397 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.823997 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.825491 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.827027 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.828604 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.830160 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.831670 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.833230 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.834749 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.836242 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.837784 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.839309 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.840860 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.935796 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.937247 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.938798 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.940321 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.941865 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.943335 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.944851 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.946404 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.947913 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.949467 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.951023 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.952573 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.954204 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.955750 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.957377 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.958912 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.960462 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.961964 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.963619 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:27.965135 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.057174 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.058606 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.060208 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.061806 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.063373 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.064955 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.066532 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.068110 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.069639 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.071225 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.072815 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.074404 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.076014 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.077605 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.079175 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.080669 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.082201 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.083797 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.085327 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.086971 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.185880 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.187267 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.188818 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.190408 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.192031 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.193584 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.195135 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.196728 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.198324 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.199923 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.201484 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.203054 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.204628 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.206157 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.207685 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.209244 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.210811 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.212380 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.213886 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.215465 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.309822 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.311268 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.312792 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.314323 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.315840 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.317363 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.318964 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.320485 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.322011 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.323605 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.325178 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.326908 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.328469 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.329999 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.331583 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.333183 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.334740 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.336278 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.337808 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.339384 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.433358 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.434824 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.436530 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:28.438359 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:29.458059 p5549 Warn Transmit attempt from ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket
2019-02-13 09:25:29.458361 p5549 Warn TX intent ************* TransportIntent(ActorAddr-(T|:34493)-pending-ExpiresIn_0:00:00-<class 'dict'>-{'number': 4074, 'batch': 0}-quit_0:00:00) timed out
2019-02-13 09:25:55.891576 p8657 ERR  Socket error sending to ActorAddr-(T|:34493) on <socket.socket fd=8, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('192.168.0.14', 44782)>: [Errno 110] Connection timed out / 110: ************* TransportIntent(ActorAddr-(T|:34493)-pending-ExpiresIn_0:02:49.341914-<class 'thespian.system.messages.multiproc.EndpointConnected'>-<thespian.system.messages.multiproc.EndpointConnected object at 0x7ffbb1ba52e8>-quit_0:02:49.341812)
2019-02-13 09:26:14.322706 p8834 ERR  Socket error sending to ActorAddr-(T|:34493) on <socket.socket fd=8, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('192.168.0.14', 45708)>: [Errno 110] Connection timed out / 110: ************* TransportIntent(ActorAddr-(T|:34493)-pending-ExpiresIn_0:02:50.151424-<class 'thespian.system.messages.multiproc.EndpointConnected'>-<thespian.system.messages.multiproc.EndpointConnected object at 0x7ffbb1bc69e8>-quit_0:02:50.151362)
2019-02-13 09:26:20.466253 p8891 ERR  Socket error sending to ActorAddr-(T|:34493) on <socket.socket fd=8, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('192.168.0.14', 45988)>: [Errno 110] Connection timed out / 110: ************* TransportIntent(ActorAddr-(T|:34493)-pending-ExpiresIn_0:02:50.216709-<class 'thespian.system.messages.multiproc.EndpointConnected'>-<thespian.system.messages.multiproc.EndpointConnected object at 0x7ffbb1cb8cf8>-quit_0:02:50.216680)
2019-02-13 09:26:32.755480 p8993 ERR  Socket error sending to ActorAddr-(T|:34493) on <socket.socket fd=8, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('192.168.0.14', 46484)>: [Errno 110] Connection timed out / 110: ************* TransportIntent(ActorAddr-(T|:34493)-pending-ExpiresIn_0:02:49.184457-<class 'thespian.system.messages.multiproc.EndpointConnected'>-<thespian.system.messages.multiproc.EndpointConnected object at 0x7ffbb1dafef0>-quit_0:02:49.184352)
2019-02-13 09:26:36.851727 p9031 ERR  Socket error sending to ActorAddr-(T|:34493) on <socket.socket fd=8, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('192.168.0.14', 46722)>: [Errno 110] Connection timed out / 110: ************* TransportIntent(ActorAddr-(T|:34493)-pending-ExpiresIn_0:02:50.482671-<class 'thespian.system.messages.multiproc.EndpointConnected'>-<thespian.system.messages.multiproc.EndpointConnected object at 0x7ffbb1ba6400>-quit_0:02:50.482488)

This may be related to the ongoing issues you are already looking into, though, so I will await your further updates.

Thanks again!

Andrew

One additional piece of information I forgot to mention: before the fixes, I had originally tested up to 20,000 messages successfully. However, the batch size in that case was only 30 (i.e. each for loop sent only 30 messages, then there was a 1 second pause, then another 30 messages, etc.). So it seems to be the bursty nature of the message input that causes the issues.

Hi @andatt,

The ~3000 messages processed is not specifically what I am still working on: that is a side effect of your configuration, although it also relates to an internal Thespian timer that should be exposed as a new feature.

The fundamental limit is on the number of processes and/or file descriptors (most probably the latter), which restricts the troupe size to less than the 4000 allowed by the code specification. This prevents the troupe leader from creating the full 4000 processes, so it can only distribute the work to the processes that are available. However, there is a 5 second delay in each worker for each message, so this limits the overall throughput of processing the messages. There is an internal Thespian timeout set for all attempts to send a message, and expirations of that timeout account for the majority of the log messages you noted above. This is intended as a failsafe to handle conditions where an Actor has not gracefully exited but is simply no longer there or no longer responding. When the timeout occurs, the message is returned to the sender in a PoisonPacket wrapper; the sender can handle these however it deems appropriate, and in your implementation the PrimaryActor doesn't provide handling for these at all, so they are simply dropped.

Thus, the ~3000 messages is the number that can be sent under the available system resource limits within the timeout period for those messages. I believe that the original configuration was handling 20,000 messages because (1) you did not have the 5 second delay per message, and (2) the PrimaryActor is using a 1 second period between sending batches, as well as sending batches to different troupes, so this staggered the submission of the messages enough that the running troupes were able to process them before the send timeout.

I am considering making the Thespian internal timeout an optional argument to self.send(), although I'm not sure whether that would end up causing more confusion and failure scenarios for people. The default is currently a 5 minute time period.
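As a rough back-of-the-envelope check of the reasoning above, the sketch below models how many messages a troupe could finish before sends start timing out. It assumes a deliberately simple model (every worker is busy for the whole timeout window and handles one message at a time); the function name `estimate_capacity` and the worker counts are hypothetical and are not measurements from this issue. Only the 5 second per-message delay and the 5 minute default timeout come from the discussion.

```python
def estimate_capacity(workers, per_message_seconds, send_timeout_seconds):
    """Upper bound on messages finished before pending sends time out.

    Simplified model (an assumption, not Thespian's actual scheduling):
    each worker processes messages back to back, one at a time.
    """
    if workers < 0 or per_message_seconds <= 0 or send_timeout_seconds < 0:
        raise ValueError("workers and timeout must be >= 0, delay must be > 0")
    messages_per_worker = int(send_timeout_seconds // per_message_seconds)
    return workers * messages_per_worker


if __name__ == "__main__":
    PER_MESSAGE_SECONDS = 5      # delay per message in each worker (from the thread)
    SEND_TIMEOUT_SECONDS = 300   # 5 minute default send timeout (from the thread)
    for workers in (50, 500):    # hypothetical counts of workers actually created
        capacity = estimate_capacity(workers, PER_MESSAGE_SECONDS, SEND_TIMEOUT_SECONDS)
        print(f"{workers} workers -> at most {capacity} messages before timeout")
```

Under this model, the number of workers that actually get created matters far more than the requested troupe size, which is consistent with the resource-limit explanation given above.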

Hi Kevin

Sorry for the delay in coming back to you. I wanted to take some time to think about what you said and try a couple of things. Just to clarify: in the previous failing example I was using a troupe decorator with a count of 500, so this should be within the limits obtained from running the ulimit commands.
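For reference, the same limits that ulimit reports can be read from inside the container with Python's standard `resource` module (Unix only). This is a minimal sketch: `read_limits` and `troupe_fits` are hypothetical helper names, and the `fds_per_actor` figure is a guess for illustration, not a number taken from Thespian.

```python
import resource  # Unix-only standard library module


def read_limits():
    """Return soft limits for open files and processes (None means unlimited)."""
    limits = {}
    for name, limit_id in (("open files", resource.RLIMIT_NOFILE),
                           ("processes", resource.RLIMIT_NPROC)):
        soft, _hard = resource.getrlimit(limit_id)
        limits[name] = None if soft == resource.RLIM_INFINITY else soft
    return limits


def troupe_fits(troupe_count, limits, fds_per_actor=2):
    """Crude check of a troupe size against the limits.

    fds_per_actor is an assumed figure; the real number of descriptors
    each actor process holds open may be higher.
    """
    nofile = limits["open files"]
    nproc = limits["processes"]
    fds_ok = nofile is None or troupe_count * fds_per_actor < nofile
    procs_ok = nproc is None or troupe_count < nproc
    return fds_ok and procs_ok


if __name__ == "__main__":
    current = read_limits()
    print(current)
    print("troupe of 500 fits:", troupe_fits(500, current))
```

Note that a troupe count that is below the raw limit can still exceed it once several descriptors per actor are taken into account.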

I think I understand what you are saying and it makes sense that we will come up against the system limits. But I am still not sure why the logging seems incomplete in thespian.log when these limits are reached. So I would expect to see:

number of failure messages in thespian.log = (5000 - number of successful messages)

5000 being the total messages sent in the initial batch.
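To compare that expectation against what is actually in thespian.log, the failure lines can be tallied with a short script. This is a sketch that assumes the line layout seen in the excerpts quoted earlier in this thread (timestamp, `p<pid>`, level, message); the category names and the `tally` function are made up for illustration. It only counts lines that are still present in the file.

```python
import re
from collections import Counter

# Layout assumed from the excerpts in this thread:
# "<date> <time> p<pid> <level> <message>"
LINE_RE = re.compile(r"^\S+ \S+ p\d+ (\w+)\s+(.*)$")


def tally(lines):
    """Count log lines per failure category (category names are hypothetical)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # skip blank or unrecognised lines
        level, text = match.groups()
        if "returning PoisonPacket" in text:
            counts["transmit timeout"] += 1
        elif text.startswith("TX intent") and text.endswith("timed out"):
            counts["tx intent timeout"] += 1
        elif text.startswith("Socket error"):
            counts["socket error"] += 1
        else:
            counts["other " + level] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2019-02-13 09:25:27.683539 p5549 Warn Transmit attempt from "
        "ActorAddr-(T|:37501) to ActorAddr-(T|:34493) timed out, returning PoisonPacket",
        "2019-02-13 09:25:29.458361 p5549 Warn TX intent ************* "
        "TransportIntent(ActorAddr-(T|:34493)-pending-ExpiresIn_0:00:00-<class 'dict'>-"
        "{'number': 4074, 'batch': 0}-quit_0:00:00) timed out",
        # Abbreviated version of a socket error line from the log above.
        "2019-02-13 09:25:55.891576 p8657 ERR  Socket error sending to "
        "ActorAddr-(T|:34493): [Errno 110] Connection timed out",
    ]
    for category, count in sorted(tally(sample).items()):
        print(f"{category}: {count}")
```

To run it against a real file, pass an open file object instead of the sample list, for example `tally(open("/tmp/thespian.log"))`.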

Also, regarding the PoisonPackets that are recorded: I can find no evidence that they are returned to the Primary actor. I tried defining the methods receiveMessage_PoisonMessage and receiveMessage_PoisonPacket and writing to the log inside them. Nothing was written. To be even more sure, I switched the actor to the standard Actor class and logged every message coming into receiveMessage. The only message received is the initial wakeup message.

Any idea what's going on here?

Thanks

Andrew

Hi @andatt ,

I believe the PoisonPackets were actually directed at the troupe leader rather than the Primary Actor in your scenario.

It took me a little longer than planned, but I appreciate your raising this scenario and your continued questioning of the behavior. I've pushed a number of changes to master that should resolve the issues you are seeing, and that also improve the handling of these issues in the troupe as well as the overall troupe performance. I will probably generate a new release within a couple of days, but I would appreciate it if you have time to try the latest code locally to see if it works better in your situation as well.

Regards,
Kevin

Hi @andrewsenseon,

That's good news, I'm glad everything is working well. I'll probably be generating a new release in the next day or so.

Regarding your questions:

  1. At the moment, it's not really possible to flush the log messages because all of the underlying socket transmit functionality is intentionally asynchronous. I understand the use case you are describing, and I have a couple of longer-term thoughts, but at the moment there's nothing direct. One alternative would be that your actor could directly write to its own logging file with synchronous flushing; I understand that this is clearly only a development solution and not as useful in a production environment.

  2. First let me note that the thespian.log is intended for internal thespian debugging; hopefully you should never need to refer to it as a thespian user. That said, there are some environment variables you can use to control this logfile (see thespian.system.utilis for details, but THESPLOG_FILE_MAXSIZE is the variable to control the size of the thesplog file). The default setting for this is fairly small on the assumption that a normal configuration should never need to refer to it and it should therefore have minimal impact, and that the most interesting conditions tend to occur near the end of the file.
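The development-only workaround mentioned in point 1 (an actor writing directly to its own log file with synchronous flushing) could look roughly like the sketch below. It uses only the standard `logging` module; `FsyncFileHandler`, `make_actor_logger`, the logger name, and the file path are hypothetical names chosen for illustration and are not part of Thespian.

```python
import logging
import os
import tempfile


class FsyncFileHandler(logging.FileHandler):
    """File handler that forces every record to disk before returning."""

    def emit(self, record):
        super().emit(record)  # writes the record and flushes the stream
        if self.stream is not None:
            os.fsync(self.stream.fileno())  # ask the OS to commit it to disk


def make_actor_logger(name, path):
    """Build a logger that bypasses the normal (asynchronous) actor logging."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep these records out of other handlers
    if not logger.handlers:   # avoid duplicate handlers on repeated calls
        handler = FsyncFileHandler(path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)-8s %(message)s"))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # Hypothetical location; inside an actor this would be a per-actor file.
    log_path = os.path.join(tempfile.gettempdir(), "secondary_actor_debug.log")
    log = make_actor_logger("secondary_actor_debug", log_path)
    log.debug("received message %s", {"number": 1, "batch": 0})
    print("wrote", log_path)
```

Inside an actor, the logger would be created once (for example on the first message) and called at the top of the message handler, so that a record is on disk even if the process disappears immediately afterwards. The extra `os.fsync` call makes each log call slower, which is one more reason to treat this as a debugging aid only.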

Regards,
Kevin

Hi Kevin

Any news on the release? If it's delayed I will just go ahead and use the latest version of master branch in my requirements.txt.

Thanks

Andrew

I apologize, I did get distracted on a separate feature. I will get this released by tomorrow, Andrew, and thanks for your patience.

Regards,
Kevin

Thespian 3.9.8 has been released, thanks again @andatt!

I'm closing this issue to tie it to the fixes that have been made relative to the release. I know this doesn't resolve all of your observed problems; let's open another issue to track your remaining problems, and feel free to refer back to this issue as needed.