idle_check seems to run into a dead loop without any exceptions

Question

WilliamChen-luckbob opened this issue 9 months ago

WilliamChen-luckbob commented 9 months ago

Here is the code:

# coding:utf-8
import threading
import traceback

from imapclient import IMAPClient

from common.logger import logger, notifiable_logger

IDLE_WAITE_TIME_SEC = 10


class ProcessThread(threading.Thread):
    def __init__(self, *args, **kwargs):
        super().__init__(
            group=kwargs.get("group"),
            target=kwargs.get("target"),
            name=kwargs.get("name"),
            args=kwargs.get("args"),
            kwargs=kwargs.get("kwargs"),
            daemon=kwargs.get("daemon")
        )
        self.should_stop = threading.Event()
        self.client: IMAPClient = kwargs.get("client")
        print(f"thread {self.name} init OK!")
        self.DISTRIBUTE_LOCK_ENABLED = False

    def run(self):
        global IDLE_WAITE_TIME_SEC
        self.client.select_folder('INBOX')

        # open IDLE mode
        self.client.idle()
        try:
            while self.should_stop.is_set() is False:
                # wait for up to IDLE_WAITE_TIME_SEC seconds for an IDLE response
                responses = self.client.idle_check(timeout=IDLE_WAITE_TIME_SEC)

                if responses:
                    logger.info(f"{self.name} got response：{responses}")
                    # terminate IDLE mode
                    self.client.idle_done()
                    # analyze the responses
                    for response in responses:
                        if response[1] == b'EXISTS':
                            logger.info(f"{self.name} there is new email!")
                            uid_list = self.client.search(['UNSEEN'])

                            logger.info(f"{self.name} got new emails！uid_list={uid_list}")
                        else:
                            logger.info(f"{self.name} got other emails！")
                    # restart IDLE mode
                    self.client.idle()
                else:
                    logger.debug(f"{self.name} no response！goto next loop！")
        except Exception as e:
            traceback.print_exc()
            notifiable_logger.error(f'thread {self.name} has error！{e},trace：{traceback.format_exc()}')

    def stop(self):
        logger.info(f"{self.name} should stop gracefully！")
        self.should_stop.set()


if __name__ == '__main__':
    client = IMAPClient(host="xxxxxxxx")
    client.login("xxxxxxxxxxx", "xxxxxxxxxxx")
    listener = ProcessThread(
        name='test listener',
        client=client,
    )
    stop_event = threading.Event()
    try:
        listener.start()
        while stop_event.is_set() is False:
            logger.info("main thread is running！")
            stop_event.wait(timeout=10)
    except KeyboardInterrupt as e:
        logger.info("main thread got KeyboardInterrupt！")
        listener.stop()
        listener.join()
        stop_event.set()

After starting this piece of code, it can correctly retrieve data from the email server. As I expected, the program will continuously loop and wait for the server's response and fetch the content of unread emails from the inbox.

However, after running the code for a day or two, a strange phenomenon occurs:

The message "test listener no response! goto next loop!" is printed over and over. Even when I send emails to the monitored mailbox, no response arrives from the server. In other words, idle_check silently stops working under certain circumstances without raising an exception. After I restart the program, it can monitor the email server's responses again.

Is this a bug, or is it expected behavior related to how sockets work? I have limited knowledge of network communication.

My current workaround is to run an additional thread that periodically scans for unread emails. If the UIDs it finds do not match what the IDLE listener has reported, or if I otherwise detect that the IDLE listener has failed, I create a new client and restart the listener thread to resume monitoring.

Menno Finlay-Smits · Answer 1 · Fri Dec 01 2023 01:32:34 GMT+0800 (China Standard Time)

A server is allowed to drop a client that has been in IDLE for more than 30 minutes without doing anything. Although your code regularly times out from idle_check, it only restarts IDLE mode if there was a server response. For quiet mailboxes, the server could simply be dropping the connection after some time because it hasn't seen anything from the client to indicate that it's still alive.

Search for "29 minutes" in the spec to see the wording: https://www.rfc-editor.org/rfc/rfc2177

In your example, I would recommend making IDLE_WAITE_TIME_SEC longer (several minutes?) since the code doesn't need to do anything else while waiting anyway, and always stopping and restarting idle (idle_done and then idle) after each timeout.
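The recommendation above could look roughly like the following sketch. It is not code from the issue: idle_loop and on_activity are hypothetical names, the five-minute timeout is only one possible value in the "several minutes" range, and client is assumed to be an already logged-in IMAPClient. The only methods called on it are select_folder, idle, idle_check and idle_done.

```python
import threading

# Assumption: a few minutes per wait, comfortably below the 29 minutes
# mentioned in RFC 2177.
IDLE_WAIT_TIME_SEC = 5 * 60


def idle_loop(client, should_stop, on_activity, idle_timeout_sec=IDLE_WAIT_TIME_SEC):
    """Wait for IDLE activity, leaving and re-entering IDLE on every pass.

    client       -- a logged-in IMAPClient (or an object with the same methods)
    should_stop  -- threading.Event used to end the loop
    on_activity  -- hypothetical callback, called with the raw IDLE responses
    """
    client.select_folder('INBOX')
    while not should_stop.is_set():
        client.idle()
        # Returns an empty list if nothing arrived before the timeout.
        responses = client.idle_check(timeout=idle_timeout_sec)
        # Always leave IDLE, even after a timeout, so the server sees
        # client traffic on each pass of the loop.
        client.idle_done()
        if responses:
            # IDLE mode is off here, so the callback may issue other commands.
            on_activity(responses)
```

Because idle_done and idle now run on every pass, a connection that the server has dropped is more likely to show up as an exception from one of those calls than as an endless series of empty idle_check results. The caller would still need to catch that exception and reconnect.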

Another potential problem with your code is that you're treating the numbers that come with the UNSEEN response as message UIDs. I'm fairly sure that the number returned here is a message count, not a message ID. You might want to check that. In my experience, it's best to use the IDLE feature to determine only that something has happened and then use other techniques to determine what changed.
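One possible way to treat IDLE purely as a "something happened" signal is sketched below. The helper name find_new_unseen and the handled_ids set are hypothetical. The sketch assumes it is called after idle_done (so normal commands are allowed) and that the client uses IMAPClient's default use_uid=True, in which case search should return UIDs. It ignores the numbers inside the IDLE responses entirely, since the number in an EXISTS response is the mailbox's message count rather than an identifier.

```python
def find_new_unseen(client, handled_ids):
    """Return UNSEEN message ids that have not been handled yet.

    client       -- a logged-in IMAPClient with a folder selected, not in IDLE
    handled_ids  -- set of ids already processed; updated in place
    """
    # Ask the server what is unread right now instead of trusting
    # the payload of the IDLE responses.
    current = set(client.search(['UNSEEN']))
    new_ids = sorted(current - handled_ids)
    handled_ids.update(new_ids)
    return new_ids
```

A caller could invoke this whenever idle_check returns anything at all, and additionally on a timer, so that messages missed while the connection was down are still picked up.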