Preston-Landers / concurrent-log-handler

fork of ConcurrentLogHandler

concurrent-log-handler not thread/multiprocess safe

eriktews opened this issue

Hi

Unfortunately, there is a bug in concurrent-log-handler that corrupts the logfile when multiple processes write to it at the same time and those processes were started with fork or a similar method, or when new threads write concurrently. Here is a demo script that illustrates the issue:

#!/usr/bin/env python3

from logging import getLogger, INFO
from concurrent_log_handler import ConcurrentRotatingFileHandler
import os
from multiprocessing import Pool

llength = 10000


def genlog(a):
    # Each worker writes one llength-character line to the shared log.
    log.info(a * llength)


fname = "mylogfile.log"

log = getLogger()
logfile = os.path.abspath(fname)
rotateHandler = ConcurrentRotatingFileHandler(logfile)
log.addHandler(rotateHandler)
log.setLevel(INFO)

# The pool workers are forked from this process, so they all inherit the
# handler's already-open lock file handle.
with Pool(300) as p:
    p.map(genlog, ["a"] * 50000, 1)

log.removeHandler(rotateHandler)
rotateHandler.close()

# Every line should be exactly llength characters plus the newline.
with open(fname, "r") as f:
    for idx, l in enumerate(f):
        clength = len(l)
        if clength != llength + 1:
            print("Problem in line {}: length is {}".format(idx, clength))

Just run it: with very high probability it will produce a logfile containing some empty lines and some lines of twice the expected length (tested on Linux).

The reason for this is that the file handle for the lock file is created when the handler is constructed. Every process created afterwards with fork or a similar method shares that same file handle, so multiple threads/processes can hold the exclusive lock at the same time.
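Here is a minimal sketch of the mechanism, using fcntl.flock directly instead of the handler's own locking code (the file name demo.lock is arbitrary):

import fcntl
import os

# Open the lock file once in the parent, as the handler does at construction.
lock_fh = open("demo.lock", "w")

pid = os.fork()

# flock() locks belong to the open file description. Parent and child share
# that description after fork, so both calls succeed immediately and both
# processes believe they hold an exclusive lock.
fcntl.flock(lock_fh, fcntl.LOCK_EX)
print(os.getpid(), "acquired the exclusive lock")
fcntl.flock(lock_fh, fcntl.LOCK_UN)

if pid:
    os.waitpid(pid, 0)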

From my point of view, the best way to fix this is to open a new file handle inside "emit" or a similar per-record function and close it again after the log entry has been written.
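A rough sketch of what I mean, as a standalone handler. The class name PerEmitLockHandler and the .lock suffix are made up here, and rotation is left out entirely:

import fcntl
import logging

class PerEmitLockHandler(logging.FileHandler):
    def emit(self, record):
        # Open the lock file anew for every record, so no lock handle
        # ever survives into a forked child.
        with open(self.baseFilename + ".lock", "w") as lock_fh:
            fcntl.flock(lock_fh, fcntl.LOCK_EX)
            try:
                super().emit(record)  # writes and flushes under the lock
            finally:
                fcntl.flock(lock_fh, fcntl.LOCK_UN)

Because the log file itself is opened in append mode, doing the write plus flush while holding the lock is what keeps the lines intact.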

Hello and thanks for the bug report. My initial assessment is that I agree with your analysis. I'll review things in more detail within the next few days. Thanks.

I also added a pull request that fixes the issue, though it is not very elegant.

I assume one strategy could be to record the current pid when the lock file is opened. Then, on every emit, one could check whether the pid has changed and, if so, re-open the lock file.
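Roughly like this (the names PidCheckMixin, lock_path and _open_lockfile are invented for illustration):

import os

class PidCheckMixin:
    def _open_lockfile(self):
        # Remember which process opened the lock file.
        self._lock_fh = open(self.lock_path, "w")
        self._lock_pid = os.getpid()

    def _ensure_lockfile(self):
        # Call this at the start of every emit. After fork() the child
        # sees a different pid, reopens, and gets its own file
        # description, and with it a genuinely exclusive lock.
        if getattr(self, "_lock_pid", None) != os.getpid():
            self._open_lockfile()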

Hi, one more thing. When you consider improving this, this might be helpful:

https://pypi.org/project/multiprocessing-utils/

It gives you a kind of thread-local storage that you can check on every log entry: when the process has forked, or you are in a new thread, you need to reopen the file descriptor.
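Without the extra dependency, something like this could combine threading.local with a pid check (ForkAwareLocal is a made-up name):

import os
import threading

class ForkAwareLocal:
    def __init__(self):
        self._local = threading.local()

    def get_handle(self, path):
        # A brand-new thread has no attributes yet; a forked child
        # inherits them but with a stale pid. Both cases trigger a
        # fresh open().
        if getattr(self._local, "pid", None) != os.getpid():
            self._local.fh = open(path, "a")
            self._local.pid = os.getpid()
        return self._local.fh

Since every thread then holds its own open file description, flock() would also exclude threads within one process, not only separate processes.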

I'm not sure it's really necessary to hold the lock filehandle open indefinitely. I'm guessing the speed benefit is minimal in most real-world logging situations, so it could open and close the lock file each time it's needed, just as it does with the main file. In fact, I'm not even sure a separate lock file is needed at all; the locking could probably be done on the main file. There might be some benefits to keeping the lock separate that I'm not thinking of at the moment. Maybe it could be an option. I'll do a little research on it.

I think locking on the main file is fine as long as you don't rotate your logs. When you rotate the logs it might cause issues: rotation renames the file, so a process that still holds a handle to the renamed file would be locking a different file than one that has reopened it.

I merged your PR #9 with some very minor changes. Thanks for your contribution, I really appreciate it. I will push out a new version soon. If you still see anything hinky let me know.