Writing a bigWig file in Python 3 is extremely slow compared to Python 2
Young-Sook opened this issue
In my script, one of the functions (mergeFiles) reads many bedGraph files and merges them into a single bigWig file using pyBigWig. I ran exactly the same script under Python 2.7 and Python 3.7. It took less than an hour under Python 2.7, but more than 6 days under Python 3.7.
Also, 15 GB of memory was enough for the function under Python 2.7, but under Python 3.7 the script crashes even when I allocate 250 GB. When I remove the part that writes to bigWig files, I have no running-time or memory problems.
Is there any reason why the performance of pyBigWig is so different between the two versions of Python?
This is my function:
def mergeFiles(fileNames):
    bw = pyBigWig.open("new.bw", "w")
    bw.addHeader(bwHeader)
    for i in range(len(fileNames)):
        tempSignalBedName = fileNames[i]
        tempFile_stream = open(tempSignalBedName)
        tempFile = tempFile_stream.readlines()
        for j in range(len(tempFile)):
            temp = tempFile[j].split()  # bedGraph fields: chrom, start, end, value
            regionChromo = temp[0]
            regionStart = int(temp[1])
            regionEnd = int(temp[2])
            regionValue = float(temp[3])
            bw.addEntries([regionChromo], [regionStart], ends=[regionEnd], values=[regionValue])
        tempFile_stream.close()
    bw.close()
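A side observation on the function above: it calls addEntries once per bedGraph line, each time with single-element lists. pyBigWig's addEntries accepts parallel lists of entries, so one batched call per file would cut the per-call overhead regardless of the Python version. A minimal sketch of that restructuring (the helper name read_bedgraph and the header/output-path parameters are illustrative, not from the original script):

```python
def read_bedgraph(path):
    """Parse a bedGraph file into parallel lists (chroms, starts, ends, values)
    suitable for a single batched pyBigWig.addEntries call."""
    chroms, starts, ends, values = [], [], [], []
    with open(path) as fh:
        for line in fh:
            fields = line.split()  # bedGraph fields: chrom, start, end, value
            chroms.append(fields[0])
            starts.append(int(fields[1]))
            ends.append(int(fields[2]))
            values.append(float(fields[3]))
    return chroms, starts, ends, values

def merge_files(file_names, header, out_path="new.bw"):
    """Merge bedGraph files into one bigWig, one addEntries call per file."""
    import pyBigWig  # imported here so the parsing helper stays testable without it
    bw = pyBigWig.open(out_path, "w")
    bw.addHeader(header)
    for name in file_names:
        chroms, starts, ends, values = read_bedgraph(name)
        # One batched call per file instead of one call per line.
        bw.addEntries(chroms, starts, ends=ends, values=values)
    bw.close()
```

This is a sketch, not a claim about the root cause of the Python 2/3 difference; it assumes each bedGraph file is sorted and non-overlapping, as bigWig writing requires.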
A side note: I am using multiprocessing for the mergeFiles function.
pool = multiprocessing.Pool(numSamples)
result = pool.map_async(mergeFiles, joblist).get()
pool.close()
The memory issue was probably the same as that from #91, which should now be fixed (I'm pushing out a new version now). Can you try this again with version 0.3.17?