haessar / peaks2utr

A robust Python tool for the annotation of 3’ UTRs

Home Page: https://doi.org/10.1093/bioinformatics/btad112


Example run failure

jasonleongbio opened this issue

I installed peaks2utr following the steps I reported in another issue (#1) and tried to execute the example run (using the files in the demo folder).

As I mentioned in #1, the Python version installed in the virtual environment is

Python 3.8.15

However, the example run was not successful.
A possible reason could be that this is a newly created virtual environment that may be missing some of the basic packages peaks2utr requires.
Here is what the error messages looked like:

peaks2utr Tb927_01_v5.1.gff Tb927_01_v5.1.slice.bam
2023-01-16 16:13:56,204 - INFO - Make .log directory
2023-01-16 16:13:56,205 - INFO - Make .cache directory
2023-01-16 16:13:56,206 - INFO - Splitting forward strand from Tb927_01_v5.1.slice.bam.
2023-01-16 16:13:58,190 - INFO - Finished splitting forward strand.
2023-01-16 16:13:58,190 - INFO - Splitting reverse strand from Tb927_01_v5.1.slice.bam.
2023-01-16 16:13:58,520 - INFO - Finished splitting reverse strand.
2023-01-16 16:13:58,521 - INFO - Splitting forward-stranded BAM file into read-groups.
2023-01-16 16:14:00,566 - INFO - Splitting reverse-stranded BAM file into read-groups.
2023-01-16 16:14:00,886 - INFO - Indexing ~/demo/.cache/Tb927_01_v5.1.slice.forward_0.bam.
2023-01-16 16:14:00,915 - INFO - Indexing ~/demo/.cache/Tb927_01_v5.1.slice.forward_3.bam.
2023-01-16 16:14:00,943 - INFO - Indexing ~/demo/.cache/Tb927_01_v5.1.slice.forward_2.bam.
2023-01-16 16:14:00,971 - INFO - Indexing ~/demo/.cache/Tb927_01_v5.1.slice.forward_1.bam.
2023-01-16 16:14:01,000 - INFO - Indexing ~/demo/.cache/Tb927_01_v5.1.slice.reverse_1.bam.
2023-01-16 16:14:01,005 - INFO - Indexing ~/demo/.cache/Tb927_01_v5.1.slice.reverse_2.bam.
2023-01-16 16:14:01,010 - INFO - Indexing ~/demo/.cache/Tb927_01_v5.1.slice.reverse_0.bam.
2023-01-16 16:14:01,015 - INFO - Indexing ~/demo/.cache/Tb927_01_v5.1.slice.reverse_3.bam.
INFO     Iterating over reads to determine SPAT pileups:   0%|       | [00:00<?]
2023-01-16 16:14:01,060 - INFO - Clearing cache.
Traceback (most recent call last):
  File "/~~~~home~~~/opt/miniconda3/envs/peaks2utr_v2/bin/peaks2utr", line 8, in <module>
    sys.exit(main())
  File "/~~~~home~~~/opt/miniconda3/envs/peaks2utr_v2/lib/python3.8/site-packages/peaks2utr/__init__.py", line 49, in main
    asyncio.run(_main())
  File "/~~~~home~~~/opt/miniconda3/envs/peaks2utr_v2/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/~~~~home~~~/opt/miniconda3/envs/peaks2utr_v2/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/~~~~home~~~/opt/miniconda3/envs/peaks2utr_v2/lib/python3.8/site-packages/peaks2utr/__init__.py", line 100, in _main
    bs.pileup_soft_clipped_reads()
  File "/~~~~home~~~/opt/miniconda3/envs/peaks2utr_v2/lib/python3.8/site-packages/peaks2utr/preprocess.py", line 85, in pileup_soft_clipped_reads
    multiprocess_over_dict(self._count_unmapped_pileups, self.outputs)
  File "/~~~~home~~~/opt/miniconda3/envs/peaks2utr_v2/lib/python3.8/site-packages/peaks2utr/utils.py", line 51, in multiprocess_over_dict
    p.start()
  File "/~~~~home~~~/opt/miniconda3/envs/peaks2utr_v2/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/~~~~home~~~/opt/miniconda3/envs/peaks2utr_v2/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/~~~~home~~~/opt/miniconda3/envs/peaks2utr_v2/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/~~~~home~~~/opt/miniconda3/envs/peaks2utr_v2/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/~~~~home~~~/opt/miniconda3/envs/peaks2utr_v2/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/~~~~home~~~/opt/miniconda3/envs/peaks2utr_v2/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/~~~~home~~~/opt/miniconda3/envs/peaks2utr_v2/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_io.TextIOWrapper' object

In the output above I have manually replaced the path to the cloned peaks2utr folder with ~, and the home directory where my miniconda is installed with /~~~~home~~~/.

So, as the error message indicates, something seems to go wrong when the script calls into the multiprocessing module. I'd be grateful if you could point out how I might solve this. Perhaps some packages are missing from this newly created virtual environment?

Thank you so much.

Are you using macOS, by any chance? I've spotted the same issue for another tool (GoogleCloudPlatform/gsutil#961). The consensus seems to be that the combination of multiprocessing, Python 3.8 and macOS can lead to this (with gsutil, anyway).
If you are on macOS, could you try running the demo again in a conda env with Python 3.9 to see whether the issue persists?
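
For context, here is a minimal, standalone sketch (not peaks2utr code; the file handed to the worker is purely for illustration) that reproduces the same class of error. Since Python 3.8 the default multiprocessing start method on macOS is "spawn", which pickles everything passed to a child process, and an open file handle cannot be pickled.

```python
# Minimal sketch reproducing the error above, NOT peaks2utr code.
# With the "spawn" start method (the macOS default since Python 3.8),
# arguments passed to a child process are pickled, and an open file
# handle is not picklable.
import multiprocessing


def worker(handle):
    print(handle.readline())


if __name__ == "__main__":
    multiprocessing.set_start_method("spawn")  # macOS default on Python 3.8+
    with open(__file__) as fh:
        p = multiprocessing.Process(target=worker, args=(fh,))
        p.start()  # TypeError: cannot pickle '_io.TextIOWrapper' object
        p.join()
```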

Another thing to try (with either Python 3.8 or 3.9): since you are using a conda env, you might need to run pip install buildtools after activating it and before running pip install peaks2utr. I believe some of the peaks2utr dependencies (such as MACS2) were failing to install because buildtools was missing. If this works for you, I'll add a note to the README.

Thanks @haessar

Indeed, I'm using macOS; sorry for forgetting to mention that.
May I ask whether it should preferably be run on a Linux-based machine?

I just tried to re-install with the additional step (pip install buildtools) before installing peaks2utr, and I also switched to Python 3.9 (3.9.16) this time. (I'm still trying on my macOS device, though.)

Sadly, I still hit the error at the same step, and the error messages are exactly the same as the previous ones.

Yes, it has only been tested on Linux systems, not on macOS (perhaps I should add that to the README). If you need to keep using a Mac, can you try cloning the source code and adding
multiprocessing.set_start_method("fork")
near the top of utils.py, then building and installing as described here: https://github.com/haessar/peaks2utr#installation.

(from https://stackoverflow.com/a/73902878/6683132)
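
For what it's worth, a sketch of how that workaround might look is below. The exact placement within utils.py is an assumption (it just needs to run before any Process objects are created), and the try/except guards against the start method having already been set.

```python
# Sketch of the suggested workaround; the placement in peaks2utr/utils.py is
# an assumption. Forcing the "fork" start method lets child processes inherit
# open file handles instead of trying to pickle them.
import multiprocessing

try:
    multiprocessing.set_start_method("fork")
except RuntimeError:
    # The start method can only be set once per interpreter; ignore the error
    # if it has already been set elsewhere.
    pass
```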

Thank you so much @haessar

I tried to run the example on my local (macOS) device only because my own mapped BAM files have all been transferred from the Linux server to my local hard disks.

I just tried to execute the example run on my Linux server and it worked perfectly fine! So the problem on my side was really because I was using macOS instead of a Linux system.

And thanks so much for your advice on the additional multiprocessing code. It might be useful to specify exactly which line to add it at, rather than just saying "near the top" of the file.