Clinical-Genomics / fusion-report

Tool for parsing outputs from fusion detection tools. Part of the nf-core/rnafusion pipeline. Check out a live demo at https://matq007.github.io/fusion-report/example/

Download of required starts but doesn't complete

wudustan opened this issue

I've been using the nf-core rnafusion workflow and the standalone fusion_report download --cosmic_usr '<username>' --cosmic_passwd '<password>' /path/to/db/ command.

In each case the download starts, then hangs at around "Downloading FusionGDB2_id.xlsx" and does not proceed from there. No error messages are returned; the process just eventually times out when run on an HPC with time limits on processes.

Tested with version fusion-report 2.1.5

Hi @wudustan, can you check if you can download the file on the server manually?

wget https://compbio.uth.edu/FusionGDB2/tables/FusionGDB2_id.xlsx -O FusionGDB2_id.xlsx
wget https://compbio.uth.edu/FusionGDB2/tables/FusionGDB2_id.xlsx -O FusionGDB2_id.xlsx
--2022-06-09 11:05:03--  https://compbio.uth.edu/FusionGDB2/tables/FusionGDB2_id.xlsx
Resolving compbio.uth.edu (compbio.uth.edu)... 129.106.32.59
Connecting to compbio.uth.edu (compbio.uth.edu)|129.106.32.59|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4796829 (4.6M) [application/vnd.openxmlformats-officedocument.spreadsheetml.sheet]
Saving to: 'FusionGDB2_id.xlsx'

FusionGDB2_id.xlsx                                   100%[=====================================================================================================================>]   4.57M   578KB/s    in 9.3s

2022-06-09 11:05:16 (502 KB/s) - 'FusionGDB2_id.xlsx' saved [4796829/4796829]

Yep, seems like it works OK. It also downloads fine as far as I can tell during the full function call, but then doesn't really move on from there.
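For anyone who wants to script the same manual check, here is a minimal Python sketch. It is not part of fusion-report; the helper names `fetch_with_timeout` and `looks_complete` are made up for illustration. It downloads a URL with a socket timeout, so a stalled connection raises an error instead of hanging silently, and then compares the size on disk with an expected byte count such as the 4796829 reported by wget above.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_with_timeout(url, dest, timeout=60.0):
    """Download url to dest and return the number of bytes written.

    Hypothetical helper, not part of fusion-report. The timeout applies to
    blocking socket operations, so a stalled connection raises an exception
    instead of waiting indefinitely.
    """
    dest = Path(dest)
    with urllib.request.urlopen(url, timeout=timeout) as response, dest.open("wb") as out:
        shutil.copyfileobj(response, out)
    return dest.stat().st_size


def looks_complete(path, expected_bytes):
    """Return True if the file on disk has exactly the expected size."""
    return Path(path).stat().st_size == expected_bytes
```

If `fetch_with_timeout` returns and `looks_complete("FusionGDB2_id.xlsx", 4796829)` is true (assuming the file on the server has not changed since the wget run), the download itself is probably not where the process stalls.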

ls
AUTHREF.JSON.SCHEMA            CYTOCONVERTED_LOG.TXT.DATA  KBIT.TXT.DATA       MCABNORM.JSON.SCHEMA  RECABNUM.TXT.DATA                                             fusiongdb.db
AUTHREF.TXT.DATA               CYTOGEN.JSON.SCHEMA         KBREAK.JSON.SCHEMA  MCABNORM.TXT.DATA     REF.JSON.SCHEMA                                               fusiongdb2.db
CYTINV.JSON.SCHEMA             CYTOGEN.TXT.DATA            KBREAK.TXT.DATA     MCBREAK.JSON.SCHEMA   REF.TXT.DATA                                                  mitelman.db
CYTINV.TXT.DATA                CYTOVAL.JSON.SCHEMA         KCLONE.JSON.SCHEMA  MCBREAK.TXT.DATA      TCGA_ChiTaRS_combined_fusion_ORF_analyzed_gencode_h19v19.txt  mitelman_db.zip
CYTOBANDS_HG38.JSON.SCHEMA     CYTOVAL.TXT.DATA            KCLONE.TXT.DATA     MCGENE.JSON.SCHEMA    TCGA_ChiTaRS_combined_fusion_information_on_hg19.txt          uniprot_gsymbol.txt
CYTOBANDS_HG38.TXT.DATA        FusionGDB2_id.xlsx          KODER.JSON.SCHEMA   MCGENE.TXT.DATA       fgene_disease_associations.txt
CYTOCONVERTED.JSON.SCHEMA      KABNORM.JSON.SCHEMA         KODER.TXT.DATA      RECAB.JSON.SCHEMA     fusionGDB2.csv
CYTOCONVERTED.TXT.DATA         KABNORM.TXT.DATA            MBCA.JSON.SCHEMA    RECAB.TXT.DATA        fusion_ppi.txt
CYTOCONVERTED_LOG.JSON.SCHEMA  KBIT.JSON.SCHEMA            MBCA.TXT.DATA       RECABNUM.JSON.SCHEMA  fusion_uniprot_related_drugs.txt

ftp://ftp1.nci.nih.gov/pub/CGAP/mitelman.tar.gz currently 404s also

Seems like you are using an older version of fusion-report, @wudustan. Update to the latest one: the Mitelman database has been moved to Google storage, which is why you have to use the latest release.

Hi, thanks for the heads up. I've updated using python3 setup.py install

I'm now getting a new error:

(fusion_report) [user] Fusion_Report % fusion_report download --cosmic_usr "usr" --cosmic_passwd "passwd" .
Downloading resources...
Downloading mitelman_db.zip
Traceback (most recent call last):
  File "[user]miniconda3/envs/fusion_report/bin/fusion_report", line 4, in <module>
    __import__('pkg_resources').run_script('fusion-report==2.1.5', 'fusion_report')
  File "[user]miniconda3/envs/fusion_report/lib/python3.8/site-packages/pkg_resources/__init__.py", line 662, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "[user]miniconda3/envs/fusion_report/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1459, in run_script
    exec(code, namespace, namespace)
  File "[user]miniconda3/envs/fusion_report/lib/python3.8/site-packages/fusion_report-2.1.5-py3.8.egg/EGG-INFO/scripts/fusion_report", line 13, in <module>
    app.run()
  File "[user]miniconda3/envs/fusion_report/lib/python3.8/site-packages/fusion_report-2.1.5-py3.8.egg/fusion_report/app.py", line 71, in run
    Download(params)
  File "[user]miniconda3/envs/fusion_report/lib/python3.8/site-packages/fusion_report-2.1.5-py3.8.egg/fusion_report/download.py", line 22, in __init__
    self.download_all(params)
  File "[user]miniconda3/envs/fusion_report/lib/python3.8/site-packages/fusion_report-2.1.5-py3.8.egg/fusion_report/download.py", line 41, in download_all
    Net.get_fusiongdb(self, return_err)
  File "[user]miniconda3/envs/fusion_report/lib/python3.8/site-packages/fusion_report-2.1.5-py3.8.egg/fusion_report/common/net.py", line 106, in get_fusiongdb
    pool = Pool(Settings.THREAD_NUM)
  File "[user]miniconda3/envs/fusion_report/lib/python3.8/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "[user]miniconda3/envs/fusion_report/lib/python3.8/multiprocessing/pool.py", line 212, in __init__
    self._repopulate_pool()
  File "[user]miniconda3/envs/fusion_report/lib/python3.8/multiprocessing/pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "[user]miniconda3/envs/fusion_report/lib/python3.8/multiprocessing/pool.py", line 326, in _repopulate_pool_static
    w.start()
  File "[user]miniconda3/envs/fusion_report/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "[user]miniconda3/envs/fusion_report/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "[user]miniconda3/envs/fusion_report/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "[user]miniconda3/envs/fusion_report/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "[user]miniconda3/envs/fusion_report/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "[user]miniconda3/envs/fusion_report/lib/python3.8/multiprocessing/spawn.py", line 183, in get_preparation_data
    main_mod_name = getattr(main_module.__spec__, "name", None)
AttributeError: module '__main__' has no attribute '__spec__'
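A note on reading the traceback: the frames go through popen_spawn_posix.py, so multiprocessing is using the "spawn" start method, and the failure is in spawn.py where it reads `__spec__` from the `__main__` module. The sketch below is only a diagnostic under that reading, not a confirmed fix for fusion-report; the function names are made up for illustration. It reports the start method and whether `__main__` has `__spec__`, and can set the attribute to None if it is missing. Why the attribute is missing here (possibly the pkg_resources launcher script) is an assumption.

```python
import multiprocessing
import sys


def describe_main_module():
    """Report the two facts that matter for the AttributeError above.

    Hypothetical diagnostic helper, not part of fusion-report.
    """
    main_module = sys.modules["__main__"]
    return {
        # None means no start method has been fixed yet in this process.
        "start_method": multiprocessing.get_start_method(allow_none=True),
        "main_has_spec": hasattr(main_module, "__spec__"),
    }


def ensure_main_spec():
    """Add __spec__ = None to __main__ if the attribute is missing.

    Returns True if the attribute had to be added, False otherwise.
    """
    main_module = sys.modules["__main__"]
    if hasattr(main_module, "__spec__"):
        return False
    main_module.__spec__ = None
    return True
```

Calling `ensure_main_spec()` before the Pool is created should let the `getattr(main_module.__spec__, "name", None)` line in the traceback return None instead of raising. Whether the rest of the download then completes was not tested in this thread.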

Something doesn't add up here. Could you try installing the latest dev branch instead?

I've managed to get this to work on a local machine with a Docker instance instead of installing the required software from conda etc.

Anyone else having this issue on HPC, try to pull the fusion-report references via Docker on your local machine first.

OK to close issue.