Clinical-Genomics / fusion-report

Tool for parsing outputs from fusion detection tools. Part of the nf-core/rnafusion pipeline. Check out a live demo at https://matq007.github.io/fusion-report/example/

Mitelman DB no longer available?

marchoeppner opened this issue · comments

Hi,

I've been running fusion_report as part of nf-core/rnafusion and noticed that I am no longer getting any hits in the Mitelman DB after recently moving to release 2.0 of that pipeline. Curiously, the downloaded DB was only 8kb in size.

After some digging, it appears CGAP was discontinued and the Mitelman DB is no longer available from the NIH FTP. Or maybe it has moved, but I cannot find any information on that.

Just checking whether this is "working as intended" or whether there is a solution.
/M

How to reproduce: Install fusion_report, try downloading the databases and ...well, no Mitelman DB.

Hi @marchoeppner, which version of fusion-report are you using? You have to use the latest version because as you pointed out, the Mitelman database has been moved to a different location.

Right, I am using the 2.1.5 container from Bioconda (with Singularity), and the resulting Mitelman DB is 8 KB in size. Some other downloads also fail, with:

Downloading FusionGDB2_id.xlsx
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/pandas/compat/_optional.py", line 126, in import_optional_dependency
    module = importlib.import_module(name)
  File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'openpyxl'

Seems like some dependencies were not installed. Please run pip install openpyxl.

Right. Should probably be part of the conda package tho. Installing it manually solved that bit.
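For anyone who wants to confirm the missing dependency before starting a download, here is a minimal sketch that checks whether a module can be found in the current environment. The helper name missing_modules is hypothetical and not part of fusion_report; it only relies on the standard library's importlib.util.find_spec.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import.

    Hypothetical helper for illustration; not part of fusion_report.
    Only top-level names (no dots) are expected here.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example: run inside the container or conda environment.
# An empty list means openpyxl is available; ["openpyxl"] means it is not.
print(missing_modules(["openpyxl"]))
```

Running this inside the container should show whether the image itself lacks openpyxl, without needing to trigger the FusionGDB2 download first.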

In any case, for release 2.1.5, both the container and the Conda install produce:

singularity exec -B /work_ifs fusion-report_2.1.5updated.sif fusion_report download --cosmic_usr XXXXX --cosmic_passwd XXXXXX tmp
Downloading resources...
Downloading mitelman_db.zip
Downloading TCGA_ChiTaRS_combined_fusion_information_on_hg19.txt
Downloading TCGA_ChiTaRS_combined_fusion_ORF_analyzed_gencode_h19v19.txt
Downloading uniprot_gsymbol.txt
Downloading fusion_uniprot_related_drugs.txt
Downloading fusion_ppi.txt
Downloading fgene_disease_associations.txt
Downloading FusionGDB2_id.xlsx
Downloading CosmicFusionExport.tsv.gz
Downloading finished
[sukmb352@medcluster1 2.0]$ ls -lh tmp/
total 268M
-rw-r--r-- 1 sukmb352 ikmbadmins  18M Jun 30 09:08 cosmic.db
-rw-r--r-- 1 sukmb352 ikmbadmins   16 Jun 30 09:08 DB-timestamp.txt
-rw-r--r-- 1 sukmb352 ikmbadmins 2.3M Jun 30 09:08 fusiongdb2.db
-rw-r--r-- 1 sukmb352 ikmbadmins 204M Jun 30 09:08 fusiongdb.db
-rw-r--r-- 1 sukmb352 ikmbadmins 8.0K Jun 30 09:07 mitelman.db

Could you perhaps check if this is reproducible on your end and comment on whether this is working as intended or not? Specifically, mitelman.db being just 8kb in size? In earlier releases the DB was more like 60MB. Maybe something about the parsing/sql-loading is broken?

I just encountered this error using the Bioconda Docker install. You are limited there in that you can't just pip install into the container. Any chance of adding the missing library to the Bioconda install?

I have the older version of mitelman.db and it is 60 MB, as you say. I agree that it would be good to understand whether the 8 KB file is an intended placeholder.

table mbca 
CREATE TABLE "mbca" (
        "molclin" char (1) not null ,
        "refno" int not null ,
        "invno" smallint not null ,
        "morph" varchar (20) null ,
        "topo" varchar (20) null ,
        "immunology" char (1) null ,
        "genelength" smallint null ,
        "geneshort" varchar (255) null ,
        "genelong" varchar2(4000) null ,
        "karylength" smallint null ,
        "karyshort" varchar (255) null ,
        "karylong" varchar2(4000) null
)

These are the contents of the mitelman.db file. There are a few extra bits, but the database is empty. It seems like something isn't being parsed correctly anymore.
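To check this on another machine, a rough sketch along these lines lists every table in the file with its row count. It assumes mitelman.db is a SQLite file, which the schema dump above suggests, and the function name table_row_counts is made up for illustration.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in a SQLite file.

    Assumption: db_path points to an existing SQLite database such as
    the mitelman.db written by the download step.
    """
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names come from the database itself, so quote them as identifiers.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()


# Example (path taken from the listing earlier in this thread):
# print(table_row_counts("tmp/mitelman.db"))
```

If every count comes back as 0, the tables were created but no rows were loaded, which would point at the import step rather than at the schema creation.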

Outdated, closing, feel free to reopen if necessary