galaxyproject / usegalaxy-playbook

Ansible Playbook for usegalaxy.org



Multiple 2bit references being loaded for hg19

NickSto opened this issue

There are two hg19 entries in the twobit .loc files loaded on main, but it's not clear where they are coming from.
This causes Extract Genomic DNA to fail, because the tool is handed both paths joined with a comma.
- It doesn't understand comma-delimited paths.
- Help issue caused by it: https://help.galaxyproject.org/t/extract-genomic-dna-issue/5012

The first entry is /cvmfs/data.galaxyproject.org/byhand/hg19/seq/hg19.2bit
- found in /cvmfs/data.galaxyproject.org/byhand/location/twobit.loc
- ..which is itself loaded by /srv/galaxy/main/config/tool_data_table_conf.xml
- ..which is one of the values of the tool_data_table_config_path key in config/galaxy.yml.

The second entry is /galaxy/data/hg19/seq/hg19.2bit
- this is found in /galaxy-repl/main/tool_data/twobit.loc
- which is loaded by /cvmfs/main.galaxyproject.org/config/shed_tool_data_table_conf.xml, according to the startup logs
- but /galaxy-repl/main/tool_data/twobit.loc isn't found anywhere inside that xml.
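To check which .loc files contribute a given dbkey, here is a minimal Python sketch (not part of Galaxy). It assumes twobit.loc's two-column `value<TAB>path` layout with `#` comment lines, and reports every dbkey that is defined more than once across the files given. The function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def parse_loc(text, comment_char="#"):
    """Return (dbkey, path) pairs from twobit.loc-style text.

    Assumes two tab-separated columns (value, path); blank lines and
    comment lines are skipped.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(comment_char):
            continue
        fields = line.split("\t")
        if len(fields) >= 2:
            entries.append((fields[0], fields[1]))
    return entries


def find_duplicates(loc_texts):
    """Map each dbkey defined more than once to its (loc file, path) pairs.

    ``loc_texts`` maps a loc file name to that file's contents.
    """
    seen = defaultdict(list)
    for loc_name, text in loc_texts.items():
        for dbkey, path in parse_loc(text):
            seen[dbkey].append((loc_name, path))
    return {dbkey: hits for dbkey, hits in seen.items() if len(hits) > 1}


def find_duplicates_in_files(loc_paths):
    """Convenience wrapper that reads each loc file from disk."""
    return find_duplicates({str(p): Path(p).read_text() for p in loc_paths})
```

Pointing `find_duplicates_in_files` at the two twobit.loc files listed above should report hg19 with both 2bit paths. Joining those paths with `","` illustrates the comma-delimited value the tool ends up receiving.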

So why does Galaxy think that /galaxy-repl/main/tool_data/twobit.loc is in /cvmfs/main.galaxyproject.org/config/shed_tool_data_table_conf.xml?

Or is that incorrect and it's coming from somewhere else?

Note: A current workaround for the problem this causes in Extract Genomic DNA is to use bedtools GetFastaBed instead, as mentioned in #286.

@natefoo @jennaj

Galaxy always reads the loc files in its tool-data path as a fallback when a referenced loc file path doesn't exist (https://github.com/mvdbeek/galaxy/blob/fe96a26d616a5cbe2b753ea6b93d330d6c7f8a39/lib/galaxy/tools/data/__init__.py#L395). Remove /galaxy-repl/data/location/twobit.loc and you should be good.
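The fallback can be sketched as follows. This is a simplified, hypothetical illustration of the behavior, not Galaxy's actual code (see the linked `lib/galaxy/tools/data/__init__.py` for that): when a configured loc path is missing, a file with the same basename in the tool-data directory is used instead.

```python
import os


def resolve_loc_path(configured_path, tool_data_path):
    """Return the loc file that would actually be read, or None.

    Hypothetical simplification: use the configured path if it exists,
    otherwise fall back to a file with the same basename in tool_data_path.
    """
    if os.path.exists(configured_path):
        return configured_path
    fallback = os.path.join(tool_data_path, os.path.basename(configured_path))
    if os.path.exists(fallback):
        return fallback
    return None
```

For example, on a host where the /tmp file is missing but the tool-data copy exists, `resolve_loc_path("/tmp/tool-data/toolshed.g2.bx.psu.edu/repos/iuc/extract_genomic_dna/5cc8e93ee98f/twobit.loc", "/galaxy-repl/main/tool_data")` would return `/galaxy-repl/main/tool_data/twobit.loc`. That is how a second hg19 entry can appear even though that file is never named in the XML.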

@mvdbeek So Galaxy is finding /galaxy-repl/main/tool_data/twobit.loc because it's in the tool-data directory, not because it thinks it's in /cvmfs/main.galaxyproject.org/config/shed_tool_data_table_conf.xml? I.e. my reading of that startup log is wrong?

No, you did read the log messages correctly, but I think what is happening is that one of the entries references a twobit.loc file at a path that doesn't exist. You then hit https://github.com/mvdbeek/galaxy/blob/fe96a26d616a5cbe2b753ea6b93d330d6c7f8a39/lib/galaxy/tools/data/__init__.py#L395 and Galaxy loads the file from the tool-data folder instead.
And that entry is /tmp/tool-data/toolshed.g2.bx.psu.edu/repos/iuc/extract_genomic_dna/5cc8e93ee98f/twobit.loc
So I'd remove that entry AND the twobit.loc file in /galaxy-repl/main/tool_data. In fact, it's probably a good idea to remove all the entries in /cvmfs/main.galaxyproject.org/config/shed_tool_data_table_conf.xml that reference /tmp.
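Before cleaning up the shed config, the entries pointing under /tmp can be listed with a short Python sketch using the standard library's ElementTree. It assumes the usual `<tables><table name="..."><file path="..."/></table></tables>` layout of shed_tool_data_table_conf.xml; the function names are hypothetical, and the sketch only reports candidates rather than editing the file.

```python
import xml.etree.ElementTree as ET


def tmp_loc_entries(root, prefix="/tmp/"):
    """Return (table name, loc path) for each <file path=...> under ``prefix``."""
    hits = []
    for table in root.iter("table"):
        for file_el in table.findall("file"):
            path = file_el.get("path", "")
            if path.startswith(prefix):
                hits.append((table.get("name"), path))
    return hits


def tmp_loc_entries_in_file(conf_path, prefix="/tmp/"):
    """Parse a tool data table config from disk and report its /tmp entries."""
    return tmp_loc_entries(ET.parse(conf_path).getroot(), prefix)
```

Running `tmp_loc_entries_in_file("/cvmfs/main.galaxyproject.org/config/shed_tool_data_table_conf.xml")` should list the twobit entry for the extract_genomic_dna repository among the /tmp references.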

The duplicate entry for hg19 should be gone now, and I also added mm7-mm10.

Fixed, closing, thanks all!