Multiple 2bit references being loaded for hg19 #320

Closed
NickSto opened this issue Dec 14, 2020 · 5 comments

Comments

@NickSto
Member

NickSto commented Dec 14, 2020

There are two entries for hg19 twobit in the .loc files on main. But it's not clear where those entries are coming from.
This causes Extract Genomic DNA to fail, because it gets handed two paths joined with a comma.
- But it doesn't understand comma-delimited paths.
- Help issue caused by it: https://help.galaxyproject.org/t/extract-genomic-dna-issue/5012

The first entry is /cvmfs/data.galaxyproject.org/byhand/hg19/seq/hg19.2bit
- found in /cvmfs/data.galaxyproject.org/byhand/location/twobit.loc
- ..which is itself loaded by /srv/galaxy/main/config/tool_data_table_conf.xml
- ..which is one of the values of the tool_data_table_config_path key in config/galaxy.yml.

The second entry is /galaxy/data/hg19/seq/hg19.2bit
- this is found in /galaxy-repl/main/tool_data/twobit.loc
- which is loaded by /cvmfs/main.galaxyproject.org/config/shed_tool_data_table_conf.xml, according to the startup logs
- but /galaxy-repl/main/tool_data/twobit.loc isn't found anywhere inside that xml.

So why does Galaxy think that /galaxy-repl/main/tool_data/twobit.loc is in /cvmfs/main.galaxyproject.org/config/shed_tool_data_table_conf.xml?

Or is that incorrect and it's coming from somewhere else?

Note: A current workaround for the problem this causes in Extract Genomic DNA is to use bedtools GetFastaBed instead, as mentioned in #286.
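The duplicate described above can be checked with a short script. This is a minimal sketch, assuming each `twobit.loc` line is tab-separated with the dbkey in the first column and the 2bit path in the second; `find_duplicate_dbkeys` is a hypothetical helper, not part of Galaxy.

```python
from collections import defaultdict
from pathlib import Path


def find_duplicate_dbkeys(loc_paths):
    """Map each dbkey to the distinct data paths listed for it across .loc files.

    Only dbkeys with more than one distinct path are returned.
    Assumes tab-separated lines of the form: <dbkey>\t<path>
    """
    seen = defaultdict(list)
    for loc_path in loc_paths:
        with Path(loc_path).open() as handle:
            for line in handle:
                line = line.rstrip("\n")
                # Skip blank lines and comment lines.
                if not line.strip() or line.startswith("#"):
                    continue
                fields = line.split("\t")
                if len(fields) < 2:
                    continue
                dbkey, data_path = fields[0].strip(), fields[1].strip()
                if data_path not in seen[dbkey]:
                    seen[dbkey].append(data_path)
    return {dbkey: paths for dbkey, paths in seen.items() if len(paths) > 1}
```

Calling it with the two files named above, `/cvmfs/data.galaxyproject.org/byhand/location/twobit.loc` and `/galaxy-repl/main/tool_data/twobit.loc`, should report hg19 with both paths if the situation is as described.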

@natefoo @jennaj

@mvdbeek
Member

mvdbeek commented Dec 15, 2020

The loc files in Galaxy's tool-data path are always read as a fallback when a referenced loc file path doesn't exist (https://github.com/mvdbeek/galaxy/blob/fe96a26d616a5cbe2b753ea6b93d330d6c7f8a39/lib/galaxy/tools/data/__init__.py#L395). Remove /galaxy-repl/data/location/twobit.loc and you should be good.
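The fallback described here can be illustrated roughly as follows. This is a simplified sketch, not Galaxy's actual code: `resolve_loc_file` is a hypothetical helper, and matching the fallback by file name alone is an assumption.

```python
from pathlib import Path


def resolve_loc_file(referenced_path, tool_data_dir):
    """Return the .loc file that would be read, or None if nothing is found.

    Hypothetical helper: if the referenced path is missing, fall back to a
    file with the same name inside the tool-data directory (assumed rule).
    """
    referenced = Path(referenced_path)
    if referenced.is_file():
        return referenced
    fallback = Path(tool_data_dir) / referenced.name
    if fallback.is_file():
        return fallback
    return None
```

Under this model, a data table entry pointing at a missing `twobit.loc` silently picks up whatever `twobit.loc` sits in the tool-data directory, which would explain the second hg19 entry.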

@NickSto
Member Author

NickSto commented Dec 15, 2020

@mvdbeek So Galaxy is finding /galaxy-repl/main/tool_data/twobit.loc because it's in the tool-data directory, not because it thinks it's in /cvmfs/main.galaxyproject.org/config/shed_tool_data_table_conf.xml? I.e. my reading of that startup log is wrong?

@mvdbeek
Member

mvdbeek commented Dec 15, 2020

No, you did read the log messages correctly, but I think what is happening is that one entry references a twobit.loc file at a path that doesn't exist. You will then hit https://github.com/mvdbeek/galaxy/blob/fe96a26d616a5cbe2b753ea6b93d330d6c7f8a39/lib/galaxy/tools/data/__init__.py#L395 and load from the tool data folder.
And that entry is /tmp/tool-data/toolshed.g2.bx.psu.edu/repos/iuc/extract_genomic_dna/5cc8e93ee98f/twobit.loc
So I'd remove that entry AND the twobit loc file in /galaxy-repl/main/tool_data. In fact, it's probably a good idea to remove all the entries in /cvmfs/main.galaxyproject.org/config/shed_tool_data_table_conf.xml that reference /tmp.
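To find the entries worth removing, something like the following could list them. It is a sketch under assumptions: it expects `<table>` elements containing `<file path="...">` children, and `suspicious_loc_references` is a hypothetical helper name.

```python
import os
import xml.etree.ElementTree as ET


def suspicious_loc_references(conf_xml_path, prefix="/tmp"):
    """List (table name, file path) pairs that sit under `prefix` or do not exist.

    Assumes the config uses <table name="..."> elements with <file path="..."/>
    children; adjust if the real layout differs.
    """
    flagged = []
    root = ET.parse(conf_xml_path).getroot()
    prefix_dir = prefix.rstrip("/") + "/"
    for table in root.iter("table"):
        for file_elem in table.iter("file"):
            path = file_elem.get("path", "")
            under_prefix = path == prefix or path.startswith(prefix_dir)
            if under_prefix or not os.path.exists(path):
                flagged.append((table.get("name"), path))
    return flagged
```

The function only reports candidates; it does not modify the config, so each flagged entry can be reviewed before it is removed by hand.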

@davebx
Contributor

davebx commented Dec 19, 2020

The duplicate entry for hg19 should be gone now, and I also added mm7-mm10

@jennaj
Member

jennaj commented Jan 7, 2021

Fixed, closing, thanks all!

jennaj closed this as completed Jan 7, 2021