elixir-no-nels / rbFlow-Germline

A workflow engine with a germline calling pipeline running in a container

Reference location (-r) optimization

oskarvid opened this issue

Currently, /tsd/p172 is supported (and /cluster/projects/p172 is also possible), which is not too bad. Ideally, we would like /tsd/shared to hold the original.

Will the reference data be placed on the shared disk, i.e. /tsd/p172ncspmdata, once read rights have been granted?

I think the natural choice is to keep the reference data under /cluster/projects/p172, which has good I/O, so there is little need to copy it to the compute node's local disk. Long term, we could consider a nightly rsync cron job between /tsd/shared and /cluster/projects/p172, but let's get it working from /cluster/projects/p172 first.
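If the long-term nightly rsync idea is ever picked up (later in this thread it is explicitly kept out of the current milestone), a minimal Python sketch of the pieces involved might look like the following. The function names, the `hg38` subdirectory under /tsd/shared, and the 02:00 schedule are assumptions for illustration only. The script just builds the rsync argument list and a crontab line; it does not copy anything or install the cron job.

```python
import shlex


def rsync_command(src: str, dst: str, dry_run: bool = False) -> list[str]:
    """Build an rsync argument list that mirrors src into dst.

    -a preserves permissions, timestamps and symlinks; --delete removes
    files in dst that no longer exist in src, so dst stays a mirror.
    A trailing slash on src makes rsync copy the directory's contents
    rather than the directory itself.
    """
    cmd = ["rsync", "-a", "--delete"]
    if dry_run:
        cmd.append("--dry-run")  # only report what would change
    cmd += [src.rstrip("/") + "/", dst.rstrip("/") + "/"]
    return cmd


def nightly_cron_line(cmd: list[str], hour: int = 2, minute: int = 0) -> str:
    """Return a crontab entry that runs cmd once a day at hour:minute."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    # crontab field order: minute hour day-of-month month day-of-week
    return f"{minute} {hour} * * * {shlex.join(cmd)}"


if __name__ == "__main__":
    # Hypothetical paths: the thread only names /tsd/shared as the desired
    # original and /cluster/projects/p172 as the working copy.
    cmd = rsync_command("/tsd/shared/hg38", "/cluster/projects/p172/hg38")
    print(nightly_cron_line(cmd))
```

Running `--dry-run` first is a cheap way to confirm that `--delete` will not remove anything unexpected from the working copy before the job is scheduled.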

Additional comment: the reference data is not end-user specific (only end-user-specific data should be in /tsd/p172ncspmdata and similar mounted shared filesystems). So far we have only used this path for testing and exploring the end users' shared filesystem; the naming convention will change, and we may have several such end-user project paths available at the same time. I'll create a new issue for this, since support for it is needed in this milestone.

OK, can we consider this issue closed, since the reference files are already there and have been shown to work?

I am also suspicious of any attempts to inject more features into this milestone. The pipeline can obviously run without cron jobs that rsync the reference files, so I don't think we should include any such features in this milestone.

I've moved the hg38 directory to /cluster/projects/p172/. I will close this issue once I've verified that it works as expected, which will have to wait until testing on UH-Sky has finished.

I have now completed a successful test run with the reference files in /cluster/projects/p172/hg38, so I am closing this issue.