rpetit3 / fastq-dl

Download FASTQ files from SRA or ENA repositories.

Perl version control

pooser opened this issue:

The version of Perl currently associated with fastq-dl is 5.22.0, which conflicts with the Perl version of the current Bactopia release (v3.0.0), which is 5.26.0. I ran into this conflict when loading Bactopia and fastq-dl as module files. Food for thought. TIA!

Hi @pooser

Thank you for reporting! Quick question: were Bactopia and fastq-dl installed into the same Conda environment?

Haha, I also might need to look at the dependencies again, because it's a little funny that Perl is causing issues with two mostly Python-based tools!

EDIT

fastq-dl: sra-tools brings in Perl
bactopia: v3 doesn't seem to pull in Perl

Greetings @rpetit3 !

As per the install instructions for Bactopia without Docker/Singularity (yeah, yeah, I know... I'm a glutton for punishment, ha!), one should install Miniforge3 and then use its conda environment to install Bactopia. I did exactly that and then realized I needed your nifty tool to download thousands of sequence data sets.

To manage this environment, I used module files, and my default was to have Miniforge3, Bactopia, and fastq-dl loaded simultaneously. Using fastq-dl to fetch the SRA data went fine; however, once I pointed Bactopia to the data, it crashed with "Perl v5.26.0 required--this is only v5.22.0".

With all three modules loaded, I find: which perl -> Miniforge3/envs/bactopia/envs/fastq-dl/bin/perl
With only Miniforge3 and Bactopia loaded, I find the expected: which perl -> /usr/bin/perl

Ergo, I concur that fastq-dl brings in Perl while Bactopia does not.
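The check above can be generalized a little. The following is a minimal sketch, not part of either tool: list_on_path is a hypothetical helper that prints every perl found on PATH in lookup order, so it is easy to see which loaded module is shadowing which. It assumes a bash-like shell that supports local.

```shell
# Print every executable called NAME found on PATH, in lookup order.
# The first line printed is the one the shell will actually run.
# (list_on_path is a hypothetical helper name, not part of any tool.)
list_on_path() {
    local name="$1" dir
    local IFS=:
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.   # an empty PATH entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
}

list_on_path perl

# Report the version of whichever perl comes first, if there is one.
if command -v perl >/dev/null 2>&1; then
    perl -e 'print $^V, "\n"'
fi
```

Running this once with all modules loaded and once with only Miniforge3 and Bactopia loaded should make the ordering difference visible.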

This is not a big issue, of course; it's just that, as a total Bactopia newb, I thought others may have run into this as well.

Ah, I see now, haha. You are a glutton for punishment.

A few options here.

  1. Use two separate environments
    conda create -n fastq-dl -c conda-forge -c bioconda fastq-dl
    conda activate fastq-dl
    ... download your samples ...

    conda deactivate
    conda create -n bactopia -c conda-forge -c bioconda bactopia
    conda activate bactopia
    ... process your samples ...
  2. Use the --accessions parameter in Bactopia.
    With this you can provide all the Experiment accessions for your samples and Bactopia will handle the downloading (via fastq-dl). You can make use of bactopia search to help with this.

Here are some links:

https://bactopia.github.io/latest/tutorial/#multiple-samples
https://bactopia.github.io/latest/beginners-guide/#accessions
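For the two-environment option, one way to avoid activating and deactivating by hand is conda run, which executes a single command inside a named environment and leaves the calling shell's PATH alone. The sketch below is only an illustration: run_in_env is a hypothetical helper name, and the environment names are assumed to be the ones created in the first option.

```shell
# Run a command inside a named conda environment without activating it,
# so the calling shell's PATH (and its perl) stays untouched.
# (run_in_env is a hypothetical helper name.)
run_in_env() {
    local env_name="$1"
    shift
    # --no-capture-output streams the command's output as it is produced.
    conda run --no-capture-output -n "$env_name" "$@"
}

# Example calls (commented out because they need the environments to exist):
# run_in_env fastq-dl which perl
# run_in_env bactopia which perl
```

Each example call should report the perl that the given environment resolves, which makes it easy to confirm that the two environments no longer interfere.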

Let me know if this helps. If not, please don't hesitate to let me know! Haha, we'll get this figured out.

  1. Thank you for the suggestions. I was effectively doing the same thing by adding and removing fastq-dl depending on the stage of the pipeline.

  2. What I am particularly interested in is obtaining large amounts of sequence data files, not to analyze them with Bactopia but instead to train AI models that generate synthetic sequence data specific to genus and species. To do this, I have been using bactopia search to generate the accession list, which I then parse and feed to fastq-dl to fetch the data.

Does Bactopia have a built-in mechanism to both search and retrieve the data without executing the analysis? I am open to any suggestions you might have here. FWIW, I am not a biologist/bioinformatician and am instead simply treating this as an advanced data processing problem.
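The search-then-fetch workflow described in point 2 could be sketched roughly as follows. This is a sketch under assumptions, not the actual script: it assumes the accession list is a text file with the accession in the first whitespace-separated column of each line (the exact layout of the bactopia search output is not shown in this thread), that fastq-dl is on PATH, and that it accepts --accession and --outdir. download_accessions is a hypothetical helper name.

```shell
# Download every accession listed in a text file, one fastq-dl call each.
# Usage: download_accessions ACCESSION_LIST OUTDIR
# (download_accessions is a hypothetical helper; the list format is assumed.)
download_accessions() {
    local list="$1" outdir="$2" failed=0 acc _rest
    mkdir -p "$outdir"
    # Take the first field of each line as the accession; the "|| [ -n ... ]"
    # part keeps a final line that lacks a trailing newline.
    while read -r acc _rest || [ -n "$acc" ]; do
        case "$acc" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        # </dev/null stops the child process from consuming the loop's input.
        if ! fastq-dl --accession "$acc" --outdir "$outdir" </dev/null; then
            echo "download failed: $acc" >&2
            failed=$((failed + 1))
        fi
    done < "$list"
    # Succeed only if every download succeeded.
    [ "$failed" -eq 0 ]
}
```

Failures are reported and counted rather than aborting the loop, which seems preferable when fetching thousands of data sets, since a single bad accession should not stop the rest.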

Oh, this is very interesting.

There isn't a mechanism to directly do this, but I imagine you could indirectly do it by setting --min_basepairs or --min_reads to something unrealistically high. This would cause it to fail the gather step in Bactopia. You could test with a single accession to see if it works as expected.
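A single-accession test of that idea might look like the sketch below. It uses only the --accessions and --min_reads parameters mentioned in this thread; the threshold value, the helper name gather_only_test, and the one-line accession file are assumptions for illustration. Where the downloaded FASTQ files end up, and whether they are kept after the sample fails, would need to be checked on a real run.

```shell
# Run Bactopia on a one-accession list with an unrealistically high read
# threshold, so the sample is expected to fail during the gather step.
# (gather_only_test and the default threshold are illustrative assumptions.)
gather_only_test() {
    local acc="$1" min_reads="${2:-2000000000}" list status
    list=$(mktemp) || return 1
    printf '%s\n' "$acc" > "$list"
    bactopia --accessions "$list" --min_reads "$min_reads"
    status=$?
    rm -f "$list"
    return "$status"
}

# Example call (commented out because it needs Bactopia installed):
# gather_only_test SRX0000001
```

SRX0000001 is a placeholder, not a real accession; substitute one from your own list.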