Unable to direct script to BUSCO analysis output

Question

SAMtoBAM opened this issue 2 years ago · comments

Hi there,

I am unsure how to point the script's directory option (-d) at my BUSCO results.
I ran BUSCO on all my genomes as below

busco -m genome -i ${genome}.fa -o busco_analyses/busco.${genome} -l eurotiales_odb10

so all the 'busco.genome' output folders are placed in the same directory, but when I pass this directory to your script like this

python BUSCO_phylogenomics/BUSCO_phylogenomics.py -d busco_analyses/ -o BUSCO_phylogenomics_results/ --supertree --threads 30

it doesn't find any BUSCOs and I get this:

19-09-2022 08:52:31 Starting BUSCO Phylogenomics Pipeline
Found 0 BUSCO runs:

BUSCO # Species Single Copy
19-09-2022 08:52:31 0 BUSCOs were found

19-09-2022 08:52:31 Beginning SUPERTREE Analysis

19-09-2022 08:52:31 0 BUSCOs are present and single copy in at least 4 species

What is -d actually looking for?

Thanks!

SAMtoBAM · Answer 1 · Mon Sep 19 2022 17:35:40 GMT+0800 (China Standard Time)

OK, I seem to have found the issue.
The script only treats a folder as a BUSCO output folder if its name starts with the 'run_' prefix.
On top of that, the busco_sequences folder, which BUSCO placed inside 'run_eurotiales_odb10', is not picked up, so it needs to be moved up to the top level of each genome's output folder.
So it needs:

INPUT_DIRECTORY/run_*/busco_sequences/*

instead of either of the following (where 'busco_output*' could be any prefix for the BUSCO run output, and run_eurotiales_odb10 just reflects the lineage used during my BUSCO run)

INPUT_DIRECTORY/busco_output.*/run_eurotiales_odb10/busco_sequences/

or

INPUT_DIRECTORY/run_*/run_eurotiales_odb10/busco_sequences/
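
For anyone who wants to check their own input directory first, here is a minimal Python sketch based only on the layouts described above. It does not reproduce the script's actual detection logic, which I have not inspected; the function names `find_usable_runs` and `find_nested_runs` are made up for illustration.

```python
from pathlib import Path


def find_usable_runs(input_dir):
    """Folders matching INPUT_DIRECTORY/run_*/busco_sequences (the layout that worked)."""
    root = Path(input_dir)
    return sorted(p.parent for p in root.glob("run_*/busco_sequences") if p.is_dir())


def find_nested_runs(input_dir):
    """Folders where busco_sequences sits one level deeper, inside a lineage run_* folder."""
    root = Path(input_dir)
    return sorted(
        p.parent.parent for p in root.glob("*/run_*/busco_sequences") if p.is_dir()
    )
```

Calling `find_usable_runs("busco_analyses/")` on my original layout would return an empty list, while `find_nested_runs("busco_analyses/")` would list every 'busco.genome' folder, which is consistent with the "Found 0 BUSCO runs" message.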

I also added the option '-l eurotiales_odb10' when running the script, but that didn't help it pick up this inner output folder.
So I just moved all the run_eurotiales_odb10 output files and folders into the main output folder for each genome, renamed each BUSCO output folder to run_genome, and it worked.
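
The workaround above can be sketched in Python roughly as follows. This is a hedged sketch, not part of the script: `flatten_busco_runs` and its `prefix` parameter are hypothetical names, and it assumes output folders named like 'busco.genome' as in my run. It copies rather than moves, so the original BUSCO output stays intact (at the cost of extra disk space).

```python
import shutil
from pathlib import Path


def flatten_busco_runs(src_dir, dest_dir, prefix="busco."):
    """Copy each <src>/<output>/run_<lineage>/ folder to <dest>/run_<genome>/.

    Assumption: the genome name is the output folder name with `prefix` removed.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    created = []
    for seqs in sorted(src.glob("*/run_*/busco_sequences")):
        if not seqs.is_dir():
            continue
        lineage_run = seqs.parent          # e.g. busco.genome/run_eurotiales_odb10
        name = lineage_run.parent.name     # e.g. busco.genome
        genome = name[len(prefix):] if name.startswith(prefix) else name
        target = dest / f"run_{genome}"
        # Copy the whole lineage folder so busco_sequences ends up directly under run_<genome>
        shutil.copytree(lineage_run, target, dirs_exist_ok=True)  # needs Python 3.8+
        created.append(target)
    return created
```

After something like `flatten_busco_runs("busco_analyses", "busco_flat")`, the new directory (here the made-up name `busco_flat/`) would be the one to pass to -d.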