Unable to direct script to BUSCO analysis output
SAMtoBAM opened this issue
Hi there,
I am unsure how to point the script's directory option (-d) at my BUSCO results.
I ran busco on all my genomes as below:
busco -m genome -i ${genome}.fa -o busco_analyses/busco.${genome} -l eurotiales_odb10
so all the 'busco.genome' output folders are in the same directory, but when I pass this directory to your script like this:
python BUSCO_phylogenomics/BUSCO_phylogenomics.py -d busco_analyses/ -o BUSCO_phylogenomics_results/ --supertree --threads 30
it doesn't find any BUSCOs and I get this:
19-09-2022 08:52:31 Starting BUSCO Phylogenomics Pipeline
Found 0 BUSCO runs:
BUSCO # Species Single Copy
19-09-2022 08:52:31 0 BUSCOs were found
19-09-2022 08:52:31 Beginning SUPERTREE Analysis
19-09-2022 08:52:31 0 BUSCOs are present and single copy in at least 4 species
What is -d actually looking for?
Thanks!
OK, so I seem to have found the issue.
The script only treats a folder as BUSCO output if its name starts with the 'run_' prefix.
On top of that, the busco_sequences folder, which BUSCO placed inside 'run_eurotiales_odb10', is not picked up, so it needs to be moved up to the top level of each run's output folder.
So it needs:
INPUT_DIRECTORY/run_*/busco_sequences/*
instead of either of the following (where 'busco_output.*' could be any prefix for the busco run output, and 'run_eurotiales_odb10' is just due to the lineage used during my busco run):
INPUT_DIRECTORY/busco_output.*/run_eurotiales_odb10/busco_sequences/
or
INPUT_DIRECTORY/run_*/run_eurotiales_odb10/busco_sequences/
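In case it helps anyone check their own folder before running the script, here is a small sketch that reports which of the two layouts above each run is in. It is based only on the behaviour I observed, not on the script's source, and the function name classify_busco_dirs is my own invention:

```python
from pathlib import Path


def classify_busco_dirs(input_dir):
    """Split busco_sequences folders into 'direct' and 'nested' layouts.

    direct: INPUT_DIRECTORY/run_*/busco_sequences   (the layout that worked for me)
    nested: INPUT_DIRECTORY/*/run_*/busco_sequences (the layout that was not picked up)
    """
    root = Path(input_dir)
    direct = sorted(p for p in root.glob("run_*/busco_sequences") if p.is_dir())
    nested = sorted(p for p in root.glob("*/run_*/busco_sequences") if p.is_dir())
    return direct, nested


# Example usage (the path is the one from my run; adjust to yours):
# direct, nested = classify_busco_dirs("busco_analyses")
# print(len(direct), "runs in the direct layout,", len(nested), "nested")
```

If everything shows up as nested and nothing as direct, you are probably hitting the same problem I did.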
I added the option '-l eurotiales_odb10' when running the script, but it didn't help it pick up this internal output folder.
So I just moved all the run_eurotiales_odb10 output files and folders into the main output folder for each genome and renamed each busco output folder to run_genome, and it worked.
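For anyone who would rather not move things by hand, below is a rough sketch of the same workaround. It copies instead of moving, so the original BUSCO output stays untouched, and it only copies busco_sequences rather than everything in the lineage folder. The function name flatten_busco_runs and the destination folder are hypothetical, and the target layout is simply the one that worked for me, not something confirmed by the script's documentation:

```python
import shutil
from pathlib import Path


def flatten_busco_runs(src_dir, dest_dir, lineage="eurotiales_odb10"):
    """Copy each <run>/run_<lineage>/busco_sequences to dest_dir/run_<run>/busco_sequences."""
    src, dest = Path(src_dir), Path(dest_dir)
    created = []
    for run in sorted(src.iterdir()):
        seqs = run / f"run_{lineage}" / "busco_sequences"
        if not seqs.is_dir():
            continue  # not a BUSCO output folder for this lineage, skip it
        # Make sure the new folder name carries the 'run_' prefix.
        name = run.name if run.name.startswith("run_") else f"run_{run.name}"
        target = dest / name / "busco_sequences"
        shutil.copytree(seqs, target)  # also creates the missing parent folders
        created.append(target)
    return created


# Example usage (the destination name is made up):
# flatten_busco_runs("busco_analyses", "busco_flat")
# then pass busco_flat/ to the script with -d
```

Note that copytree raises an error if the target already exists, so use a fresh destination folder.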