marbl / parsnp

Parsnp was designed to align the core genome of hundreds to thousands of bacterial genomes within a few minutes to a few hours. Input can be both draft assemblies and finished genomes, and output includes variant (SNP) calls, core genome phylogeny and multi-alignments. Parsnp leverages contextual information provided by multi-alignments surrounding SNP sites for filtration/cleaning, in addition to existing tools for recombination detection/filtration and phylogenetic reconstruction.

Remove +/-30% filter

jellila opened this issue · comments

commented

Hi,

With the flag -c parsnp does not include genomes that are very different in size. Does anyone know how to make parsnp recruit every genome in my folder despite the +/-30% difference in size? I tried what is suggested in question #10 but it didn't work.

Thank you.

Laura

Laura,

I have updated the --curated flag in Parsnp v1.5.2 to now include all input genomes regardless of size. It will still warn you about genomes which would have otherwise been discarded by the size filter, though.
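For anyone following along, here is a rough sketch of what a +/-30% size window means in practice. This is not Parsnp's actual filter code: the function name `outside_size_window` and the exact comparison are assumptions made for illustration, and the lengths are the `Len:` values from the parsnp_core output posted later in this thread.

```python
def outside_size_window(ref_len: int, genome_len: int, tolerance: float = 0.30) -> bool:
    """Return True if genome_len differs from ref_len by more than `tolerance`.

    Hypothetical helper: a symmetric window around the reference length,
    assumed here only to illustrate the +/-30% idea from the issue title.
    """
    return abs(genome_len - ref_len) > tolerance * ref_len


# Lengths taken from the parsnp_core STDOUT in this thread.
REF_LEN = 6403280  # P.profundum.fasta.ref
genomes = {
    "MT1415.fasta": 3921871,
    "NCTC11647.fasta": 5061854,
}

# Genomes a strict +/-30% window would drop; --curated keeps them and only warns.
flagged = [name for name, length in genomes.items()
           if outside_size_window(REF_LEN, length)]
print(flagged)
```

With a reference this much larger than the other genomes, a window like this would exclude a good part of the dataset, which is why the question comes up at all.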

Let me know if this version works for you (it is available on conda as well)

Best,

Bryce

commented

Thank you very much for your help, first of all!

I get this when I run the new parsnp:

`**************************************************************
SETTINGS:
|-refgenome: /Users/laura/Desktop/BIOINFORMATICS/P.profundum.fasta
|-genomes:
/Users/laura/Desktop/BIOINFORMATICS/Pdd:Pdp/MT1415.fasta
/Users/laura/Desktop/BIOINFORMATICS/Pdd:Pdp/KC-Na-NB1.fasta
...45 more file(s)...
/Users/laura/Desktop/BIOINFORMATICS/Pdd:Pdp/89dp-OG16.fasta
/Users/laura/Desktop/BIOINFORMATICS/Pdd:Pdp/QMA0506.fasta
|-aligner: muscle
|-outdir: /Users/laura/Desktop/BIOINFORMATICS/NEWPARSNP
|-OS: Darwin
|-threads: 1


17:15:46 - INFO - <>
17:15:46 - INFO - No genbank file provided for reference annotations, skipping..
17:15:46 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd:Pdp/MT1415.fasta is 1.67x shorter than reference genome!
17:15:46 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd:Pdp/KC-Na-NB1.fasta is 1.42x shorter than reference genome!
17:15:46 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd:Pdp/A-162.fasta is 1.49x shorter than reference genome!
17:15:46 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd:Pdp/JM-2017.fasta is 1.45x shorter than reference genome!
Traceback (most recent call last):
File "/Users/laura/miniconda3/envs/htools/bin/parsnp", line 816, in
hdr = ff.readline()
File "/Users/laura/miniconda3/envs/htools/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 334: invalid start byte`

I guess I am doing something wrong now.

Thank you and best regards,

Laura

Laura,

Happy to help! Thanks for downloading the new version. Can you show me the full terminal? i.e. command used and all output?

My suspicion is that there is a pesky .DS_Store file causing this headache. Passing in the input genomes via -d /Users/laura/Desktop/BIOINFORMATICS/Pdd:Pdp will use all files in the directory, regardless of name or extension. This is why in Parsnp v1.5 we've added support for list/regex input, so that you can specify precisely which files to pass. In your case you can pass -d /Users/laura/Desktop/BIOINFORMATICS/Pdd:Pdp/*.fasta to only use the fasta files from that directory.
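To check a directory before handing it to -d, something along these lines may help. It is only a sketch: `find_suspect_files` is a made-up helper, not part of Parsnp, and it assumes that a valid input is UTF-8 text whose first line starts with a FASTA `>` header. The traceback above fails inside `readline()` with a `UnicodeDecodeError`, which is the kind of error a binary file such as .DS_Store would trigger here too.

```python
from pathlib import Path


def find_suspect_files(directory):
    """Return names of files that do not look like UTF-8 FASTA text.

    Hypothetical helper: a file is flagged if reading its first line
    raises UnicodeDecodeError or if that line does not start with '>'.
    """
    suspects = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        try:
            with open(path, encoding="utf-8") as handle:
                first_line = handle.readline()
        except UnicodeDecodeError:
            # Binary content (e.g. a macOS .DS_Store file) ends up here.
            suspects.append(path.name)
            continue
        if not first_line.startswith(">"):
            suspects.append(path.name)
    return suspects
```

Calling `find_suspect_files("/path/to/genomes")` lists hidden files as well, since `iterdir()` does not skip dotfiles the way `ls` does by default.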

Best,

Bryce

commented

Sure, here is the code I tried the first time with the output:

`(base) laura@d-i184-58-74 ~ % conda activate htools
(htools) laura@d-i184-58-74 ~ % /Users/laura/miniconda3/envs/htools/bin/parsnp -r /Users/laura/Desktop/BIOINFORMATICS/P.profundum.fasta -d /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp -o /Users/laura/Desktop/BIOINFORMATICS/Newparsnp -c
|--Parsnp 1.5.1--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest
08:37:41 - INFO -


SETTINGS:
|-refgenome: /Users/laura/Desktop/BIOINFORMATICS/P.profundum.fasta
|-genomes:
/Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/MT1415.fasta
/Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/KC-Na-NB1.fasta
...45 more file(s)...
/Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/89dp-OG16.fasta
/Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/QMA0506.fasta
|-aligner: muscle
|-outdir: /Users/laura/Desktop/BIOINFORMATICS/Newparsnp
|-OS: Darwin
|-threads: 1


08:37:41 - INFO - <>
08:37:41 - INFO - No genbank file provided for reference annotations, skipping..
08:37:41 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/MT1415.fasta is 1.67x shorter than reference genome!
08:37:41 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/KC-Na-NB1.fasta is 1.42x shorter than reference genome!
08:37:41 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/A-162.fasta is 1.49x shorter than reference genome!
08:37:41 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/JM-2017.fasta is 1.45x shorter than reference genome!
Traceback (most recent call last):
File "/Users/laura/miniconda3/envs/htools/bin/parsnp", line 816, in
hdr = ff.readline()
File "/Users/laura/miniconda3/envs/htools/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 334: invalid start byte`

After your comment about the extension of the files, I checked my directory and the files are all fasta. However, I tried as you suggested:

`(htools) laura@d-i184-58-74 ~ % /Users/laura/miniconda3/envs/htools/bin/parsnp -r /Users/laura/Desktop/BIOINFORMATICS/P.profundum.fasta -d /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/*.fasta -o /Users/laura/Desktop/BIOINFORMATICS/Newparsnp2 -c
|--Parsnp 1.5.1--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest
08:50:12 - INFO -


SETTINGS:
|-refgenome: /Users/laura/Desktop/BIOINFORMATICS/P.profundum.fasta
|-genomes:
/Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/111bp-OG15A.fasta
/Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/144bp-OG3.fasta
...44 more file(s)...
/Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/RM-71.fasta
/Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/SNW-8.1.fasta
|-aligner: muscle
|-outdir: /Users/laura/Desktop/BIOINFORMATICS/Newparsnp2
|-OS: Darwin
|-threads: 1


08:50:12 - INFO - <>
08:50:12 - INFO - No genbank file provided for reference annotations, skipping..
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/2012V-1072.fasta is 1.45x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/206328-2.fasta is 1.43x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/64bp-OG9.fasta is 1.42x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/80077637.fasta is 1.44x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/89dp-OG16.fasta is 1.41x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/9046-81.fasta is 1.44x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/91-197.fasta is 1.51x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/940804-1-1.fasta is 1.46x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/940804-1-2.fasta is 1.42x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/A-162.fasta is 1.49x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/ATCC29688.fasta is 1.67x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/ATCC29689.fasta is 1.73x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/BT-6.fasta is 1.43x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/GCSL-P85-BT-6.fasta is 1.43x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/Hep-2a-11.fasta is 1.52x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/Hep-2a-14.fasta is 1.46x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/Hep-2a-16.fasta is 1.46x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/Hep-2b-22.fasta is 1.43x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/JM-2017.fasta is 1.45x shorter than reference genome!
08:50:12 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/KC-Na-1.fasta is 1.41x shorter than reference genome!
08:50:13 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/KC-Na-NB1.fasta is 1.42x shorter than reference genome!
08:50:13 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/L091106-03H.fasta is 1.48x shorter than reference genome!
08:50:13 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/LD-07.fasta is 1.48x shorter than reference genome!
08:50:13 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/MT1415.fasta is 1.67x shorter than reference genome!
08:50:13 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/OT-51443.fasta is 1.41x shorter than reference genome!
08:50:13 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/PP3.fasta is 1.53x shorter than reference genome!
08:50:13 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/RM-71.fasta is 1.42x shorter than reference genome!
08:50:13 - WARNING - File /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/SNW-8.1.fasta is 1.56x shorter than reference genome!
08:50:13 - INFO - Running Parsnp multi-MUM search and libMUSCLE aligner...
08:54:45 - CRITICAL - The following command failed:
>>$ /Users/laura/miniconda3/envs/htools/bin/bin/parsnp_core /Users/laura/Desktop/BIOINFORMATICS/Newparsnp2/parsnpAligner.ini
Please veryify input data and restart Parsnp.
If the problem persists please contact the Parsnp development team.

  STDOUT:
  0

P.profundum.fasta.ref,Len:6403280,GC:41.734
111bp-OG15A.fasta,Len:4633796,GC:40.5543
144bp-OG3.fasta,Len:4814661,GC:40.5027
164dp-OG2.fasta,Len:4626672,GC:40.5876
2012V-1072.fasta,Len:4408971,GC:40.8869
206317-1.fasta,Len:4673986,GC:40.4714
206328-2.fasta,Len:4515945,GC:40.5071
64bp-OG9.fasta,Len:4533007,GC:40.8729
70dps-OG12.fasta,Len:4827935,GC:40.4916
80077637.fasta,Len:4468636,GC:40.4223
89dp-OG16.fasta,Len:4583837,GC:40.9198
9046-81.fasta,Len:4452773,GC:40.9153
91-197.fasta,Len:4227017,GC:41.0188
940804-1-1.fasta,Len:4395972,GC:40.5314
940804-1-2.fasta,Len:4532473,GC:40.5118
A-162.fasta,Len:4335616,GC:40.7647
ATCC29688.fasta,Len:3932528,GC:40.8521
ATCC29689.fasta,Len:3796820,GC:40.9169
ATCC33539.fasta,Len:4953976,GC:40.501
BT-6.fasta,Len:4480819,GC:40.438
CDC-2227-81.fasta,Len:4719947,GC:40.459
CIP102761.fasta,Len:5048498,GC:40.6716
DI21.fasta,Len:4787052,GC:40.8234
GCSL-P85-BT-6.fasta,Len:4478193,GC:40.4111
Hep-2a-11.fasta,Len:4249732,GC:40.8167
Hep-2a-14.fasta,Len:4398187,GC:40.825
Hep-2a-16.fasta,Len:4392725,GC:40.8092
Hep-2b-22.fasta,Len:4518266,GC:40.9055
JM-2017.fasta,Len:4436399,GC:40.6322
KC-Na-1.fasta,Len:4546136,GC:40.9116
KC-Na-NB1.fasta,Len:4524406,GC:40.9226
L091106-03H.fasta,Len:4329488,GC:40.8722
LD-07.fasta,Len:4344639,GC:40.6153
MT1415.fasta,Len:3921871,GC:40.8083
NCTC11646.fasta,Len:4661852,GC:40.7215
NCTC11647.fasta,Len:5061854,GC:40.8
NCTC11648.fasta,Len:4611119,GC:40.6921
OT-51443.fasta,Len:4549111,GC:41.3788
PP3.fasta,Len:4281249,GC:40.7807
Phdp_Wu-1.fasta,Len:4586084,GC:40.807
QMA0365.fasta,Len:4755557,GC:40.6034
QMA0505.fasta,Len:4677363,GC:40.8864
QMA0506.fasta,Len:4651953,GC:40.904
QMA0509.fasta,Len:4632979,GC:40.6029
QMA0510.fasta,Len:4791745,GC:40.6392
QMA0511.fasta,Len:4664845,GC:40.5803
QMA0512.fasta,Len:4610033,GC:40.7846
RM-71.fasta,Len:4524115,GC:40.5864
SNW-8.1.fasta,Len:4240149,GC:40.827
Finished processing input sequences, elapsed time: 6 seconds

             compressed suffix graph construction elapsed time: 5 seconds

             MUM anchor search elapsed time: 208 seconds


  
  STDERR:

parsnpAligner:: rapid whole genome SNP typing


ParSNP: Preparing to construct global multiple alignment framework

Preparing to verify and process input sequences...
Searching for initial MUM anchors...

    Constructing compressed suffix graph...
    Performing initial search for exact matches in the sequences...

Performing recursive MUM search between MUM anchors...
`

In the output folder I can find these files:
tmp (empty folder)
psnn.ini
parsnpAligner.log
parsnpAligner.ini
P.profundum.fasta.ref
blocks (empty folder)
all_mumi.ini

Thank you again!

Best,

Laura

Thanks for pasting the output! Would you mind showing me the contents of parsnpAligner.log?

commented

It's empty, completely empty! I don't really understand 😩

Hmm, that is rather strange... the functionality of the core parsnp aligner hasn't changed for some time, so my guess is that it is failing due to the genomes all being too disparate from the reference.
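To get a quick sense of how far the inputs are from the reference, the `name,Len:...,GC:...` lines in the parsnp_core STDOUT above can be parsed into length ratios. This is a sketch under assumptions: the regular expression and the `parse_lengths` helper are illustrative, not part of Parsnp, and the ratios computed from `Len:` need not match the figures in the WARNING lines, which Parsnp may derive from a different measure.

```python
import re

# Matches lines such as "MT1415.fasta,Len:3921871,GC:40.8083".
LINE_RE = re.compile(r"^(?P<name>.+),Len:(?P<length>\d+),GC:(?P<gc>[\d.]+)$")


def parse_lengths(stdout_text):
    """Map each sequence name to its reported length (hypothetical helper)."""
    lengths = {}
    for line in stdout_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            lengths[match.group("name")] = int(match.group("length"))
    return lengths


# Three lines copied from the STDOUT posted in this thread.
sample = """P.profundum.fasta.ref,Len:6403280,GC:41.734
ATCC29689.fasta,Len:3796820,GC:40.9169
NCTC11647.fasta,Len:5061854,GC:40.8
"""

lengths = parse_lengths(sample)
ref_len = lengths["P.profundum.fasta.ref"]
for name, length in lengths.items():
    # Ratio > 1 means the genome is shorter than the reference.
    print(f"{name}\t{length}\t{ref_len / length:.2f}x")
```

Every genome in this dataset is between roughly 3.8 and 5.1 Mbp against a 6.4 Mbp reference, so picking a reference closer in size to the rest may be worth trying regardless of the crash.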

You're more than welcome to attach the input files here and I can go through and debug it.

commented

Archive.zip
Thank you very much! These are only a few of the genomes (the file would be too big with all of them).
I may try to change the reference genome (P. profundum), but still would like to know where the problem is!

Laura

Hmm... so with only those sequences, I am able to run w/out any issues

(base) blk6@sno:~/Projects/HarvestSuite/parsnp$ ./parsnp -r issue80/P.profundum.fasta -d issue80/genomes/*.fasta -o issue80_out -c
|--Parsnp 1.5.2--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest
13:26:29 - INFO - 
****************************
SETTINGS:
|-refgenome:	issue80/P.profundum.fasta
|-genomes:	
	issue80/genomes/206328-2.fasta
	issue80/genomes/505.fasta
	...2 more file(s)...
	issue80/genomes/Phdp_Wu-1.fasta
	issue80/genomes/SNW-8.1.fasta
|-aligner:	muscle
|-outdir:	issue80_out
|-OS:	Linux
|-threads:	1
****************************
    
13:26:29 - INFO - <<Parsnp started>>
13:26:29 - INFO - No genbank file provided for reference annotations, skipping..
13:26:29 - WARNING - File issue80/genomes/206328-2.fasta is 1.41x shorter than reference genome! 
13:26:29 - WARNING - File issue80/genomes/64bp-OG9.fasta is 1.40x shorter than reference genome! 
13:26:29 - WARNING - File issue80/genomes/91-197.fasta is 1.50x shorter than reference genome! 
13:26:29 - WARNING - File issue80/genomes/SNW-8.1.fasta is 1.54x shorter than reference genome! 
13:26:29 - INFO - Running Parsnp multi-MUM search and libMUSCLE aligner...
13:27:19 - INFO - Reconstructing core genome phylogeny...
13:27:19 - INFO - Aligned 7 genomes in 47.54 seconds
13:27:19 - INFO - Parsnp finished! All output available in issue80_out

My email is brycekille@gmail.com if you want to send me the full dataset.

Best,

Bryce

Hi Laura,

So I was able to run parsnp on my Ubuntu machine w/ your files.

parsnp -r issue80/P.profundum.fasta -d $HOME/Data/parsnpissue80/*.fasta -o issue80_out -c --threads 10
|--Parsnp 1.5.1--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest
17:20:25 - INFO - 
****************************
SETTINGS:
|-refgenome:	issue80/P.profundum.fasta
|-genomes:	
	/home/Users/blk6/Data/parsnpissue80/111bp-OG15A.fasta
	/home/Users/blk6/Data/parsnpissue80/144bp-OG3.fasta
	...45 more file(s)...
	/home/Users/blk6/Data/parsnpissue80/RM-71.fasta
	/home/Users/blk6/Data/parsnpissue80/SNW-8.1.fasta
|-aligner:	muscle
|-outdir:	issue80_out
|-OS:	Linux
|-threads:	10
****************************
    
17:20:25 - INFO - <<Parsnp started>>
17:20:25 - INFO - No genbank file provided for reference annotations, skipping..
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/2012V-1072.fasta is 1.43x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/206328-2.fasta is 1.41x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/64bp-OG9.fasta is 1.40x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/80077637.fasta is 1.42x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/9046-81.fasta is 1.42x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/91-197.fasta is 1.50x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/940804-1-1.fasta is 1.45x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/940804-1-2.fasta is 1.40x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/A-162.fasta is 1.47x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/ATCC29688.fasta is 1.65x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/ATCC29689.fasta is 1.71x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/BT-6.fasta is 1.41x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/GCSL-P85-BT-6.fasta is 1.42x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/Hep-2a-11.fasta is 1.50x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/Hep-2a-14.fasta is 1.44x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/Hep-2a-16.fasta is 1.45x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/Hep-2b-22.fasta is 1.41x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/JM-2017.fasta is 1.43x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/L091106-03H.fasta is 1.46x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/LD-07.fasta is 1.46x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/MT1415.fasta is 1.65x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/PP3.fasta is 1.51x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/RM-71.fasta is 1.41x shorter than reference genome! 
17:20:26 - WARNING - File /home/Users/blk6/Data/parsnpissue80/SNW-8.1.fasta is 1.54x shorter than reference genome! 
17:20:26 - INFO - Running Parsnp multi-MUM search and libMUSCLE aligner...
17:24:36 - INFO - Reconstructing core genome phylogeny...
17:24:41 - INFO - Aligned 50 genomes in 4.16 minutes
17:24:41 - INFO - Parsnp finished! All output available in issue80_out

issue80_out.zip

Not sure what the issue is exactly but we can try and figure it out. How did you install parsnp and what version of macOS are you running? I will also have a colleague try these files out on a Mac and get back to you.

Best,

Bryce

commented

Hi Jelila,

Unfortunately I am currently unable to replicate this behavior. My next suggestion would be to try building the software from source. Let me know if this works for you!

Best,

Bryce

commented

@jellila No worries 🙂 The instructions for building from source are in the README. You will need an OpenMP-compatible compiler installed.

I will be working on making a binary that works for you this week, though, in case you have trouble building from source. I will update you at the end of the week.

commented

Laura,

I've built a new release for parsnp, available here. It should also be available on bioconda sometime this week (whenever they accept the PR).

Let me know if the issue persists with that build.

Best,

Bryce

@jellila

The new binary is available on conda. Running conda update parsnp should give you the new version.

commented

@jellila I believe the issue here is the .DS_Store file in your directory. You can get around including this file by using

-d /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/*.fasta

instead of

-d /Users/laura/Desktop/BIOINFORMATICS/Pdd-Pdp/

commented

I have a similar problem and I don't know how to solve it. SOS!
`(parsnp) [wanglamei@login02 ~]$ parsnp -r /share/home/wanglamei/Data/Bacillus/40species/GCA_000832605_ref.fasta -d /share/home/wanglamei/Data/Bacillus/40species/*.fasta -o /share/home/wanglamei/ParSNP/40species/condaV2 -F
16:44:04 - INFO - |--Parsnp 2.0.5--|

16:44:04 - WARNING - Output directory /share/home/wanglamei/ParSNP/40species/condaV2 exists, all results will be overwritten
16:44:04 - INFO -


SETTINGS:
|-refgenome: /share/home/wanglamei/Data/Bacillus/40species/GCA_000832605_ref.fasta
|-genomes:
/share/home/wanglamei/Data/Bacillus/40species/GCA_000262045.1_KCTC_13613_01_genomic.fasta
/share/home/wanglamei/Data/Bacillus/40species/GCA_000712595.1_ASM71259v1_genomic.fasta
...36 more file(s)...
/share/home/wanglamei/Data/Bacillus/40species/GCA_037907705.1_ASM3790770v1_genomic.fasta
/share/home/wanglamei/Data/Bacillus/40species/GCA_900177005.1_Bcereus.16-00174_genomic.fasta
|-aligner: muscle
|-outdir: /share/home/wanglamei/ParSNP/40species/condaV2
|-OS: Linux
|-threads: 1


16:44:04 - INFO - <>
16:44:04 - INFO - No genbank file provided for reference annotations, skipping..
16:44:04 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_000262045.1_KCTC_13613_01_genomic.fasta is 1.49x shorter than reference genome! Skipping...
16:44:04 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_000712595.1_ASM71259v1_genomic.fasta is 1.21x shorter than reference genome! Skipping...
16:44:04 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_000769555.1_ASM76955v1_genomic.fasta is 1.40x shorter than reference genome! Skipping...
16:44:04 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_001278705.1_ASM127870v1_genomic.fasta is 1.23x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_001307105.1_ASM130710v1_genomic.fasta is 1.54x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_001517105.1_ASM151710v1_genomic.fasta is 1.38x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_001584325.1_ASM158432v1_genomic.fasta is 1.53x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_001857925.1_ASM185792v1_genomic.fasta is 1.55x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_002250945.2_ASM225094v2_genomic.fasta is 1.35x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_002993925.1_ASM299392v1_genomic.fasta is 1.29x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_003148415.1_ASM314841v1_genomic.fasta is 1.30x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_004124315.2_ASM412431v2_genomic.fasta is 1.37x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_006094475.1_ASM609447v1_genomic.fasta is 1.39x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_008244765.1_ASM824476v1_genomic.fasta is 1.49x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_014042035.1_ASM1404203v1_genomic.fasta is 1.33x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_017832095.1_ASM1783209v1_genomic.fasta is 1.23x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_029772985.1_ASM2977298v1_genomic.fasta is 1.32x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_030908325.1_ASM3090832v1_genomic.fasta is 1.25x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_031195075.1_ASM3119507v1_genomic.fasta is 1.54x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_031316515.1_ASM3131651v1_genomic.fasta is 1.52x shorter than reference genome! Skipping...
16:44:05 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_031317525.1_ASM3131752v1_genomic.fasta is 1.40x shorter than reference genome! Skipping...
16:44:05 - INFO - Recruiting genomes...
/share/home/wanglamei/ParSNP/40species/condaV216:45:12 - INFO - Too few genomes to run partitions of size >50. Running all genomes at once.
16:45:12 - INFO - Running Parsnp multi-MUM search and libMUSCLE aligner...
^[[B^[[B^[[B^[[B
^C16:46:36 - CRITICAL - Caught request to terminate by user (CTRL+C), exiting now, bye
(parsnp) [wanglamei@login02 ~]$ parsnp -r /project/TBAMR03/wanglm/Bacillus/40species/GCA_031316625.1_ASM3131662v1_genomic.fna^C
(parsnp) [wanglamei@login02 ~]$ parsnp -r /share/home/wanglamei/Data/Bacillus/40species/GCA_031316625.1_ASM3131662v1_genomic.fasta ^C
(parsnp) [wanglamei@login02 ~]$ ^C
(parsnp) [wanglamei@login02 ~]$ cd /share/home/wanglamei/ParSNP/40species/condaV2
(parsnp) [wanglamei@login02 condaV2]$ /share/home/wanglamei/ParSNP/40species/condaV2^C
(parsnp) [wanglamei@login02 condaV2]$ parsnp -r /share/home/wanglamei/Data/Bacillus/40species/GCA_000832605_ref.fasta -d /share/home/wanglamei/Data/Bacillus/40species/.fasta -o /share/home/wanglamei/ParSNP/40species/condaV2 -F ^C
(parsnp) [wanglamei@login02 condaV2]$ parsnp -r /share/home/wanglamei/Data/Bacillus/40species/GCA_031316625.1_ASM3131662v1_genomic.fasta -d /share/home/wanglamei/Data/Bacillus/40species/
.fasta -o output
16:51:40 - INFO - |--Parsnp 2.0.5--|

16:51:40 - INFO -


SETTINGS:
|-refgenome: /share/home/wanglamei/Data/Bacillus/40species/GCA_031316625.1_ASM3131662v1_genomic.fasta
|-genomes:
/share/home/wanglamei/Data/Bacillus/40species/GCA_000262045.1_KCTC_13613_01_genomic.fasta
/share/home/wanglamei/Data/Bacillus/40species/GCA_000712595.1_ASM71259v1_genomic.fasta
...36 more file(s)...
/share/home/wanglamei/Data/Bacillus/40species/GCA_037907705.1_ASM3790770v1_genomic.fasta
/share/home/wanglamei/Data/Bacillus/40species/GCA_900177005.1_Bcereus.16-00174_genomic.fasta
|-aligner: muscle
|-outdir: output
|-OS: Linux
|-threads: 1


16:51:40 - INFO - <>
16:51:40 - INFO - No genbank file provided for reference annotations, skipping..
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_000262045.1_KCTC_13613_01_genomic.fasta is 1.63x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_000712595.1_ASM71259v1_genomic.fasta is 1.32x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_000769555.1_ASM76955v1_genomic.fasta is 1.53x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_001278705.1_ASM127870v1_genomic.fasta is 1.34x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_001307105.1_ASM130710v1_genomic.fasta is 1.69x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_001517105.1_ASM151710v1_genomic.fasta is 1.51x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_001584325.1_ASM158432v1_genomic.fasta is 1.67x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_001857925.1_ASM185792v1_genomic.fasta is 1.69x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_002250945.2_ASM225094v2_genomic.fasta is 1.48x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_002993925.1_ASM299392v1_genomic.fasta is 1.41x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_003148415.1_ASM314841v1_genomic.fasta is 1.42x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_004103615.1_ASM410361v1_genomic.fasta is 1.28x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_004124315.2_ASM412431v2_genomic.fasta is 1.49x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_006094475.1_ASM609447v1_genomic.fasta is 1.52x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_008244765.1_ASM824476v1_genomic.fasta is 1.63x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_014042035.1_ASM1404203v1_genomic.fasta is 1.45x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_017832095.1_ASM1783209v1_genomic.fasta is 1.35x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_029772985.1_ASM2977298v1_genomic.fasta is 1.44x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_030908325.1_ASM3090832v1_genomic.fasta is 1.36x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_031195075.1_ASM3119507v1_genomic.fasta is 1.68x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_031316515.1_ASM3131651v1_genomic.fasta is 1.66x shorter than reference genome! Skipping...
16:51:40 - ERROR - File /share/home/wanglamei/Data/Bacillus/40species/GCA_031317525.1_ASM3131752v1_genomic.fasta is 1.53x shorter than reference genome! Skipping...
16:51:40 - INFO - Recruiting genomes...
16:53:37 - INFO - Too few genomes to run partitions of size >50. Running all genomes at once.
16:53:37 - INFO - Running Parsnp multi-MUM search and libMUSCLE aligner...
16:55:27 - WARNING - Aligned regions cover less than 10% of reference genome!
Please verify recruited genomes are all strain of interest
16:55:30 - INFO - Reconstructing core genome phylogeny...
16:55:58 - INFO - Aligned 18 genomes in 4.30 minutes
16:55:58 - INFO - Parsnp finished! All output available in output`

Hi @Arthurdyu!

If you run parsnp with the --curated flag, it will include all of the input sequences in the alignment, regardless of length or sequence similarity.

I'll be sure to add another option in the next release that will allow you to skip the length filter but still do the similarity filter, but in the meantime the --curated flag should do the trick.
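If you drive Parsnp from a script, a minimal sketch of assembling such a call could look like the following. The helper name `build_parsnp_command` is made up for illustration. Only flags that appear in this thread are used (-r, -d, -o, --curated), and the explicit `*.fasta` listing mirrors the shell glob suggested earlier, so stray files such as .DS_Store never reach Parsnp.

```python
import glob
import os
import shlex


def build_parsnp_command(reference, genome_dir, outdir, pattern="*.fasta"):
    """Build an argument list for parsnp with --curated (hypothetical helper).

    Genomes are listed explicitly instead of passing the whole directory.
    """
    # glob.escape protects directory names containing glob metacharacters.
    genomes = sorted(glob.glob(os.path.join(glob.escape(genome_dir), pattern)))
    if not genomes:
        raise FileNotFoundError(f"no files matching {pattern!r} in {genome_dir!r}")
    return ["parsnp", "-r", reference, "-d", *genomes, "-o", outdir, "--curated"]


def format_command(command):
    """Render the argument list as a copy-pasteable shell line."""
    return shlex.join(command)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` or printed with `format_command(cmd)` and pasted into a terminal. Keep in mind that with --curated nothing is filtered out, so distant genomes will shrink the core alignment.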