Final output file size 0
taylorreiter opened this issue
Expected Behavior
I expect plass to output a fasta with amino acid sequences
Current Behavior
plass runs, but outputs a file with no amino acid sequences
Steps to Reproduce (for bugs)
Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.
wget -O SRS476121_69.fna.cdbg_ids.reads.fa.gz https://osf.io/p7fqc/download
plass assemble SRS476121_69.fna.cdbg_ids.reads.fa.gz SRS476121_69.cdbg_ids.reads.plass.faa tmp
Plass Output (for bugs)
Log file: 11388349399477705273_log.txt
File sizes in tmp for plass run: 11388349399477705273_file_sizes.txt
Context
I am assembling reads that I think are derived from a single organism from a metagenome (e.g. reads from a spacegraphcats query). The reads are 101 bases long. The read file is 2.2 GB, and I am treating it as single-end.
Your Environment
I ran plass using conda, with the following environment:
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- plass=3.764a3
- cd-hit=4.8.1
- paladin=1.4.6
- samtools=1.10
- salmon=0.15.0
I am on a Linux computer (Ubuntu 18.04.4 LTS, GNU/Linux 4.15.0-70-generic x86_64), and ran plass with 128 GB of RAM and 8 CPUs.
Could you try decreasing the minimum translated ORF length with the --min-length parameter?
Something between 25 and 30 should work fine. The default translated fragment length of 45 is too long to fit into the 101 bp reads: a 101 bp read encodes at most 33 amino acids.
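To make the arithmetic behind that suggestion concrete, here is a minimal sketch. It uses only the numbers quoted in this thread (101 bp reads, default minimum length of 45) and picks 30 as one value from the suggested range. The rerun command is printed rather than executed, and it assumes the flag goes after the positional arguments.

```shell
# Read length and default minimum fragment length as quoted in this thread.
read_len=101
default_min_len=45

# A read of N nucleotides holds at most floor(N / 3) complete codons.
max_codons=$(( read_len / 3 ))
echo "longest translatable fragment: ${max_codons} aa"

# Compare the default threshold against what a single read can encode.
if [ "$default_min_len" -gt "$max_codons" ]; then
    echo "default --min-length ${default_min_len} exceeds ${max_codons} aa; no fragment can pass"
fi

# Assumption: 30 is just one value from the suggested 25-30 range.
# Printed instead of run, since it needs plass and the downloaded reads.
echo "plass assemble SRS476121_69.fna.cdbg_ids.reads.fa.gz SRS476121_69.cdbg_ids.reads.plass.faa tmp --min-length 30"
```

As the template above asks, rerun with a newly recreated and empty tmp folder.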
Update: I tried it out locally, the aa_6f_long database has a reasonable size (instead of 0) if I pass a shorter min-length.
We should handle this case better somehow :/
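Until plass handles this itself, a user-side check after the run could catch the empty result early. The following is only a sketch: `check_fasta` is a hypothetical helper, not part of plass, and it inspects only the final FASTA output, not the intermediate databases in tmp.

```shell
# Hypothetical helper (not part of plass): print "empty" and return 1 if the
# file is missing or has size 0, otherwise print the number of FASTA records.
check_fasta() {
    if [ ! -s "$1" ]; then
        echo "empty"
        return 1
    fi
    # Each FASTA record starts with a '>' header line.
    grep -c '^>' "$1"
}

# Example usage after an assembly run:
#   check_fasta SRS476121_69.cdbg_ids.reads.plass.faa || echo "try a shorter --min-length"
```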
By the way, if you want a set of stickers (see https://twitter.com/thesteinegger/status/1201076220957315074), send me your address to milot at mirdita de.
thank you so much for the quick response! I'll give this a try and report back.
Just saw your update -- thank you for testing this out!
@luizirber received two sticker sets and gave one to me since he knows I'm a plass enthusiast. Thank you!