metageni / SUPER-FOCUS

A tool for agile functional analysis of shotgun metagenomic data

amino acid sequences

lidpeck opened this issue

Hello

I have successfully used superfocus on ORFs. However, when I try to run it on amino acid sequences I get the error below. Do you know why this is? I think the problem is the error message about an E (Error: Error reading input stream at line 2: Invalid character (E) in sequence), but I'm not sure why it would give me this when I've told it to read amino acids (-p 1).

Thanks in advance

superfocus -q Orthogroups -dir Orthgroups/output -o all_wilts -a diamond -db DB_100 -p 1
[2019-12-02 17:39:16,861 - INFO] SUPER-FOCUS: A tool for agile functional analysis of shotgun metagenomic data
[2019-12-02 17:39:16,865 - INFO] 1.1) Working on: all_wilts.fasta
[2019-12-02 17:39:16,865 - INFO] Aligning sequences in all_wilts.fasta to 100 using diamond
diamond v0.9.14.115 | by Benjamin Buchfink buchfink@gmail.com
Licensed under the GNU AGPL https://www.gnu.org/licenses/agpl.txt
Check http://github.com/bbuchfink/diamond for updates.

#CPU threads: 4
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
#Target sequences to report alignments for: 25
Temporary directory: /usr/local/anaconda3/lib/python3.7/site-packages/superfocus_app/db/tmp
Opening the database... [0.00012s]
Opening the input file... [0.000144s]
Opening the output file... [0.000175s]
Loading query sequences... [0.000113s]
Error: Error reading input stream at line 2: Invalid character (E) in sequence
diamond v0.9.14.115 | by Benjamin Buchfink buchfink@gmail.com
Licensed under the GNU AGPL https://www.gnu.org/licenses/agpl.txt
Check http://github.com/bbuchfink/diamond for updates.

#CPU threads: 4
Loading subject IDs... [0.000217s]
Error: Invalid DAA file. DIAMOND run has probably not completed successfully.
[2019-12-02 17:39:16,905 - INFO] Parsing Alignments
Traceback (most recent call last):
  File "/usr/local/anaconda3/bin/superfocus", line 10, in <module>
    sys.exit(main())
  File "/usr/local/anaconda3/lib/python3.7/site-packages/superfocus_app/superfocus.py", line 342, in main
    del_alignments)
ValueError: not enough values to unpack (expected 2, got 0)
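
For anyone hitting the same "Invalid character (E) in sequence" message: E is not an IUPAC nucleotide code, so this error is what one would expect if the aligner is treating a protein FASTA as nucleotide input. Below is a minimal sketch for checking which characters in a sequence fall outside the nucleotide alphabet. The helper names (`read_fasta_sequences`, `non_nucleotide_chars`) are made up for illustration and are not part of SUPER-FOCUS or DIAMOND.

```python
# IUPAC nucleotide codes, including ambiguity codes. Note that E is absent.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def read_fasta_sequences(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def non_nucleotide_chars(sequence):
    """Return the sorted characters that are not IUPAC nucleotide codes."""
    return sorted(set(sequence.upper()) - IUPAC_NUCLEOTIDES)
```

For example, `non_nucleotide_chars("MKELVE")` returns `['E', 'L']`, while `non_nucleotide_chars("ACGTNN")` returns `[]`. A non-empty result for most records suggests the file holds amino acid sequences.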

hey @lidpeck, were you able to run it with nucleotides?

@lidpeck, I just checked the tool's code and noticed that BLASTp mode for DIAMOND is not enabled. I will need to fix it and release a new version. Sorry.
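
To illustrate what "BLASTp for DIAMOND is not active" means in practice: DIAMOND has separate `blastx` (nucleotide query) and `blastp` (protein query) commands, so a wrapper has to pick one based on the input type. The sketch below is not SUPER-FOCUS's actual code; `build_diamond_command` and its arguments are hypothetical, and it only assembles the command line without running it.

```python
def build_diamond_command(query, database, output, proteins):
    """Assemble a DIAMOND command line for the given input type.

    `proteins` plays the role of the -p 1 flag mentioned in this thread:
    True means the query file holds amino acid sequences.
    This is an illustrative helper, not the tool's real implementation.
    """
    # blastp aligns protein queries; blastx translates nucleotide queries.
    mode = "blastp" if proteins else "blastx"
    return ["diamond", mode, "--query", query, "--db", database, "--out", output]
```

With `proteins=True` the second element of the returned list is `blastp`; if the wrapper always used `blastx`, a protein FASTA would be rejected in the way shown in the log above.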

I should have a fix in the next few hours.

@metageni Amazing, thanks. Yes, I ran it successfully with nucleotide sequences.

@lidpeck I have pushed a new version to master (https://github.com/metageni/SUPER-FOCUS). I have not released it yet. Could you please give it a try?

thanks

Great, thanks. I copied over the new do_alignment.py script and now it runs on my amino acid sequences with no errors. However, the output .xls files are empty, even though the fasta_alignments.m8 file has genes and functions in it. I have also re-run superfocus_downloadDB to see if that fixed it, but with no luck.

Interesting. What is your OS? Could you please try rapsearch or blast with a sub-set of your input?

I'm using macOS Catalina. I've just re-run with BLAST (db 98) and got the same result: the fasta_alignments file has content, but all the .xls files are empty.

@lidpeck Very interesting. The only thing this tells me is that there were no hits against the database.

Great.
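
One way to narrow down the "populated .m8 but empty .xls" situation described above is to count how many alignments would survive filtering. The sketch below assumes the .m8 file uses the standard 12-column BLAST tabular layout; the threshold values are supplied by the caller and are not taken from SUPER-FOCUS's settings, and `count_passing_hits` is a hypothetical helper.

```python
import csv

# Standard BLAST tabular (m8) column order; assumed, not confirmed for this file.
M8_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def count_passing_hits(handle, min_identity, min_length, max_evalue):
    """Return (total_hits, passing_hits) for a tab-separated m8 file handle."""
    total = passing = 0
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank or truncated lines rather than failing on them.
        if len(row) < len(M8_COLUMNS):
            continue
        hit = dict(zip(M8_COLUMNS, row))
        total += 1
        if (
            float(hit["pident"]) >= min_identity
            and int(hit["length"]) >= min_length
            and float(hit["evalue"]) <= max_evalue
        ):
            passing += 1
    return total, passing
```

Usage would look like `with open("fasta_alignments.m8") as fh: print(count_passing_hits(fh, 60, 15, 1e-5))`, where the three numbers are placeholder thresholds. If the total is non-zero but nothing passes, the filters are the likely cause; if hits do pass, the problem is more likely in how the hits are matched to the database annotations.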