amino acid sequences

lidpeck opened this issue 5 years ago · comments

Hello

I have successfully run superfocus on ORFs. However, when I try to run it on amino acid sequences I get the error below. Do you know why? I think the problem is the message about an E (Error: Error reading input stream at line 2: Invalid character (E) in sequence), but I'm not sure why it would appear when I've told it to read amino acids (-p 1).

Thanks in advance

superfocus -q Orthogroups -dir Orthgroups/output -o all_wilts -a diamond -db DB_100 -p 1
[2019-12-02 17:39:16,861 - INFO] SUPER-FOCUS: A tool for agile functional analysis of shotgun metagenomic data
[2019-12-02 17:39:16,865 - INFO] 1.1) Working on: all_wilts.fasta
[2019-12-02 17:39:16,865 - INFO] Aligning sequences in all_wilts.fasta to 100 using diamond
diamond v0.9.14.115 | by Benjamin Buchfink buchfink@gmail.com
Licensed under the GNU AGPL https://www.gnu.org/licenses/agpl.txt
Check http://github.com/bbuchfink/diamond for updates.

#CPU threads: 4
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
#Target sequences to report alignments for: 25
Temporary directory: /usr/local/anaconda3/lib/python3.7/site-packages/superfocus_app/db/tmp
Opening the database... [0.00012s]
Opening the input file... [0.000144s]
Opening the output file... [0.000175s]
Loading query sequences... [0.000113s]
Error: Error reading input stream at line 2: Invalid character (E) in sequence
diamond v0.9.14.115 | by Benjamin Buchfink buchfink@gmail.com
Licensed under the GNU AGPL https://www.gnu.org/licenses/agpl.txt
Check http://github.com/bbuchfink/diamond for updates.

#CPU threads: 4
Loading subject IDs... [0.000217s]
Error: Invalid DAA file. DIAMOND run has probably not completed successfully.
[2019-12-02 17:39:16,905 - INFO] Parsing Alignments
Traceback (most recent call last):
  File "/usr/local/anaconda3/bin/superfocus", line 10, in <module>
    sys.exit(main())
  File "/usr/local/anaconda3/lib/python3.7/site-packages/superfocus_app/superfocus.py", line 342, in main
    del_alignments)
ValueError: not enough values to unpack (expected 2, got 0)
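
For readers hitting the same message: E is not one of the IUPAC nucleotide codes, so an aligner that expects nucleotide input will typically reject a protein sequence at the first such residue. The following is a minimal sketch of that check, not DIAMOND's actual validation code; the function name and the example sequences are made up for illustration.

```python
# IUPAC nucleotide codes, including ambiguity codes.
# Letters such as E, F, I, L, P and Q are not among them.
NUCLEOTIDE_CODES = set("ACGTURYSWKMBDHVN")


def non_nucleotide_chars(sequence):
    """Return the sorted characters in `sequence` that are not nucleotide codes."""
    return sorted(set(sequence.upper()) - NUCLEOTIDE_CODES)


# Hypothetical protein fragment: E and L are not valid nucleotide codes.
print(non_nucleotide_chars("MKELLA"))   # ['E', 'L']

# Hypothetical nucleotide fragment: every character is a valid code.
print(non_nucleotide_chars("ACGTNN"))   # []
```

A non-empty result suggests the file holds amino acids and needs a protein-versus-protein search rather than a translated one.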

Geni Silva commented 5 years ago

Great.

Geni Silva · Answer 1 · Tue Dec 03 2019 02:08:06 GMT+0800 (China Standard Time)

hey @lidpeck, were you able to run it with nucleotides?

Geni Silva · Answer 2 · Tue Dec 03 2019 02:13:12 GMT+0800 (China Standard Time)

@lidpeck, I just checked the tool's code and noticed that BLASTp mode for DIAMOND is not active. I will need to fix it and release a new version. Sorry.

I should have a fix in the next few hours.
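
To illustrate the distinction described here: DIAMOND's `blastx` subcommand translates nucleotide queries, while `blastp` takes protein queries directly. The sketch below shows one way a wrapper could choose between them. It is an assumption-laden illustration, not the code in SUPER-FOCUS's do_alignment.py; the function name and file names are hypothetical.

```python
def diamond_command(query_fasta, database, output_file, query_is_protein):
    """Build a DIAMOND command line for protein or nucleotide queries.

    blastp: protein query against a protein database.
    blastx: nucleotide query, translated, against a protein database.
    """
    mode = "blastp" if query_is_protein else "blastx"
    return [
        "diamond", mode,
        "--query", query_fasta,
        "--db", database,
        "--out", output_file,
        "--outfmt", "6",  # BLAST tabular output
    ]


# Hypothetical file names, for illustration only.
print(" ".join(diamond_command("proteins.faa", "DB_100", "hits.m8", True)))
```

The resulting list could be passed to `subprocess.run` once the paths point at real files.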

lidpeck · Answer 3 · Tue Dec 03 2019 02:37:38 GMT+0800 (China Standard Time)

@metageni amazing, thanks. Yes, I ran it successfully with nucleotide sequences.

Geni Silva · Answer 4 · Tue Dec 03 2019 02:51:15 GMT+0800 (China Standard Time)

@lidpeck I have pushed a new version into master (https://github.com/metageni/SUPER-FOCUS). I have not released it yet. Could you please give it a try?

thanks

lidpeck · Answer 5 · Tue Dec 03 2019 04:09:18 GMT+0800 (China Standard Time)

Great, thanks. I copied over the new script from do_alignment.py and now it runs on my amino acid sequences with no errors. However, the output .xls files are empty, although the fasta_alignments.m8 file has genes and functions in it. I have also re-run superfocus_downloadDB to see if that would fix it, but with no luck.

Geni Silva · Answer 6 · Tue Dec 03 2019 09:27:03 GMT+0800 (China Standard Time)

Interesting. What is your OS? Could you please try rapsearch or blast with a sub-set of your input?

lidpeck · Answer 7 · Tue Dec 03 2019 19:07:02 GMT+0800 (China Standard Time)

I'm using macOS Catalina. I've just re-run with BLAST (db 98) and got the same result (fasta_alignments file with info in it, all xls files empty).

Geni Silva · Answer 8 · Wed Dec 04 2019 04:28:48 GMT+0800 (China Standard Time)

@lidpeck very interesting. The only thing it tells me is that there was not a hit against the database.
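
One way to investigate this kind of result is to count how many rows of the .m8 file would survive typical filters. The sketch below assumes the standard 12-column BLAST tabular layout (identity in column 3, alignment length in column 4, e-value in column 11). The thresholds are placeholder values, not confirmed SUPER-FOCUS defaults, and the example rows are invented.

```python
import csv


def count_passing_hits(m8_lines, min_identity=60.0, min_length=15, max_evalue=1e-5):
    """Return (total_rows, passing_rows) for BLAST tabular (m8) lines."""
    total = 0
    passing = 0
    for row in csv.reader(m8_lines, delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        total += 1
        identity = float(row[2])   # percent identity
        length = int(row[3])       # alignment length
        evalue = float(row[10])    # e-value
        if identity >= min_identity and length >= min_length and evalue <= max_evalue:
            passing += 1
    return total, passing


# Invented rows: the first passes the placeholder thresholds, the second does not.
example = [
    "q1\ts1\t95.0\t120\t6\t0\t1\t120\t1\t120\t1e-30\t200",
    "q2\ts2\t35.0\t12\t7\t0\t1\t12\t1\t12\t0.5\t20",
]
print(count_passing_hits(example))  # (2, 1)
```

With a real file, the same function can be called on an open file handle, for example `count_passing_hits(open("fasta_alignments.m8"))`. If alignments exist but none pass, the report tables would plausibly come out empty.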

lidpeck · Answer 9 · Wed Dec 04 2019 20:01:07 GMT+0800 (China Standard Time)

Hi Geni, you’re right! It’s working perfectly now. Thanks a million.