markschl / seq_consensus

Alignment consensus in Python

Consensus sequences from multiple alignments

seq_consensus is a simple Python 3 library for calculating consensus sequences from aligned sequences. Ambiguous letters (IUPAC codes) in the input are handled as well, and NumPy is used under the hood. Currently, only DNA and RNA sequences are supported.

The package also includes a small utility, cons_tool, for calculating consensus sequences on the command line.

How is the consensus calculated?

The method is identical to the approach used by Geneious and very similar to the ConsensusSequence function of the DECIPHER R package (whose options differ slightly). The API documentation describes it in more detail.
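To make the threshold idea concrete, here is a small pure-Python sketch of a Geneious-style threshold consensus as we understand it. It is not the package's code (which works on NumPy arrays; see the API documentation for the exact rules), and the names `column_consensus` and `simple_consensus` are hypothetical. For each alignment column, ambiguous letters are split evenly among the bases they stand for, symbols are added from most to least frequent until their combined share reaches `threshold`, and the chosen set of bases is written as a single IUPAC code. Two simplifications are assumptions of this sketch: ties are broken by order of first appearance, and a gap chosen together with bases is dropped.

```python
from collections import Counter

# IUPAC nucleotide codes and the bases each one stands for (U treated as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: sorted string of bases -> single IUPAC letter.
CODE_FOR = {bases: code for code, bases in IUPAC.items() if code != "U"}


def column_consensus(column, threshold):
    """Consensus letter for one alignment column (hypothetical, simplified)."""
    counts = Counter()
    for letter in column:
        if letter == "-":
            counts["-"] += 1.0
        else:
            bases = IUPAC[letter.upper()]
            for base in bases:
                # An ambiguous letter contributes equal fractions to its bases.
                counts[base] += 1.0 / len(bases)
    # Add symbols from most to least frequent until the threshold is reached.
    chosen, total = [], 0.0
    for symbol, count in counts.most_common():
        chosen.append(symbol)
        total += count
        if total >= threshold * len(column):
            break
    # Assumption of this sketch: a gap chosen together with bases is ignored.
    bases = "".join(sorted(s for s in chosen if s != "-"))
    return CODE_FOR[bases] if bases else "-"


def simple_consensus(seqs, threshold):
    """Column-wise consensus of equally long, aligned sequences."""
    return "".join(column_consensus(col, threshold) for col in zip(*seqs))


print(simple_consensus(["ATTGC", "AT-CC", "RT-C-"], threshold=0.6))  # prints AT-CC
```

On the sequences from the usage example this sketch happens to give the same result, 'AT-CC'. On other inputs (ties, columns mixing gaps and bases, letters outside the IUPAC table, which raise a KeyError here) the real `consensus` function may behave differently.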

Documentation

The complete user guide is found here, and the API is documented here. Below are some small examples for demonstration:

Usage example

from seq_consensus import consensus

seqs = [
    'ATTGC',
    'AT-CC',
    'RT-C-'
]

consensus(seqs, threshold=0.6)

This returns:

'AT-CC'

Command-line tool example

The script cons_tool makes the same functionality available on the command line. An especially useful feature is the ability to group sequences by an arbitrary regular expression pattern matched in the sequence headers, yielding one consensus sequence per group:

cons_tool -k 'p:\w+' input.fasta

Example output (given that taxonomic annotations are present in the headers):

>p:Evosea consensus (n=124)
TACKATTTA--RTATTGAC-?TWA?-GKTACTAAAGCATGGGKA-T?AAA?AGGATTAGAGACCCTYGTA
>p:Chordata consensus (n=7065)
TWAYTTTA?--WAW-YWAY-YTGAA-YCCACGAAAGCTAAGAMA-CAAACTGGGATTAGATACCCCACTA
>p:Mollusca consensus (n=843)
TWAWTWTAW--WAW?WWAY-TTGAA-KYYAYGAAAKCTWRGRWA-YAAACTAGGATTAGATACCCTAYTA
>p:Chordata consensus (n=8509)
TWAYTTTA?--WAW-YMAC-TTGAA-CCCACGAAAGCTARGAMA-CAAACTGGGATTAGATACCCCACTA
>p:Platyhelminthes_ consensus (n=130)
TWAWTWTAA--WDW?TKWY-YTGAA-KYYACGAAAGYTAKGWTA-YAAACTGGGATTAGATACCCCATTA
>p:Ascomycota consensus (n=280)
TTAWTWTAA--WAA?TDAC-TTGAR-K??ACGAAAGCTWRGRWA-CAAACTAGGATTAGATACCCYABTA
>p:Streptophyta consensus (n=269)
TWAWTWTAW--WAW?TRAY-TTGAR-KY?ACGAAAGCTTRGRKA-CAAACTAGGATTAGATACCCTAKTA
(...)
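The grouping behind `-k` can be pictured with the following minimal, standard-library-only sketch. It is not cons_tool's implementation: `read_fasta` and `group_by_header` are hypothetical helpers, the header annotation format (`k:...;p:...`) in the sample data is made up, and records whose header does not match the pattern are simply skipped, which may differ from what cons_tool does. Each resulting group is the set of aligned sequences a consensus would then be computed from, labelled in the style of the output above.

```python
import re
from collections import defaultdict


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_header(records, pattern):
    """Group sequences by the first regex match found in each header."""
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for header, seq in records:
        match = regex.search(header)
        if match:  # assumption: records without a match are skipped
            groups[match.group(0)].append(seq)
    return dict(groups)


# Hypothetical input with made-up taxonomic annotations in the headers.
fasta = """>seq1 k:Animalia;p:Chordata
ACGT-
>seq2 k:Animalia;p:Chordata
ACGA-
>seq3 k:Animalia;p:Mollusca
TCGT-
"""
groups = group_by_header(read_fasta(fasta), r"p:\w+")
for key, seqs in groups.items():
    # Each list could then be passed to seq_consensus.consensus().
    print(f">{key} consensus (n={len(seqs)})")
```

Grouping on the matched text itself (`match.group(0)`) is what lets the same pattern produce labels such as `p:Chordata` directly.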

License: MIT

