maxganser / consistency-script

Evaluating the consistency of molecular diagnostic characters (signature characters) detected by DeSignate.

Consistency script

Tool description

First, the tool uses DeSignate to detect signature characters for a selected query group in a reference alignment and in alternative alignments comprising identical sequences. Second, it identifies the consensus signature characters, i.e., those congruently detected in all alignments.
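
The consensus step can be illustrated with a minimal sketch. It is not the script's actual code: this README does not describe how positions are matched across alignments, so the sketch assumes one possible approach, translating alignment columns into ungapped positions of a shared anchor sequence and then intersecting the resulting sets. The helper names `column_to_ungapped` and `consensus_positions` are hypothetical.

```python
def column_to_ungapped(aligned_seq, column):
    """Map a 0-based alignment column to the 0-based ungapped index
    in aligned_seq. Returns None if the column holds a gap."""
    if aligned_seq[column] == "-":
        return None
    return column - aligned_seq[:column].count("-")


def consensus_positions(signature_columns, anchor_rows):
    """signature_columns: one set of alignment columns per alignment,
    reference alignment first (assumed input, e.g. taken from DeSignate results).
    anchor_rows: the same anchor sequence as aligned in each alignment."""
    mapped = []
    for columns, row in zip(signature_columns, anchor_rows):
        positions = {column_to_ungapped(row, c) for c in columns}
        positions.discard(None)  # columns that are gaps in the anchor cannot be compared
        mapped.append(positions)
    consensus = set.intersection(*mapped)
    non_consensus = mapped[0] - consensus  # found in the reference alignment only
    return consensus, non_consensus


# Toy data: the same sequence "ACGT" aligned in two different ways.
anchor_rows = ["AC-GT", "A-CGT"]
signature_columns = [{1, 3}, {2, 4}]
consensus, non_consensus = consensus_positions(signature_columns, anchor_rows)
print(sorted(consensus), sorted(non_consensus))
```

Note that the sketch reports positions in anchor-sequence coordinates, whereas the script's output files report alignment positions.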

For more details and an example application, please read our manuscript:

Ganser M.H., Santoferrara L.F., and Agatha S. (2022). Molecular signature characters complement taxonomic diagnoses: a bioinformatic approach exemplified by ciliated protists (Ciliophora, Oligotrichea). Molecular Phylogenetics and Evolution (in press).

Usage

Requirements

This script requires DeSignate (Hütter et al. 2020), a tool that detects molecular signature characters for taxon diagnoses. To use DeSignate, clone its repository to the root directory of this repository:

git clone https://github.com/DatabaseGroup/DeSignate

Input files

  1. Alignment files in fasta format
    • Example:
    >Sequence-1-label
    -TTGGCTGTCACAGTGTC-
    >Sequence-2-label
    --TGGTACTGACAGTGT--
    ...
    
  2. Two separate files (e.g., in txt or csv format), one listing the comma-separated sequence labels of the query group and one listing those of the reference group
    • Example:
    Sequence-1-label, Sequence-2-label, ...
    
    PLEASE NOTE: Sequence labels must be identical in all alignments and must exactly match those in the query and reference group files. Otherwise, the program terminates with an error message stating the missing/wrong sequence labels.
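
Before running the script, the label requirement above can be checked with a short sketch like the following. It is an illustration only, not part of the script: the helper names are hypothetical, and the sketch assumes that a label is the full FASTA header line after the leading `>`.

```python
from io import StringIO


def read_fasta_labels(handle):
    """Return the sequence labels (header lines without '>') in file order."""
    return [line[1:].strip() for line in handle if line.startswith(">")]


def read_group_labels(handle):
    """Return the labels of a comma-separated group file (line breaks tolerated)."""
    text = handle.read().replace("\n", ",")
    return [label.strip() for label in text.split(",") if label.strip()]


def missing_labels(alignment_labels, group_labels):
    """Return the group labels that do not occur in the alignment."""
    known = set(alignment_labels)
    return [label for label in group_labels if label not in known]


# Toy data standing in for real files; use open(path) for files on disk.
alignment = StringIO(
    ">Sequence-1-label\n-TTGGCTGTCACAGTGTC-\n"
    ">Sequence-2-label\n--TGGTACTGACAGTGT--\n"
)
query_group = StringIO("Sequence-1-label, Sequence-3-label\n")

labels = read_fasta_labels(alignment)
missing = missing_labels(labels, read_group_labels(query_group))
print(missing)  # ['Sequence-3-label']
```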

Output files

  • consensus-sigchars.csv : Alignment positions of consensus signature characters + DeSignate results (character states, signature type, entropy values)
  • non-consensus-positions.csv : Reference alignment positions of non-consensus signature characters
  • designate-results.csv : Complete DeSignate results of reference alignment for the selected query and reference groups

Commands

To execute the script use the following command:

python consistency.py --alignments path/alignment_01.fasta path/alignment_02.fasta path/alignment_03.fasta --query_group path/query_group.txt --reference_group path/reference_group.txt

List of arguments:
--alignments : Paths to the alignment files. The first file is the reference alignment; all subsequent files are alternative alignments.
--query_group : Path to the query group file.
--reference_group : Path to the reference group file.
--k_window : Window size for the two-position analysis; the default of 1 performs a one-position analysis.
--consider_gaps : Include gaps as a character state, default = True.
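
For readers who want to wrap or reuse this interface, the arguments listed above could be declared with `argparse` roughly as follows. This is a sketch that mirrors the documented options, not the parser used in `consistency.py`. In particular, the way `--consider_gaps` accepts its value (here the strings `True`/`False`) is an assumption, and `str_to_bool` and `build_parser` are hypothetical names.

```python
import argparse


def str_to_bool(value):
    """Parse 'True'/'False'-style strings (assumed value format for --consider_gaps)."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    parser = argparse.ArgumentParser(prog="consistency.py")
    # First path is the reference alignment, the rest are alternative alignments.
    parser.add_argument("--alignments", nargs="+", required=True)
    parser.add_argument("--query_group", required=True)
    parser.add_argument("--reference_group", required=True)
    parser.add_argument("--k_window", type=int, default=1)
    parser.add_argument("--consider_gaps", type=str_to_bool, default=True)
    return parser


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args([
    "--alignments", "path/alignment_01.fasta", "path/alignment_02.fasta",
    "--query_group", "path/query_group.txt",
    "--reference_group", "path/reference_group.txt",
])
reference_alignment, *alternative_alignments = args.alignments
print(reference_alignment, alternative_alignments, args.k_window, args.consider_gaps)
```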

License: MIT License

