flass / tree2

Python modules for manipulating rooted phylogenetic trees

tree2

Manipulating phylogenetic trees: topologies, annotations, gene/species tree reconciliations

Description

tree2 is a source package of object-oriented Python modules for handling phylogenetic trees. It contains several classes derived from the base class Node, which represents a node of a tree. Internal nodes hold their children nodes as attributes, so by recursion the root node contains all its descendants and is equivalent to the complete tree. It follows that tree objects are intrinsically rooted, although unrooted input trees are supported and represented as multifurcating at the root. Other attributes are attached to each Node instance, such as branch length, label and comment, which is sufficient to represent everything that can be found in the Newick format, plus bracketed comments like this:

((SpeciesA[Phenotype1]:0.45,SpeciesB[Phenotype1]:0.42)0.84:0.75,(SpeciesC[Phenotype1]:0.58,SpeciesD[Phenotype2]:0.85)0.95:0.115, SpeciesE[Phenotype3]:0.50);
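To illustrate the recursive data model described above, here is a minimal standalone Python 3 sketch. It is not tree2's Node class: `MiniNode` and its attribute names are hypothetical. Each node holds its children, so the root stands for the whole tree, and serializing the root recursively rebuilds a Newick string with bracketed comments.

```python
class MiniNode:
    """Hypothetical minimal tree node (not tree2's Node class)."""

    def __init__(self, label="", lg=None, comment=None, children=None):
        self.label = label              # leaf name, support value or clade label
        self.lg = lg                    # length of the branch above this node
        self.comment = comment          # bracketed comment, e.g. a phenotype
        self.children = children or []  # an empty list means the node is a leaf

    def leaf_labels(self):
        """Collect leaf labels recursively: a clade's leaves are its children's leaves."""
        if not self.children:
            return [self.label]
        labels = []
        for child in self.children:
            labels.extend(child.leaf_labels())
        return labels

    def newick(self, top=True):
        """Serialize this node and everything below it as Newick."""
        text = ""
        if self.children:
            text = "(" + ",".join(c.newick(top=False) for c in self.children) + ")"
        text += self.label
        if self.comment is not None:
            text += "[" + self.comment + "]"
        if self.lg is not None:
            text += ":" + str(self.lg)
        return text + (";" if top else "")


# Part of the example tree above, built by hand
clade = MiniNode("0.84", 0.75, children=[
    MiniNode("SpeciesA", 0.45, "Phenotype1"),
    MiniNode("SpeciesB", 0.42, "Phenotype1"),
])
tree = MiniNode(children=[clade, MiniNode("SpeciesE", 0.5, "Phenotype3")])
print(tree.newick())
# ((SpeciesA[Phenotype1]:0.45,SpeciesB[Phenotype1]:0.42)0.84:0.75,SpeciesE[Phenotype3]:0.5);
```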

It accommodates labelling of internal nodes (which can come in handy when exploring species trees), with labels replacing support values:

((SpeciesA[Phenotype1]:0.45,SpeciesB[Phenotype1]:0.42)Clade1:0.75,(SpeciesC[Phenotype1]:0.58,SpeciesD[Phenotype2]:0.85)Clade2:0.115, SpeciesE[Phenotype3]:0.50);
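Since the same position after a closing parenthesis can hold either a support value or a label, a parser has to decide which one it is reading. A simple rule, assumed here and not necessarily the one tree2 applies, is to treat any token that parses as a number as a support and anything else as a label. The helper name `parse_internal_token` is hypothetical.

```python
def parse_internal_token(token):
    """Return (support, label) for the text following a ')' in a Newick string.

    Hypothetical helper: a numeric token is read as a branch support,
    anything else as an internal-node label.
    """
    token = token.strip()
    if not token:
        return None, None        # nothing written after the parenthesis
    try:
        return float(token), None
    except ValueError:
        return None, token


print(parse_internal_token("0.84"))    # (0.84, None)
print(parse_internal_token("Clade1"))  # (None, 'Clade1')
```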

Node class methods are inherited by descendant classes and include most of the methods for topology manipulation, such as pruning and re-rooting. Node also provides a method to instantly visualize a tree object from within the Python interpreter using SeaView's basic graphic engine, an option that comes in handy during code development to check the properties of manipulated trees.

Further annotations and the handling of more complex tree formats (NEXUS and phyloXML) are dealt with by the AnnotatedNode class. It notably includes extended attributes such as branch color, node ID and taxonomic ID. An AnnotatedNode tree instance can be exported in phyloXML format without loss of annotation. This allows richer graphical representation of a tree through an external call to the FigTree or Archaeopteryx programs.
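As a rough illustration of what phyloXML export involves, the sketch below writes a small annotated tree with Python's standard `xml.etree.ElementTree`. The nested-dict node layout and the function names are hypothetical, and this is not tree2's exporter; only the `name`, `branch_length` and `color` elements of the phyloXML schema are covered, emitted in schema order.

```python
import xml.etree.ElementTree as ET

PHYLOXML_NS = "http://www.phyloxml.org"


def clade_element(node):
    """Build a phyloXML <clade> element from a hypothetical dict-based node."""
    clade = ET.Element("clade")
    if node.get("name"):
        ET.SubElement(clade, "name").text = node["name"]
    if node.get("lg") is not None:
        ET.SubElement(clade, "branch_length").text = str(node["lg"])
    if node.get("color"):
        # branch color as 0-255 RGB components
        color = ET.SubElement(clade, "color")
        for tag, value in zip(("red", "green", "blue"), node["color"]):
            ET.SubElement(color, tag).text = str(value)
    for child in node.get("children", []):
        clade.append(clade_element(child))  # child clades are nested recursively
    return clade


def to_phyloxml(root):
    """Return a phyloXML document (as a string) for a rooted tree."""
    doc = ET.Element("phyloxml", xmlns=PHYLOXML_NS)
    phylogeny = ET.SubElement(doc, "phylogeny", rooted="true")
    phylogeny.append(clade_element(root))
    return ET.tostring(doc, encoding="unicode")


tree = {"children": [
    {"name": "Human", "lg": 0.11927, "color": (255, 0, 0)},
    {"name": "Chimp", "lg": 0.19268},
]}
xml_text = to_phyloxml(tree)
print(xml_text)
```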

Based on AnnotatedNode are the GeneTree and ReferenceTree classes, which are mostly enriched with methods for gene/species tree reconciliation procedures. These methods often involve both types of objects, representing the gene and species trees respectively, and annotate them given the output of the reconciliations, notably with evolutionary events such as gene duplication, horizontal gene transfer or gene loss.
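The core of many such procedures is LCA mapping: each gene-tree node is mapped to the last common ancestor, in the species tree, of the species carrying its genes, and a node mapped to the same species clade as one of its children is inferred to be a duplication. Here is a minimal standalone sketch on nested tuples (the function names and the `species_of` dictionary are hypothetical; this is not the GeneTree/ReferenceTree API). It only labels duplications and speciations: inferring transfers and losses, which tree2 also annotates, needs more machinery.

```python
def leaves(tree):
    """Leaf names of a nested-tuple tree, where a leaf is a plain string."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def lca(species_tree, taxa):
    """Smallest clade of species_tree whose leaves include every taxon in taxa."""
    if not isinstance(species_tree, str):
        for child in species_tree:
            if taxa <= leaves(child):
                return lca(child, taxa)
    return species_tree


def reconcile(gene_tree, species_tree, species_of):
    """LCA reconciliation of a gene tree with a species tree.

    Returns (gene clade, species clade, event) for every internal gene node,
    in post-order. A node is a duplication when it maps to the same species
    clade as one of its children, otherwise a speciation.
    """
    events = []

    def visit(node):
        taxa = frozenset(species_of[gene] for gene in leaves(node))
        mapped = lca(species_tree, taxa)
        if not isinstance(node, str):
            child_maps = [visit(child) for child in node]
            event = "duplication" if mapped in child_maps else "speciation"
            events.append((node, mapped, event))
        return mapped

    visit(gene_tree)
    return events


species_tree = (("Human", "Chimp"), "Mouse")
gene_tree = ((("h1", "c1"), ("h2", "c2")), "m1")
species_of = {"h1": "Human", "c1": "Chimp", "h2": "Human", "c2": "Chimp", "m1": "Mouse"}
for gene_clade, species_clade, event in reconcile(gene_tree, species_tree, species_of):
    print(gene_clade, "->", species_clade, event)
```

In this toy example the ancestral Human/Chimp gene was duplicated before the two species split, so the node joining the two Human/Chimp pairs is flagged as a duplication.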

Requirements

This package was developed under Linux, but as it is pure Python (except for certain shell calls to external programs) it should run fine on other operating systems. It runs on Python 2, version 2.5 or later. When installing, do not forget to update your $PYTHONPATH environment variable. Optional programs can be installed for use through the API provided by tree2:

  • PhyML, phylogenetic reconstruction - used for re-estimating branch lengths and supports after topological manipulations (download here or available as a standard Debian package)
  • SeaView, graphic platform for sequence alignment, tree reconstruction and visualization - used here only for tree visualization (download here or available as a standard Debian package)
  • FigTree, graphic representation of annotated phylogenetic trees, using NEXUS format (download here or available as a standard Debian package)
  • Archaeopteryx, graphic representation of richly annotated phylogenetic trees, using phyloXML format (download here)
  • Count, parsimony and ML estimation of gene gain/loss scenarios along a species tree given a phylogenetic profile (download here)
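Before relying on these optional programs, it can help to check which of them are reachable from your shell. The sketch below uses the standard `shutil.which` (Python 3). The executable names are assumptions and not something tree2 defines: they depend on how each program was installed.

```python
import shutil

# Assumed executable names: the actual command depends on how each program was
# installed (Archaeopteryx and Count, for instance, are often run as Java .jar files).
OPTIONAL_PROGRAMS = {
    "PhyML": "phyml",
    "SeaView": "seaview",
    "FigTree": "figtree",
    "Archaeopteryx": "archaeopteryx",
    "Count": "count",
}


def available_programs(programs=OPTIONAL_PROGRAMS):
    """Map each program name to True if its executable is found on the PATH."""
    return {name: shutil.which(exe) is not None for name, exe in programs.items()}


for name, found in available_programs().items():
    print(name, "found" if found else "not found")
```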

Credit

The tree2 package is derived from the tree module of the alfacinha package by Leonor Palmeira and Laurent Guéguen (doc here: http://pbil.univ-lyon1.fr/software/alfacinha/)

Usage

To get started, you have to create a tree from a Newick string:

import tree2
t=tree2.Node(newick="(Bovine:0.69395,(Gibbon:0.36079,(Orang:0.33636,(Gorilla:0.17147,(Chimp:0.19268, Human:0.11927)0.89:0.08386)0.94:0.06124)0.94:0.15057)0.90:0.54939,Mouse:1.21460)0.86:0.10;") # typed in

or

import tree2
t=tree2.Node(file="/path/to/tree.newick") # from a file
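For readers curious about what building a tree from a Newick string involves, here is a minimal recursive-descent parser sketch in Python 3. It is standalone and is not tree2's parser; the nested-dict layout and the names are hypothetical. It handles nested clades, leaf and internal-node labels, bracketed comments and branch lengths, but not quoted labels or comments placed after branch lengths.

```python
import re

# One token: a bracketed comment, a punctuation mark, or a run of other characters.
TOKEN = re.compile(r"\s*(\[[^\]]*\]|[(),:;]|[^\s()\[\],:;]+)")
PUNCTUATION = {"(", ")", ",", ":", ";"}


def parse_newick(text):
    """Parse a Newick string into nested dicts (hypothetical layout, not tree2's)."""
    tokens = TOKEN.findall(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ";"

    def parse_node():
        nonlocal pos
        node = {"label": "", "comment": None, "lg": None, "children": []}
        if peek() == "(":                       # internal node: parse its children
            pos += 1
            node["children"].append(parse_node())
            while peek() == ",":
                pos += 1
                node["children"].append(parse_node())
            if peek() != ")":
                raise ValueError("expected ')' but found %r" % peek())
            pos += 1
        if peek() not in PUNCTUATION and not peek().startswith("["):
            node["label"] = peek()              # leaf name, support or clade label
            pos += 1
        if peek().startswith("["):
            node["comment"] = peek()[1:-1]      # bracketed comment, brackets dropped
            pos += 1
        if peek() == ":":
            pos += 1
            node["lg"] = float(peek())          # branch length
            pos += 1
        return node

    tree = parse_node()
    if peek() != ";":
        raise ValueError("unexpected token %r" % peek())
    return tree


t = parse_newick("((SpeciesA[Phenotype1]:0.45,SpeciesB[Phenotype1]:0.42)Clade1:0.75,"
                 "SpeciesE[Phenotype3]:0.50);")
print([child["label"] for child in t["children"]])   # ['Clade1', 'SpeciesE']
```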

Then you can print out your tree to verify it:

print t			# standard Newick string representation
t.arborescence_ASCII()	# hierarchical arborescence representation in text mode
t.seaview()		# graphic representation
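A text-mode arborescence such as the one printed by arborescence_ASCII() can be produced by a simple recursive walk. The sketch below is a hypothetical standalone function on nested dicts; tree2's own layout and symbols may differ.

```python
def ascii_lines(node, prefix="", is_last=True, is_root=True):
    """Yield lines drawing a nested-dict tree as an indented arborescence."""
    label = node.get("label") or "+"          # unlabelled internal nodes shown as '+'
    if is_root:
        yield label
        child_prefix = ""
    else:
        yield prefix + ("`-- " if is_last else "|-- ") + label
        child_prefix = prefix + ("    " if is_last else "|   ")
    children = node.get("children", [])
    for i, child in enumerate(children):
        yield from ascii_lines(child, child_prefix, i == len(children) - 1, False)


tree = {"children": [
    {"label": "Bovine"},
    {"label": "Apes", "children": [{"label": "Chimp"}, {"label": "Human"}]},
    {"label": "Mouse"},
]}
print("\n".join(ascii_lines(tree)))
```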

You can now access its attributes, either globally or for specific nodes:

t.get_leaf_labels()	# list of leaf labels
h = t['Human']		# access a node through its indexed label (must be unique)
h.label()		# node label
h.lg()			# length of the branch leading to the node (above the node)
f = h.go_father()	# the node's parent (Node object)
f.bs()			# support of the parent branch
c = f.get_children()	# the parent's direct children (list of Node objects)
f.children_labels()	# their labels only
# there are derived methods that let you navigate the tree too
b = h.go_brother()	# brother node of a node; only if the parent is bifurcating!
r = h.go_root()		# root node of the tree
r == t			# True
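Upward navigation (father, root, brother) is simplest when each node keeps a reference to its parent as well as to its children. The following hypothetical Python 3 sketch, not tree2's implementation, keeps such a back-reference and reproduces the behaviour described above, including the restriction that a brother is only defined under a bifurcating parent; the method names merely mirror those above for readability.

```python
class LinkedNode:
    """Hypothetical node with a back-reference to its parent (not tree2's Node)."""

    def __init__(self, label="", children=()):
        self.label = label
        self.father = None
        self.children = list(children)
        for child in self.children:
            child.father = self           # link every child back to this node

    def go_root(self):
        """Climb parent links until reaching the node that has no parent."""
        node = self
        while node.father is not None:
            node = node.father
        return node

    def go_brother(self):
        """Return the sibling; only defined when the parent is bifurcating."""
        if self.father is None or len(self.father.children) != 2:
            raise ValueError("node %r has no unique brother" % self.label)
        first, second = self.father.children
        return second if first is self else first


human, chimp = LinkedNode("Human"), LinkedNode("Chimp")
root = LinkedNode(children=[LinkedNode("Gorilla"), LinkedNode(children=[chimp, human])])
print(human.go_brother().label, human.go_root() is root)   # Chimp True
```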

You may want to find the clade that includes all great ape species (the node representing their last common ancestor) and label it accordingly:

ga = t.map_to_node(['Orang', 'Gorilla', 'Chimp', 'Human'])
ga.edit_label('GreatApes')
print t.newick(ignoreBS=True)	# ignore branch supports to display internal node labels
t.seaview(ignoreBS=True)

and then you can access the node instance through its label:

print t['GreatApes'].newick(ignoreBS=True)

You can iterate on the nodes of the tree:

# using iterator
for n in t: print n.label()
# or building a list of nodes, using different traversal orders
ln0 = t.get_all_children() ; print [n.label() for n in ln0]		# pre-order traversal (classic root-to-leaves exploration)
ln1 = t.get_sorted_children(order=1) ; print [n.label() for n in ln1]	# ordered by decreasing depth (i.e. increasing node distance from root)
ln2 = t.get_sorted_children(order=2) ; print [n.label() for n in ln2]		# post-order traversal (exploration of each group of leaves, then the nodes above)
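The two classic traversal orders mentioned above can be written as short recursive generators. This standalone Python 3 sketch uses nested tuples (a leaf is a string) and hypothetical function names; it is not how tree2 implements get_all_children() or get_sorted_children(). Post-order is the natural choice for bottom-up computations, as the leaf-count example shows, because every child is visited before its parent.

```python
def preorder(tree):
    """Yield each clade before its descendants (root-to-leaves exploration)."""
    yield tree
    if not isinstance(tree, str):
        for child in tree:
            yield from preorder(child)


def postorder(tree):
    """Yield each clade after all its descendants (leaves first)."""
    if not isinstance(tree, str):
        for child in tree:
            yield from postorder(child)
    yield tree


def leaf_counts(tree):
    """Count leaves under every clade bottom-up, relying on post-order."""
    counts = {}
    for clade in postorder(tree):
        counts[clade] = 1 if isinstance(clade, str) else sum(counts[c] for c in clade)
    return counts


tree = (("Chimp", "Human"), "Gorilla")
print(list(preorder(tree)))
print(list(postorder(tree)))
print(leaf_counts(tree)[tree])   # 3
```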

You may want to remove some species from your dataset while preserving the properties of the rest of the tree (notably consistent branch lengths and supports):

o = t.pop('Orang') # prune the Orang branch identified by its label
print o
print t
t.seaview(ignoreBS=True)
ho = t.map_to_node(['Gorilla', 'Chimp', 'Human']) # identify the node of the Homininae clade
hn = t.pop(ho) # prune the Homininae branch identified by its object reference
print t
t.seaview(ignoreBS=True)
print hn
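When a leaf is pruned, its parent may be left with a single child. A common way to keep branch lengths consistent is to splice out that unary node and add its branch length to the remaining child's, so that distances between the remaining leaves do not change. The Python 3 sketch below shows this on hypothetical nested dicts; it is not tree2's pop(), it does not reproduce tree2's handling of branch supports (the spliced node's support is simply dropped), and the root itself is never spliced.

```python
def make_node(label="", lg=None, *children):
    """Hypothetical dict-based node: label, branch length, list of children."""
    return {"label": label, "lg": lg, "children": list(children)}


def prune(node, label):
    """Remove the leaf called `label` from below `node`, in place.

    If the leaf's parent is left with a single child, that parent is spliced
    out and its branch length added to the remaining child's. Returns the
    removed leaf, or None if no such leaf exists below `node`.
    """
    for i, child in enumerate(node["children"]):
        if not child["children"] and child["label"] == label:
            return node["children"].pop(i)
        removed = prune(child, label) if child["children"] else None
        if removed is not None:
            if len(child["children"]) == 1:
                only = child["children"][0]
                only["lg"] = (only["lg"] or 0.0) + (child["lg"] or 0.0)
                node["children"][i] = only       # splice out the unary node
            return removed
    return None


tree = make_node("", None,
                 make_node("", 0.5, make_node("A", 0.25), make_node("B", 0.25)),
                 make_node("C", 1.0))
removed = prune(tree, "A")
print(removed["label"], [(c["label"], c["lg"]) for c in tree["children"]])
# A [('B', 0.75), ('C', 1.0)]
```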

Many other features are accessible and editable; for these, please refer to the documentation of each function and class method through the help() function.

Need help?

If you have any problems or comments about tree2, please create an issue on this repository's GitHub page, or contact me at: mailto:florent.lassalle@sanger.ac.uk.

License: GNU General Public License v3.0