veg / hyphy

HyPhy: Hypothesis testing using Phylogenies

Home Page: http://www.hyphy.org

Labelling branches on thousands of phylogenies

mbarkdull opened this issue

Hello,

I am planning to run aBSREL and RELAX on a set of about 14,000 orthogroups and corresponding gene trees (identified using OrthoFinder). My biological question is essentially “which genes are under positive/relaxed selection in all of the species that share X trait compared to those lacking the trait?”.

Of course, using phylotree to manually label 14,000 gene trees isn't going to be feasible. I've written an R script (https://github.com/mbarkdull/FormicidaeMolecularEvolution/blob/main/scripts/LabellingPhylogeniesHYPHY.R) that can partially accomplish this task, by checking whether the species abbreviation in each tip label is present in a vector of species of interest and, if it is, adding {Foreground} to the end of the tip label.
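For readers who want to see the tip-labelling idea in code, here is a minimal Python sketch (not the linked R script). It assumes each tip label has the form `<species>_<gene>`, so the species abbreviation is everything before the first underscore; the function name `label_tips` and the example abbreviations are hypothetical.

```python
import re


def label_tips(newick, foreground_species, tag="Foreground"):
    """Append {tag} to every tip whose species prefix is in foreground_species."""
    # A tip label directly follows "(" or "," and ends at ":", ",", ")" or ";".
    # Internal node names follow ")" and are therefore never matched.
    tip_pattern = re.compile(r"(?<=[(,])\s*([^():,;\s]+)")

    def add_tag(match):
        label = match.group(1)
        # Assumed naming scheme: <species abbreviation>_<gene identifier>.
        species = label.split("_", 1)[0]
        if species in foreground_species:
            return f"{label}{{{tag}}}"
        return label

    return tip_pattern.sub(add_tag, newick)


tree = "(acol_g1:0.1,(tcor_g1:0.2,aech_g2:0.3):0.05);"
print(label_tips(tree, {"tcor", "aech"}))
```

Because the match is made per tip, trees that contain only some of the species of interest, or several genes from the same species, are handled without special cases.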

However, I realized that this method will label only the terminal branches in my trees and not internal nodes that might be of interest. In other words, in the tree below, if I want species with worker reproduction when the queen is lost to be foreground species, my script will label Trachymyrmex cornetzi, Acromyrmex echinatior, and Atta cephalotes individually, but not the internal nodes uniting the three.

[Figure: toyTree]

I have two questions:

  1. Would it be reasonable to only label terminal branches, given the question I'm attempting to address?
  2. If not, do you have a suggestion for how I might automate the labelling process? I'm not sure how to proceed, especially because the gene trees can contain only a subset of all the species of interest, and might also contain multiple genes per species (for example: OG0001224_tree.txt).

Thank you so much for your assistance!
Megan Barkdull

Dear @mbarkdull,

Please take a look at https://github.com/veg/hyphy-analyses/tree/master/LabelTrees
This is a HyPhy script that can take a tree and annotate it based on:

  1. A regular expression to pick out leaves of interest
  2. A list of said species, one per line

There are multiple strategies for labeling internal nodes, which are described at the link above.

For your application I would suggest either --internal-nodes "Parsimony" or --internal-nodes "All descendants" (which is also the default).
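To make the idea concrete, the Python sketch below shows one plausible reading of an "All descendants" rule: an internal node receives the label only when every child beneath it already carries it. This is an illustration and not the LabelTrees implementation; the exact semantics of each strategy are defined in the LabelTrees README. The function name `label_internal_nodes` is hypothetical, and the tiny parser assumes a well-formed Newick string with no whitespace or quoted names.

```python
def label_internal_nodes(newick, tag="Foreground"):
    """Tag an internal node when every child below it already carries the tag."""
    marker = "{" + tag + "}"
    text = newick.strip().rstrip(";")
    pos = 0

    def parse():
        # Recursive-descent parse of one node: optional "(children)" then "name:length".
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(parse())
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse())
            pos += 1  # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        name, _, length = text[start:pos].partition(":")
        return {"name": name, "length": length, "children": children}

    def render(node):
        # Returns (newick text for this subtree, whether this node is foreground).
        kids = [render(child) for child in node["children"]]
        out = node["name"]
        is_fg = marker in out
        if kids:
            is_fg = all(child_fg for _, child_fg in kids)
            if is_fg and marker not in out:
                out += marker
            out = "(" + ",".join(sub for sub, _ in kids) + ")" + out
        if node["length"]:
            out += ":" + node["length"]
        return out, is_fg

    return render(parse())[0] + ";"


tips_only = "(acol_g1:0.1,(tcor_g1{Foreground}:0.2,aech_g2{Foreground}:0.3):0.05);"
print(label_internal_nodes(tips_only))
```

Under this rule, a clade that mixes foreground and background tips leaves its ancestral branch unlabelled, which is the conservative choice when the trait's history along that branch is ambiguous.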

Best,
Sergei

Wonderful, thank you! I'm glad I checked before reinventing the wheel!

Hi @mbarkdull

I am also trying to identify genes under selection. I read some articles online in which the authors used OrthoFinder to get the alignment file. I am new to this kind of analysis and would be grateful if you could describe how to proceed with it.
Thanks.