rsiani / OGFilter

A script that parses the OrthoFinder output and keeps only the orthogroups that meet specific criteria

OGFilter

OGFilter is a script that parses the OrthoFinder output and returns only the orthogroups that meet specific criteria.
These criteria include:

  • Minimum proportion of the original species present in the orthogroup
  • Maximum number of gene copies per species in the orthogroup

This filtering is particularly useful in phylogenomic analyses, where a researcher wants genes with high species representation but few multi-copy occurrences.

Arguments

Argument             Description
-g filename          file with gene counts (from OrthoFinder output)
-s dirname           directory that contains the orthogroup FASTA files (from OrthoFinder output)
-o dirname           directory to write the output orthogroups
-min_species float   minimum proportion of the original species in the desired orthogroups
-max_copies int      maximum number of gene copies per species in the desired orthogroups
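
For readers who want to see how such a command line could be declared, here is a minimal sketch using Python's standard argparse module. It is not taken from OGFilter.py: the flag names and types follow the table above, while the destination names (gene_counts, seq_dir, out_dir) and the choice to make every argument required are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Declare the five OGFilter-style arguments (illustrative sketch only)."""
    parser = argparse.ArgumentParser(
        description="Filter OrthoFinder orthogroups by species coverage and copy number."
    )
    # dest names below are hypothetical; only the flags come from the README.
    parser.add_argument("-g", dest="gene_counts", metavar="filename", required=True,
                        help="file with gene counts (from OrthoFinder output)")
    parser.add_argument("-s", dest="seq_dir", metavar="dirname", required=True,
                        help="directory with the orthogroup FASTA files")
    parser.add_argument("-o", dest="out_dir", metavar="dirname", required=True,
                        help="directory to write the output orthogroups")
    parser.add_argument("-min_species", type=float, required=True,
                        help="minimum proportion of the original species")
    parser.add_argument("-max_copies", type=int, required=True,
                        help="maximum number of gene copies per species")
    return parser


# Parse the same arguments as the README's example command.
args = build_parser().parse_args([
    "-g", "Orthogroups.GeneCount.tsv",
    "-s", "Orthogroup_Sequences",
    "-o", "output_dir",
    "-min_species", "0.8",
    "-max_copies", "3",
])
print(args.gene_counts, args.seq_dir, args.out_dir, args.min_species, args.max_copies)
```

Declaring type=float and type=int lets argparse reject malformed values (for example -max_copies three) before any filtering starts.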

Example usage

python OGFilter.py -g Orthogroups.GeneCount.tsv -s Orthogroup_Sequences -o output_dir -min_species 0.8 -max_copies 3 

This creates an output directory (-o output_dir) containing every orthogroup in which at least 80% of the original species are present (-min_species 0.8) and no species has more than 3 gene copies (-max_copies 3).
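
The two criteria can be illustrated with a short sketch that applies them to a gene-count table. This is not the code from OGFilter.py. It assumes the gene-count file is tab-separated with an "Orthogroup" column, one column per species, and a final "Total" column, and that "at least 80%" means a greater-than-or-equal comparison. The function name select_orthogroups, the species names, and the counts are made up for the example.

```python
import csv
import io


def select_orthogroups(gene_count_tsv, min_species, max_copies):
    """Return IDs of orthogroups that pass both filters (illustrative sketch)."""
    reader = csv.DictReader(gene_count_tsv, delimiter="\t")
    # Assumption: every column other than "Orthogroup" and "Total" is a species.
    species = [c for c in reader.fieldnames if c not in ("Orthogroup", "Total")]
    kept = []
    for row in reader:
        counts = [int(row[s]) for s in species]
        # Proportion of species with at least one gene in this orthogroup.
        proportion_present = sum(1 for c in counts if c > 0) / len(species)
        if proportion_present >= min_species and max(counts) <= max_copies:
            kept.append(row["Orthogroup"])
    return kept


# Made-up table with five species, standing in for Orthogroups.GeneCount.tsv.
example = io.StringIO(
    "Orthogroup\tsp_A\tsp_B\tsp_C\tsp_D\tsp_E\tTotal\n"
    "OG0000000\t1\t1\t2\t1\t0\t5\n"  # 4 of 5 species, at most 2 copies
    "OG0000001\t1\t5\t1\t1\t1\t9\n"  # all species, but 5 copies in sp_B
    "OG0000002\t1\t1\t0\t0\t1\t3\n"  # only 3 of 5 species
)
kept = select_orthogroups(example, min_species=0.8, max_copies=3)
print(kept)
```

With these thresholds only the first orthogroup passes: the second exceeds the copy limit in one species, and the third covers only 60% of the species. In a full run, the FASTA files of the kept orthogroups would then be copied from the -s directory to the -o directory.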


======================================================================================


Who
Mattia Giacomelli (mattia.giacomelli@bristol.ac.uk);
Paschalis Natsidis (p.natsidis@ucl.ac.uk);

Where
Pisani Lab, Uni Bristol;
Telford Lab, UCL;
ITN IGNITE;

When
November 2019;

