Ulthran / ShotgunUnifrac

ShotgunUnifrac

A dual-use program for downloading and extracting genes from NCBI, and for building phylogenetic trees for many marker genes and merging the results into one tree.

Install

git clone git@github.com:Ulthran/ShotgunUnifrac.git

To install the CorGE library for downloading, extracting, and merging genes,

cd ShotgunUnifrac/
pip install CorGE/

Prereqs

  1. Anaconda/miniconda (https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html#regular-installation)
  2. Snakemake (https://snakemake.readthedocs.io/en/stable/getting_started/installation.html)
  3. (For testing only) PyTest
  4. (For containerization only) Singularity
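
As a quick sanity check before running anything, the prerequisites above can be looked up on your PATH. This is a minimal sketch: the executable names (`conda`, `snakemake`, `pytest`, `singularity`) are assumptions about how each tool is installed on your system, and the helper `missing_tools` is hypothetical, not part of CorGE.

```python
import shutil

# Assumed executable names; adjust if your installation uses different ones.
REQUIRED = ["conda", "snakemake"]
OPTIONAL = ["pytest", "singularity"]  # testing / containerization only


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    for label, tools in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = missing_tools(tools)
        status = "all found" if not missing else "missing: " + ", ".join(missing)
        print(f"{label}: {status}")
```

A missing optional tool only matters if you plan to run the tests or use containers.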

Running tests

For the CorGE package,

pytest CorGE/tests

For the tree building Snakemake workflow,

pytest .tests/

Running

To download and collect genomes for tree building,

CorGE collect_genomes --ncbi_species LIST_OF_TXIDS.txt --ncbi_accessions LIST_OF_ACCS.txt --local /path/to/local/db
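
If you want to drive this step from a script, the command above could be assembled as an argument list. This is only a sketch: the helper `collect_genomes_cmd` is hypothetical, it uses just the three flags shown above, and it assumes each flag can be left out independently, which this README does not confirm.

```python
import shlex


def collect_genomes_cmd(ncbi_species=None, ncbi_accessions=None, local=None):
    """Build the argv list for `CorGE collect_genomes` from the given sources."""
    cmd = ["CorGE", "collect_genomes"]
    # Only the three flags shown in this README are used here.
    # Assumption: a flag whose value is None can simply be omitted.
    for flag, value in (
        ("--ncbi_species", ncbi_species),
        ("--ncbi_accessions", ncbi_accessions),
        ("--local", local),
    ):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = collect_genomes_cmd(
    ncbi_species="LIST_OF_TXIDS.txt",
    ncbi_accessions="LIST_OF_ACCS.txt",
    local="/path/to/local/db",
)
print(shlex.join(cmd))  # prints the same command line as shown above
# To actually run it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` rather than a single string avoids shell quoting problems when paths contain spaces.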

Then, to extract the genes of interest and curate everything for tree building,

CorGE extract_genes

The default --file_type is 'prot', so the flag can be omitted; set it to 'nucl' to build trees from nucleotide sequences. Finally, to generate the tree, change into the directory containing all the output from the previous step and run,

snakemake -c --use-conda --conda-prefix .snakemake/ --configfile /path/to/project/config.yml

This should output a file called RAxML_supermatrixRootedTree.final, which contains the final tree.
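
To confirm the workflow finished cleanly, you could check that the output file exists and looks like a tree. The sketch below assumes the file is in Newick format (the usual RAxML tree format, but not stated in this README). Both helpers are hypothetical, and the check is deliberately naive: it would be confused by quoted labels that contain parentheses.

```python
from pathlib import Path


def looks_like_newick(text):
    """Cheap structural check: balanced parentheses and a trailing semicolon."""
    text = text.strip()
    if not text.endswith(";"):
        return False
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" appeared before its matching "("
                return False
    return depth == 0


def check_tree(path="RAxML_supermatrixRootedTree.final"):
    """Return True if `path` exists and passes the structural check."""
    tree_file = Path(path)
    return tree_file.is_file() and looks_like_newick(tree_file.read_text())
```

Calling `check_tree()` from the output directory returns False if the file is missing or truncated, which is usually a sign that a Snakemake rule failed partway through.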

A worked example is given in the docs.
