gillichu/sepp

Summary

This repository is a fork of SEPP and includes code for SEPP, TIPP, UPP, and HIPPI. These methods use ensembles of Hidden Markov Models (HMMs) in different ways, each focusing on a different problem.

Each of these related tools has its own README file.

README.SEPP.md

SEPP stands for "SATe-enabled Phylogenetic Placement", and addresses the problem of phylogenetic placement of short reads into reference alignments and trees.

README.UPP.md

UPP stands for "Ultra-large alignments using Phylogeny-aware Profiles", and addresses the problem of alignment of very large datasets, potentially containing fragmentary data. UPP can align datasets with up to 1,000,000 sequences.

README.UPP2.md

UPP2 is an improvement on UPP. It introduces proper bitscore weighting for the ensemble of HMMs, as well as fast Hierarchical and EarlyStop modes for the HMM search.
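To give a rough feel for what weighting an ensemble of HMMs by bitscore can look like, here is a minimal sketch. It is not taken from the UPP2 code: the function name `bitscore_weights` is hypothetical, and the formula (a base-2 softmax over the bitscores, relying only on the fact that a bitscore is a log2 odds score) is an assumption made for illustration. The weighting UPP2 actually uses may differ, for example by accounting for how many sequences each HMM was built from; see README.UPP2.md for the real behavior.

```python
def bitscore_weights(bitscores):
    """Convert per-HMM bitscores for one query into normalized weights.

    Hypothetical helper for illustration only, not UPP2's implementation.
    Assumes each bitscore is a log2 odds score, so 2**bitscore is
    proportional to how strongly that HMM supports the query.
    """
    if not bitscores:
        return []
    top = max(bitscores)
    # Subtract the maximum before exponentiating to avoid overflow.
    raw = [2.0 ** (score - top) for score in bitscores]
    total = sum(raw)
    return [value / total for value in raw]


# Example: three HMMs scored the same query sequence.
weights = bitscore_weights([10.0, 9.0, 5.0])
```

Under this assumption, an HMM that scores one bit higher than another receives twice the weight, and the weights for a query always sum to 1.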

README.HIPPI.md

HIPPI stands for "Highly Accurate Protein Family Classification with Ensembles of HMMs", and addresses the problem of classifying query sequences to protein families.

README.TIPP.md

TIPP stands for "Taxonomic Identification and Phylogenetic Profiling", and addresses the problem of taxonomic identification and abundance profiling of metagenomic data. TIPP has been moved out of SEPP into a separate package. The TIPP package can be accessed here.

Bugs and Errors

UPP2 is under active research development at UIUC by the Warnow Lab. Please report any errors in UPP2 to Minhyuk Park (minhyuk2@illinois.edu) or Gillian Chu (gchu4@illinois.edu).

About

Ensemble of HMM methods (SEPP, TIPP, UPP)

GNU General Public License v3.0
