mortonjt / SCOPE

SCOPE++ is a C++ program for accurately identifying homopolymers in cDNA sequences using Hidden Markov Models. It can also be applied to trimming poly(A)/poly(T) tails, or to identifying A, C, G, T, or N homopolymer sequences.
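
To give a feel for how a Hidden Markov Model can separate a poly(A) tail from the rest of a read, here is a minimal two-state Viterbi sketch. It is not SCOPE++'s actual model: the function name `viterbi_polya`, the two states, and all probabilities are hypothetical values chosen for illustration, and no training step is included.

```python
import math

def viterbi_polya(seq, p_stay_bg=0.95, p_stay_tail=0.95, p_a_in_tail=0.9):
    """Label each base 'B' (background) or 'T' (poly(A) tail) with a toy 2-state HMM.

    All probabilities are illustrative assumptions, not SCOPE++ parameters.
    """
    if not seq:
        return ""
    states = ("B", "T")
    trans = {
        "B": {"B": p_stay_bg, "T": 1 - p_stay_bg},
        "T": {"B": 1 - p_stay_tail, "T": p_stay_tail},
    }

    def emit(state, base):
        # The tail state strongly prefers 'A'; background is uniform over A, C, G, T.
        if state == "T":
            return p_a_in_tail if base == "A" else (1 - p_a_in_tail) / 3
        return 0.25

    # Work in log space to avoid underflow on long reads.
    score = {s: math.log(0.5) + math.log(emit(s, seq[0])) for s in states}
    back = []
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + math.log(trans[p][s]))
            new_score[s] = score[prev] + math.log(trans[prev][s]) + math.log(emit(s, base))
            pointers[s] = prev
        score = new_score
        back.append(pointers)

    # Trace back the most likely state path from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))

labels = viterbi_polya("GCTAGCTTGCAC" + "A" * 20)
print(labels)
```

With these toy parameters, isolated A bases inside the read stay labelled as background because entering and leaving the tail state costs more than a single matching base gains.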

Installation

First, make sure that autotools is installed.

Then run ./configure; make; sudo make install

If you don't have root permission, create a directory and install into it with the following command

./configure --prefix=<your directory>; make; make install

Getting Started

To make sure that the tool is working, run the following command

./scope -i test.fq -o test_out.fa

The output should look something like this

Input file: ./example/test.fastq
Output file: test.out
File type: illumina
Input File Format: fasta
Output File Format: fasta
polyType: A
Filter Width: 12
Edge MinLength: 4
Boundary States: 2
Mininum Length: 10
Maxinum Training Set: 1000
Laplacian Smoothing Parameter: 1
Details: 0
Zero Based: 0
Print Everything: 0
Print Best Alignment: 0
Building model
model finalized
Number of sequences with homopolymers 54
Number of sequences without homopolymers 39
Number of trashed sequences 0
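
If you want to check a run from a script, the three summary counts at the end of the log can be pulled out with a few regular expressions. This is a small sketch that assumes the log lines are worded exactly as in the sample above; `parse_summary` is a hypothetical helper, not part of SCOPE++.

```python
import re

def parse_summary(log_text):
    """Extract the three sequence counts from a run log shaped like the sample."""
    patterns = {
        "with": r"Number of sequences with homopolymers\s+(\d+)",
        "without": r"Number of sequences without homopolymers\s+(\d+)",
        "trashed": r"Number of trashed sequences\s+(\d+)",
    }
    counts = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, log_text)
        # None marks a line that was not found in the log.
        counts[key] = int(match.group(1)) if match else None
    return counts

sample_log = """model finalized
Number of sequences with homopolymers 54
Number of sequences without homopolymers 39
Number of trashed sequences 0
"""
counts = parse_summary(sample_log)
total = counts["with"] + counts["without"] + counts["trashed"]
print(counts, round(counts["with"] / total, 3))
```

For the sample run, 54 of the 93 reads (about 58%) contain a detected homopolymer.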

Parameters

    Input:
       -i [input file] (required) 
             The fastq or fasta input file
       --input_format [input file format] 
             (default = fasta)
             fasta or fastq
    Output:
       -o [output file] (required) 
             A fasta file containing masked homopolymer tails
       --print_all [output options]
             Prints all sequences to the file.
             Otherwise, only sequences with detected
             poly(A) tails are printed
       --out_format [output file format] 
             (default = fasta)
             fasta or fastq
       --details [output details]
             outputs more information including alignment scores,
             homopolymer length, and percent identity
       -z [zero index] 
             Prints coordinates using zero-based indexing, half-open intervals.
             By default, coordinates use one-based indexing, closed intervals
    Search Type:
       --homopolymer_type homopolymer type [N|A|G|C|TCG]
             e.g. option A is a polyA tail
             (default = A)
       --poly searches for poly(A) or poly(T)
       --trim
             trims poly(A)/poly(T) tails
    Tool parameters:
       --filter_width filter width
             Size of the sliding window
             (default = 8 base pairs) 
       --minLength Minimum homopolymer length 
             (default = 10 base pairs) 
       --minIdentity Minimum percent identity a homopolymer can have
             (default = 70)
       --edge_minLength Edge boundary MinLength
             (default = 6)
       --edge_states Number of states at boundaries
             (default = 1)
       --sampling_frequency Determines how often sequences should be sampled for training
             (default = 1)
       --numTrain Number of training sequences
             (default = 1000) 
       --left_gap Minimum distance from the beginning of the poly(A)/poly(T) to the read end
       --right_gap Minimum distance from the end of the poly(A)/poly(T) to the read end
       --no_retrain Disables Baum Welch training
       --numThreads
    Help:
       --help help
       --version version information
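
The two coordinate conventions selected by -z describe the same bases with different numbers. The sketch below converts between them; the helper names and the example read are hypothetical and only illustrate the arithmetic.

```python
def one_based_closed_to_zero_based_half_open(start, end):
    """[start, end] counted from 1, both ends included -> [start-1, end) counted from 0."""
    return start - 1, end

def zero_based_half_open_to_one_based_closed(start, end):
    """[start, end) counted from 0 -> [start+1, end] counted from 1."""
    return start + 1, end

# Hypothetical read: 12 non-tail bases followed by a 20-base poly(A) tail.
read = "GCTAGCTTGCAC" + "A" * 20

# In one-based, closed coordinates the tail spans bases 13 through 32.
zb_start, zb_end = one_based_closed_to_zero_based_half_open(13, 32)

# Zero-based, half-open coordinates map directly onto Python slicing.
tail = read[zb_start:zb_end]
print(zb_start, zb_end, len(tail))
```

In both conventions the end coordinate is the same number; only the start shifts by one, and the interval length is `end - start` for the zero-based form and `end - start + 1` for the one-based form.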

More thorough descriptions of the parameters can be found in the other README
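
As a rough illustration of how the --minLength and --minIdentity thresholds listed above might act on a candidate region, the sketch below assumes that percent identity means the share of bases in the region that equal the target base. The exact definition used by SCOPE++ is not given here, so treat this definition and both helper names as assumptions.

```python
def percent_identity(region, base="A"):
    """Percentage of bases in `region` equal to `base` (assumed definition)."""
    if not region:
        return 0.0
    return 100.0 * region.upper().count(base) / len(region)

def passes_filters(region, base="A", min_length=10, min_identity=70.0):
    """Apply length and identity thresholds; defaults mirror the values listed above."""
    return len(region) >= min_length and percent_identity(region, base) >= min_identity

candidates = ["AAAAAGAAAA", "AAAAGCGTAA", "AAAAAAA"]
for region in candidates:
    print(region, percent_identity(region), passes_filters(region))
```

Here the first candidate passes (10 bases, 90% A), the second fails on identity (60% A), and the third fails on length (7 bases).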
