MPBindv3

This is a modified version of MPBindv2.1, a program intended for statistical analysis of SELEX data in an attempt to predict strong-binding aptamers. It essentially relies on two types of statistical tests: Fisher's exact test and Spearman's rank correlation.
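As an illustration of the second test, here is a minimal sketch of Spearman's rank correlation in plain Python. It assumes the test is applied to a k-mer's frequency across successive SELEX rounds, so that a k-mer that rises steadily gives a value close to 1; this README does not spell out how MPBind applies the test, and the function names and example numbers are hypothetical.

```python
def rank(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: round index versus one k-mer's frequency in that round.
rounds = [0, 1, 2, 3, 4]
freqs = [0.001, 0.002, 0.002, 0.010, 0.050]
rho = spearman_rho(rounds, freqs)
```

The sketch does not handle the degenerate case where all values in one series are identical, in which case the correlation is undefined.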

In the original program, k-mer counting is done with a basic hash implementation. This version replaces it with an external k-mer counting program for speed. In addition, the original program performs the two statistical tests on both count-by-frequency and count-by-read data, whereas this version performs them on the count-by-frequency data only.
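For reference, hash-based k-mer counting of the kind described above can be sketched in a few lines. This is not the original MPBind code. The function names are hypothetical, and reading "count-by-frequency" as each k-mer's count divided by the total number of k-mers in the round is an assumption.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads using a hash table."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def to_frequencies(counts):
    """Normalize raw counts by the total number of k-mers (assumed definition)."""
    total = float(sum(counts.values()))
    return dict((kmer, n / total) for kmer, n in counts.items())


reads = ["ACGTAC", "CGTACG"]
counts = count_kmers(reads, 3)
freqs = to_frequencies(counts)
```

A dedicated k-mer counter does the same job with far less memory and time on full sequencing runs, which is the motivation given above for the replacement.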

Although MPBindv2.1 and MPBindv3 differ in several ways, the modifications were made so that program usage is unchanged. The usage instructions below are therefore copied from the original README file.

Prerequisites

Python (version >= 2.4.3), Java (version >= 1.7) and R (version >= 2.13.0) must be installed.

Installation

• Download MPBind (Linux or MacOS)
• tar -xzf
• Add the MPBind directory to the $PATH environment variable (optional); otherwise, type the absolute path of the MPBind directory when you run the program.

Usage

Step 1: Preprocess SELEX-Seq reads (Optional):

MPBind requires input files in plain text format, with each row containing only a sense aptamer sequence. To this end, MPBind provides the MPBind_Preprocess.py script, which converts raw sequencing reads (FASTQ or FASTA) to plain text format. It also automatically converts antisense reads to sense reads by matching primer sequences.

Command: python MPBind_Preprocess.py <Parameters>

Required Parameters:
-Infile: Input file name
-t: Input file format (FASTA or FASTQ)
-Forward_primer: Forward primer sequence
-Reverse_primer: Reverse primer sequence
-primer_max_mismatch: Maximum number of mismatches allowed when matching primers
-Outfile: Output file name

Command Example:
python MPBind_Preprocess.py -Infile Test.fastq -t FASTQ -Forward_primer AGCAGCACAGAGGTCAGATG -Reverse_primer TTCACGGTAGCACGCATAGG -primer_max_mismatch 1 -Outfile Test_sequence.txt
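The antisense-to-sense conversion in Step 1 can be pictured with the sketch below. It is not the MPBind_Preprocess.py implementation. It assumes that a sense read begins with the forward primer, checks only that primer, counts mismatches as Hamming distance, and leaves out file parsing and primer trimming. All function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mismatches(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def to_sense(read, forward_primer, max_mismatch):
    """Return the read in sense orientation, or None if neither strand matches.

    Assumption: a sense read starts with the forward primer.
    """
    n = len(forward_primer)
    for candidate in (read, reverse_complement(read)):
        if len(candidate) >= n and mismatches(candidate[:n], forward_primer) <= max_mismatch:
            return candidate
    return None


# Forward primer taken from the command example; the insert is made up.
primer = "AGCAGCACAGAGGTCAGATG"
sense = primer + "CTTTGCCACC"
antisense = reverse_complement(sense)
recovered = to_sense(antisense, primer, 1)
```

Reads that match the primer on neither strand within the allowed mismatches return None here; how the real script treats such reads is not stated in this README.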

Step 2: MPBind (Training)

MPBind requires the input sequences to be in plain text format.

Input file Example (plain text):
CTTTGCCACCGGGTTGTAGTTACGGCTGA
CTTTGCCACCGGGTTGTAGTTACGGCTGA
TTATGTTTTTTTTTTTTTTTAATGCCCTG
GTTTTCAAAGAGGCTCGACCTGACTTCTA
GGTTTGCTGAGGTGGGCTCTGTTTAACCT
GCAGGTGTGGTTTGCTGAGGTGGGCCCTG
TTCCCCAATAACATCGTATACCCGCGCCC

Command: python MPBind_Train.py

Required Parameters:
-R0: Initial library file [plain text format]
-RS: SELEX round files (e.g., R1, R2, R3, …) [plain text format]
-RC: Control Seq round (no target, only control PCR amplification) <Optional, default=NULL>
-nmer: Motif length (e.g., 5, 6, 7) <default=6>
-U: <1: Unique reads only; 2: Redundant reads only; 3: Both> <default=1>
1: Unique reads only: duplicates are merged into one read
2: Redundant reads only: all reads are used
3: Both: MPBind generates two sub-folders, for ‘Unique reads only’ and ‘Redundant reads only’, respectively
-Out: Output file folder <Optional, default=MPBind_Out>

Command Example:
python MPBind_Train.py -R0 R0.txt -RS R1.txt,R2.txt,R3.txt,R4.txt,R5.txt,R6.txt,R7.txt -RC Control.txt -nmer 6 -U 3 -Out MPBind_Out_R01234567_Unique_and_Redundant

Output files: It will generate *.train.nmer (e.g., Test.train.6mer) files under the output file folder.
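The difference between the -U read modes in Step 2 can be shown with a small sketch. The helper name is hypothetical and this is not code from MPBind_Train.py. Mode 3 corresponds to running both modes and writing each result to its own sub-folder.

```python
def select_reads(reads, mode):
    """Return the reads to analyse for -U mode 1 (unique) or 2 (redundant)."""
    if mode == 1:
        # Unique reads only: merge duplicates, keeping first-seen order.
        seen = set()
        unique = []
        for read in reads:
            if read not in seen:
                seen.add(read)
                unique.append(read)
        return unique
    if mode == 2:
        # Redundant reads only: every read is used as-is.
        return list(reads)
    raise ValueError("mode must be 1 or 2; mode 3 runs both")


# First three rows of the input file example above.
reads = [
    "CTTTGCCACCGGGTTGTAGTTACGGCTGA",
    "CTTTGCCACCGGGTTGTAGTTACGGCTGA",
    "TTATGTTTTTTTTTTTTTTTAATGCCCTG",
]
unique_reads = select_reads(reads, 1)
all_reads = select_reads(reads, 2)
```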

Step 3: MPBind (Prediction)

Command: python MPBind_Predict.py

Required Parameters:
-Train: *.train.nmer (e.g., Test.train.6mer) files generated by MPBind_Train.py
-Aptamer: Aptamer sequences to be predicted [plain text format]
-Sort: Sort aptamer sequences by combined meta-Z-score <default=FALSE>
-Out: Output file

Command Example:
python MPBind_Predict.py -Train Test.train.6mer -Aptamer To_be_predicted.txt -Sort TRUE -Out Predicted_Aptamers.txt

Output file (columns):

Aptamer.Seq: Aptamer sequences (e.g., TTTTGTTTTTTGTTTTCTTTTCCCCCCTC)
Z1.Scan: Z1-Score for each scanned position using n-mer window
Z1.MetaScore: Combined Z-Score using Z1 only
Z2.Scan: Z2-Score for each scanned position using n-mer window
Z2.MetaScore: Combined Z-Score using Z2 only
Z3.Scan: Z3-Score for each scanned position using n-mer window
Z3.MetaScore: Combined Z-Score using Z3 only
Z4.Scan: Z4-Score for each scanned position using n-mer window
Z4.MetaScore: Combined Z-Score using Z4 only
Z_Combined.Scan: Combined Z-Score for each scanned position using n-mer window (e.g., 8.0,7.4,3.4 …)
Z_Combined.MetaScore: Meta-Combined Z-Score
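The *.Scan and *.MetaScore columns can be pictured as a sliding n-mer window over the aptamer followed by a combination step. The sketch below is illustrative only. This README does not state how MPBind combines the per-position scores, so Stouffer's method (sum of Z-scores divided by the square root of their number) is used as a stand-in. Scoring unseen n-mers as 0, the function names, and the example scores are all assumptions.

```python
import math


def scan_scores(aptamer, kmer_z, k):
    """Look up the Z-score of each k-mer window along the aptamer.

    Assumption: k-mers absent from the trained table score 0.0.
    """
    return [kmer_z.get(aptamer[i:i + k], 0.0)
            for i in range(len(aptamer) - k + 1)]


def meta_score(scores):
    """Combine window scores with Stouffer's method (stand-in, see above)."""
    if not scores:
        return 0.0
    return sum(scores) / math.sqrt(len(scores))


# Hypothetical trained scores for 3-mers.
kmer_z = {"ACG": 2.0, "CGT": 1.0}
scan = scan_scores("ACGTA", kmer_z, 3)
combined = meta_score(scan)
```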

New Features (v3):
(1) Faster k-mer counting using the external KAnalyze program
(2) Faster Fisher's exact test using an internal Python function instead of external R
(3) Only Z1 and Z3 scores are computed (Z2 and Z4 are not)
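To show what a pure-Python Fisher's exact test involves, here is a minimal one-sided sketch built on the hypergeometric distribution. It is not the function shipped with MPBindv3. The function name, the choice of a one-sided (enrichment) test, and the layout of the 2x2 table are assumptions. It needs Python 3.8 or later for math.comb.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Assumed layout: a = count of the k-mer in a selected round, b = all other
    k-mers in that round, c and d = the same two counts in the initial library.
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    col2 = n - col1
    upper = min(row1, col1)
    total = comb(n, row1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    tail = sum(comb(col1, x) * comb(col2, row1 - x) for x in range(a, upper + 1))
    return tail / total


p = fisher_exact_greater(3, 1, 1, 3)
```

Exact integer arithmetic keeps the result accurate, but for very large counts a log-space formulation would be faster.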

Citation: Jiang P., Meyer S., Hou Z., Propson N.E., Soh H.T., Thomson J.A., Stewart R. (2014). MPBind: A Meta-Motif Based Statistical Framework and Pipeline to Predict Binding Potential of SELEX-derived Aptamers. Bioinformatics 30(18): 2665-2667.

Contact

Peng Jiang
Computational Biologist
Morgridge Institute for Research, Madison, WI 53707, USA
Email: PJiang@morgridge.org
Tel: 1-608-316-4479
