albertwcheng / KmerFinder

Find enriched k-mers in a SELEX library (assuming there is only one motif per sequence)

=== COMPILATION ===

bash make.sh

=== RUN ===

Step One: find (enriched) k-mers

Usage: ./findKmers <fgfilename> <howmanyfgseqtoread> <bgfilename> <howmanybgseqtoread> <k> <howmanyToFind>
Description: find the top <howmanyToFind> enriched <k>-mers from an HT-SELEX experiment, using the foreground (<fgfilename>) and background (<bgfilename>) FASTQ files and assuming there is only one motif per sequence
Set <howmanyfgseqtoread> or <howmanybgseqtoread> to 0 to read all sequences in that file
Set <howmanyToFind> to 0 to print all k-mers
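The "0 means read everything" convention for the read-count arguments can be illustrated with a small Python reader. This is a minimal sketch, not findKmers itself (which is written in C++): it assumes plain four-line FASTQ records and upper-cases each sequence, and `read_fastq_sequences` is a hypothetical helper name.

```python
from itertools import islice
from typing import Iterable, Iterator


def _all_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of every 4-line FASTQ record (assumed layout)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected a FASTQ header, got {header!r}")
        # Next three lines: sequence, '+' separator, quality string.
        record = [next(it, None) for _ in range(3)]
        if record[-1] is None:
            raise ValueError("truncated FASTQ record")
        yield record[0].strip().upper()


def read_fastq_sequences(lines: Iterable[str], how_many: int = 0) -> Iterator[str]:
    """Yield up to how_many sequences; how_many == 0 reads the whole input."""
    sequences = _all_sequences(lines)
    return sequences if how_many == 0 else islice(sequences, how_many)


if __name__ == "__main__":
    demo = ["@r1", "ACGTAC", "+", "IIIIII", "@r2", "ttgaca", "+", "IIIIII"]
    print(list(read_fastq_sequences(demo, 0)))  # ['ACGTAC', 'TTGACA']
    print(list(read_fastq_sequences(demo, 1)))  # ['ACGTAC']
```

With a real file, a handle from `open("fg.fastq")` can be passed directly, since a text file iterates over its lines.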

e.g., 

./findKmers fg.fastq 1000000 bg.fastq 1000000 6 0 > 1000000_6_0.result.txt 2> 1000000_6_0.stderr.txt
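This README does not say how findKmers scores enrichment. As an illustration only, the Python sketch below counts each k-mer at most once per read (one possible reading of the one-motif-per-sequence assumption) and ranks k-mers by the ratio of their foreground to background read fractions, with a pseudocount. The function names, the pseudocount, and the ratio score are all assumptions, not the tool's actual method.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_kmers_once_per_read(seqs: Iterable[str], k: int) -> Tuple[Counter, int]:
    """Count, for each k-mer, how many reads contain it at least once."""
    counts: Counter = Counter()
    n_reads = 0
    for seq in seqs:
        n_reads += 1
        # A set stops a k-mer that occurs twice in one read from counting twice.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts, n_reads


def top_enriched_kmers(fg: Iterable[str], bg: Iterable[str], k: int,
                       how_many: int = 0, pseudocount: float = 1.0
                       ) -> List[Tuple[str, float]]:
    """Rank foreground k-mers by (fg read fraction) / (bg read fraction).

    how_many == 0 returns every k-mer seen in the foreground.
    """
    fg_counts, n_fg = count_kmers_once_per_read(fg, k)
    bg_counts, n_bg = count_kmers_once_per_read(bg, k)
    scores = {
        kmer: ((c + pseudocount) / (n_fg + pseudocount))
        / ((bg_counts[kmer] + pseudocount) / (n_bg + pseudocount))
        for kmer, c in fg_counts.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked if how_many == 0 else ranked[:how_many]
```

For example, `top_enriched_kmers(fg_reads, bg_reads, 6, 0)` would rank every 6-mer seen in the foreground, loosely mirroring the `6 0` arguments in the command above.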


Step Two: construct a PWM (row matrix format)

Usage: ./constructSimplePWM.py kmerFile colKmer colScore seed

e.g., 

./constructSimplePWM.py 1000000_6_0.result.txt 1 3 ATACAG > 1000000_6_0.result.ATACAG.pwm.rowmat
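The README does not describe how constructSimplePWM.py builds the matrix. One simple approach, sketched here in Python under stated assumptions, reads the k-mer and score from the 1-based columns given by colKmer and colScore (assumed tab-separated), keeps k-mers within one mismatch of the seed, and sums their scores per base at each position; each row (one per position, columns A, C, G, T) is then normalized to sum to 1. The helper names, the mismatch cutoff, and the column layout are hypothetical, and scores are assumed to be non-negative.

```python
from typing import Dict, Iterable, List

BASES = "ACGT"  # assumed column order of the row matrix


def read_kmer_scores(lines: Iterable[str], col_kmer: int, col_score: int) -> Dict[str, float]:
    """Read k-mer/score pairs from 1-based, tab-separated columns (assumed layout)."""
    scores: Dict[str, float] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        try:
            kmer = fields[col_kmer - 1].upper()
            score = float(fields[col_score - 1])
        except (IndexError, ValueError):
            continue  # skip header or malformed lines
        scores[kmer] = score
    return scores


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def simple_pwm(scores: Dict[str, float], seed: str, max_mismatches: int = 1) -> List[List[float]]:
    """Score-weighted base frequencies of k-mers near the seed; one row per position."""
    seed = seed.upper()
    weights = [[0.0] * len(BASES) for _ in seed]
    for kmer, score in scores.items():
        if len(kmer) != len(seed) or hamming(kmer, seed) > max_mismatches:
            continue
        if any(base not in BASES for base in kmer):
            continue  # ignore k-mers containing N or other symbols
        for pos, base in enumerate(kmer):
            weights[pos][BASES.index(base)] += score
    # Normalize each position; fall back to uniform if nothing contributed.
    return [[v / sum(row) if sum(row) else 1 / len(BASES) for v in row] for row in weights]
```

With the example above, something like `simple_pwm(read_kmer_scores(open("1000000_6_0.result.txt"), 1, 3), "ATACAG")` would give a 6-row matrix, provided the result file matches the assumed layout.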

Step Three: convert the row matrix to the format recognized by the TinyRay WebLogo

Usage: ./ToTinyRayPWMFormat.sh RowMatrixFile outTinyRayPWMFile
Description: Convert the row matrix file from constructSimplePWM.py to the PWM format used by http://demo.tinyray.com/weblogo

./ToTinyRayPWMFormat.sh 1000000_6_0.result.ATACAG.pwm.rowmat 1000000_6_0.result.ATACAG.pwm.tinyray
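The exact layout written by ToTinyRayPWMFormat.sh is not documented here. Assuming the row matrix holds one row per position and the TinyRay format expects one row per base (neither is confirmed by this README, so check the script before relying on it), the conversion reduces to a transpose, sketched below in Python. `convert_row_matrix_text` and `transpose_matrix` are hypothetical names.

```python
from typing import List


def transpose_matrix(rows: List[List[str]]) -> List[List[str]]:
    """Turn a position-by-base matrix into a base-by-position matrix."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("rows have different lengths")
    return [list(col) for col in zip(*rows)]


def convert_row_matrix_text(text: str) -> str:
    """Whitespace-separated row matrix in, tab-separated transposed matrix out."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    return "\n".join("\t".join(col) for col in transpose_matrix(rows)) + "\n"
```

Values are kept as strings, so numbers pass through unchanged rather than being re-rounded.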

cat 1000000_6_0.result.ATACAG.pwm.tinyray

Now paste the output of cat into the WebLogo interface at http://demo.tinyray.com/weblogo

Languages: C++ 69.1%, Python 18.1%, Shell 10.6%, Perl 1.8%, C 0.4%