zhoudreames / HAWK

Hitting associations with k-mers

This code accompanies the paper by Rahman et al., "Association mapping from sequencing reads using k-mers", eLife, 2018. http://dx.doi.org/10.7554/eLife.32920

Installation

To install HAWK, run the following commands, where X.Y.Z is the version number:

tar xf hawk-X.Y.Z-beta.tar
cd hawk-X.Y.Z-beta
make

Prerequisites

JELLYFISH (modified version available in supplements)

EIGENSTRAT (modified version available in supplements)

R (with foreach and doParallel packages)

ABYSS

Counting k-mers

The first step in the pipeline is to count the k-mers in each sample, compute the total number of k-mers per sample, discard k-mers that appear only once in a sample, and sort the k-mers. Each k-mer file contains one line per k-mer present; each line holds an integer representing the k-mer and its count, separated by a space. The integer representation encodes each base as a digit: 0 for 'A', 1 for 'C', 2 for 'G' and 3 for 'T'.
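The integer representation described above can be illustrated with a minimal Python sketch. It is not taken from the HAWK sources: the function names are hypothetical, and it assumes each base occupies two bits with the first base of the k-mer as the most significant digit, which this README does not state.

```python
# Illustrative sketch only; not part of HAWK.
# Assumption: two bits per base, first base is the most significant digit.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode_kmer(kmer):
    """Pack a k-mer string into an integer, two bits per base."""
    value = 0
    for base in kmer.upper():
        value = (value << 2) | CODES[base]
    return value


def decode_kmer(value, k):
    """Recover the k-mer string of length k from its integer form."""
    bases = []
    for _ in range(k):
        bases.append(BASES[value & 3])  # lowest two bits hold the last base
        value >>= 2
    return "".join(reversed(bases))


def parse_count_line(line):
    """Split one 'integer count' line of a k-mer file into (kmer_int, count)."""
    kmer_int, count = line.split()
    return int(kmer_int), int(count)
```

Under this assumption, encode_kmer("ACGT") evaluates to 0*64 + 1*16 + 2*4 + 3 = 27, and decode_kmer(27, 4) returns "ACGT". The length k has to be supplied when decoding because leading 'A' bases contribute only zeros to the integer.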

k-mer counting can be done with the modified version of JELLYFISH provided in the 'supplements' folder of HAWK. To perform all of the steps above, install this version of JELLYFISH and then run the script 'countKmers' from 'supplements', after making the necessary modifications. This writes the names of the sorted k-mer count files to 'sorted_files.txt' and the total k-mer count of each sample to 'total_kmer_counts.txt'.
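To make the counting steps concrete, here is a toy Python sketch of what they amount to for a single sample. It is not how JELLYFISH or 'countKmers' work internally, and the function names are hypothetical. It also assumes that the per-sample total is taken before singletons are discarded, which this README does not specify.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads of one sample."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def summarize_sample(reads, k):
    """Return (total k-mer count, sorted list of k-mers seen more than once)."""
    counts = count_kmers(reads, k)
    # Assumption: the total includes k-mers that are later discarded.
    total = sum(counts.values())
    # Drop k-mers that appear only once, then sort by k-mer.
    kept = sorted((kmer, n) for kmer, n in counts.items() if n > 1)
    return total, kept


# Toy sample with two short reads and k = 3.
total, kept = summarize_sample(["ACGTAC", "CGTA"], k=3)
```

For this toy input, total is 6 and kept is [("CGT", 2), ("GTA", 2)], because "ACG" and "TAC" each occur only once. Real samples are far too large for an in-memory Counter, which is why a dedicated k-mer counter such as JELLYFISH is used.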

Running HAWK

Copy the 'sorted_files.txt' and 'total_kmer_counts.txt' files for the samples into a folder, together with a file named 'gwas_info.txt'. This file has one line per sample with three tab-separated columns: the sample ID, the sex (M, F or U for male, female or unknown) and the Case/Control status. For example:

SRR3050845	U	Control
SRR3050846	U	Case
SRR3050847	U	Control

Copy the scripts 'runHawk' and 'runAbyss' into the same folder and run:

./runHawk

The k-mers significantly associated with cases and with controls are written to 'case_kmers.fasta' and 'control_kmers.fasta' respectively. They can then be assembled by running:

./runAbyss

The assembled sequences are written to 'case_abyss.25_49.fasta' and 'control_abyss.25_49.fasta' respectively.

License: GNU General Public License v3.0