The sid
program can be used to detect single nucleotide polymorphisms in aligned sequencing reads.
Based on the pileup output of Samtools, for each genome site the most
likely genotype is computed.
To install sid, clone the repository and change into the source directory. To prepare the cloned
repository for compilation, cd
into the directory and run
./autogen.sh
./configure
The program is then compiled and (optionally) installed to /usr/local/bin
with
make
sudo make install
For preprocessing, a working installation of samtools
is needed. Given
the aligned sequencing data in SAM or BAM format, a pileup needs to be generated: samtools mpileup input.bam > pileup.dat
Useful parameters for the samtools mpileup
program are -C 50
,
which is recommended for BWA-aligned reads, and -q 1
which
discards reads with ambiguous mapping. The switches -q
and -Q
can be used to set a lower
threshold for mapping quality and base quality, respectively.
The pileup is then used as the sid
input:
sid -m local pileup.dat > output.csv
The -m
switch selects the SNP calling method, see sid -h
for all available ones.
The output is given as comma-separated values (CSV).
chrom,pos,label,gt,hom_conf,het_conf,conf_type
1,3003229,hom,GG,2.98235e-13,1,p_value
1,3003230,hom,GG,3.67401e-14,1,p_value
1,3003231,het,GA,1,8.01707e-17,p_value
1,3003232,hom,AA,4.36739e-12,1,p_value
Each line represents the genotype calling result for one genome site, defined by a chromosome
(chrom
) and a coordinate (pos
). The label
is either hom
or het
and indicates a homozygous
or heterozygous site, respecitvely, while the gt
column lists the called genotype. The next two
columns give the confidence in the call as a P-value.