adamewing / l1seq

L1-seq analysis tool

Script for analysing "L1-seq" data generated by targeted L1-specific sequencing (Ewing and Kazazian 2010, doi:10.1101/gr.106419.110)

Prerequisites

l1seq.py requires the following packages to be installed on your system.

pysam:

pip install pysam

numpy:

pip install numpy

align:

git clone https://github.com/adamewing/align
cd align
python setup.py build
python setup.py install
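
Before running the tool, it can be useful to confirm that the three prerequisites are importable. The following is a minimal sketch, not part of l1seq itself; it assumes the `align` package installs a module importable under the name `align`, which is not confirmed by this README. The helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Module names are assumptions: "align" in particular is assumed to be
# the import name of the package built from adamewing/align.
REQUIRED_MODULES = ("pysam", "numpy", "align")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All prerequisites found.")
```

Run it with the same Python interpreter that will run l1seq.py, since packages installed for one interpreter may not be visible to another.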

Instructions (these assume the L1-seq reads have already been aligned to the human reference genome, e.g. with bwa or bowtie2)

Optional: If multiple samples are to be analysed, merge them into a single BAM file, maintaining a distinct read group for each sample. This can be accomplished using samtools merge:

samtools merge -r merged_samples.bam sample1.bam sample2.bam sampleN.bam
samtools index merged_samples.bam
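
When the list of samples is long, the two samtools commands above can be assembled from Python. This is a sketch only: the function names `build_merge_commands` and `run_merge` are hypothetical, and it assumes `samtools` is on the PATH. It relies on the `-r` option of `samtools merge`, which attaches a read group tag to each alignment based on its source file name.

```python
import subprocess


def build_merge_commands(output_bam, input_bams):
    """Return the samtools merge and index commands as argument lists."""
    if len(input_bams) < 2:
        raise ValueError("merging needs at least two input BAM files")
    # -r tags each alignment with a read group derived from its input file name
    merge = ["samtools", "merge", "-r", output_bam, *input_bams]
    index = ["samtools", "index", output_bam]
    return [merge, index]


def run_merge(output_bam, input_bams):
    """Run merge, then index, stopping at the first failing command."""
    for command in build_merge_commands(output_bam, input_bams):
        subprocess.run(command, check=True)
```

For example, `run_merge("merged_samples.bam", ["sample1.bam", "sample2.bam"])` would execute the same two commands shown above for two samples.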

  1. Build mappability tabix:

    cd ref
    ./make_human_mappability.sh
    
  2. Run l1seq.py:

    ./l1seq.py \
    -b l1seq.alignment.bam \
    -m ref/hsMap50bp.bed.gz \
    --ref ref/hg19.primate.L1.bed.gz \
    --nonref ref/hg19.nonref.L1.bed.gz \
    > l1seq.results.tsv
    

Additional Information

  • The memory footprint may be quite large when run over a large BAM file (e.g. when many samples are merged). The -c/--chrom option may be used to limit the run to a single chromosome, which decreases the memory requirement.
  • This tool is not intended for any other data type (e.g. capture sequencing, WGS). TEBreak [https://github.com/adamewing/tebreak] is one option for performing these analyses.
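
One way to apply the -c/--chrom option across a whole genome is to run l1seq.py once per chromosome and write each result to its own file. The sketch below is an illustration under several assumptions: the helper names and the output file naming are hypothetical, the chromosome names passed in must match those in the BAM header (for example `chr1` versus `1`), and it is assumed that per-chromosome results can be used separately. The other options and file paths are those shown in step 2 above.

```python
import subprocess


def l1seq_command(bam, chrom,
                  mappability="ref/hsMap50bp.bed.gz",
                  ref="ref/hg19.primate.L1.bed.gz",
                  nonref="ref/hg19.nonref.L1.bed.gz"):
    """Return the l1seq.py argument list restricted to one chromosome."""
    return ["./l1seq.py",
            "-b", bam,
            "-m", mappability,
            "--ref", ref,
            "--nonref", nonref,
            "-c", chrom]


def run_per_chromosome(bam, chroms, prefix="l1seq.results"):
    """Run l1seq.py once per chromosome, saving stdout to <prefix>.<chrom>.tsv."""
    for chrom in chroms:
        # l1seq.py writes its results to stdout, so redirect it to a file
        with open(f"{prefix}.{chrom}.tsv", "w") as out:
            subprocess.run(l1seq_command(bam, chrom), stdout=out, check=True)
```

Running the chromosomes one after another keeps only one chromosome's data in memory at a time, at the cost of a longer total run time.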
