adamewing/l1seq

L1-seq analysis tool

Script for analysing "L1-seq" data generated by targeted L1-specific sequencing (Ewing and Kazazian 2010, doi:10.1101/gr.106419.110)

Prerequisites

l1seq.py requires the following packages to be installed on your system.

pysam:

pip install pysam

numpy:

pip install numpy

align:

git clone https://github.com/adamewing/align
cd align
python setup.py build
python setup.py install
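
Before running the script, it can help to confirm that the three packages are importable. The following is a minimal sketch, assuming the importable module names match the package names listed above (pysam, numpy, align); the helper name `missing_modules` is hypothetical and not part of l1seq.

```python
import importlib.util

# Assumption: module names match the packages listed in Prerequisites.
REQUIRED = ("pysam", "numpy", "align")


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be located on the Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All prerequisites found.")
```

This only checks that each module can be found; it does not verify versions or that the modules import without errors.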

Instructions

These instructions assume that the L1-seq reads have already been aligned to the human reference genome, e.g. with bwa or bowtie2.

Optional: If multiple samples are to be analysed, merge them into a single BAM while maintaining a distinct read group for each sample. This can be accomplished using samtools merge:

samtools merge -r merged_samples.bam sample1.bam sample2.bam sampleN.bam
samtools index merged_samples.bam
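
Because each sample needs its own read group after merging, a quick tally of the RG tags in the merged file can catch mistakes early. The sketch below is one way to do this, assuming the alignments are available as SAM text lines (for example from `samtools view merged_samples.bam`); the function name `count_read_groups` and the example records are illustrative, not real output.

```python
from collections import Counter


def count_read_groups(sam_lines):
    """Count alignments per RG:Z tag in SAM-formatted text lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        rg = "none"  # label used when an alignment carries no RG tag
        # Optional tags begin at the 12th tab-separated column.
        for tag in fields[11:]:
            if tag.startswith("RG:Z:"):
                rg = tag[5:]
                break
        counts[rg] += 1
    return counts


# Illustrative records only (SEQ and QUAL given as "*").
example = [
    "@HD\tVN:1.4\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*\tRG:Z:sample1",
    "read2\t0\tchr1\t200\t60\t50M\t*\t0\t0\t*\t*\tRG:Z:sample1",
    "read3\t16\tchr2\t300\t60\t50M\t*\t0\t0\t*\t*\tRG:Z:sample2",
]
for read_group, n in sorted(count_read_groups(example).items()):
    print(read_group, n)
```

If every input sample appears as its own read group, the merge has kept the samples distinguishable; alignments counted under "none" lack an RG tag.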

Build the mappability tabix file:
```
cd ref
./make_human_mappability.sh
```

Run l1seq.py:

./l1seq.py \
-b l1seq.alignment.bam \
-m ref/hsMap50bp.bed.gz \
--ref ref/hg19.primate.L1.bed.gz \
--nonref ref/hg19.nonref.L1.bed.gz \
> l1seq.results.tsv

Additional Information

The memory footprint can be quite large when l1seq.py is run on a large BAM file (e.g. when many samples are merged). The -c/--chrom option limits the run to a single chromosome, which reduces the memory requirement.
This tool is not intended for any other data type (e.g. capture sequencing, WGS). TEBreak [https://github.com/adamewing/tebreak] is one option for analysing those data types.
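
To keep memory use down on a large merged BAM, the -c/--chrom option can be applied once per chromosome. The following is a sketch under stated assumptions: the chromosome names follow the UCSC hg19 style (chr1 to chr22, chrX, chrY), the -c option accepts such a name, and the per-chromosome output file names are a hypothetical naming scheme. The helper names are illustrative and not part of l1seq.

```python
import subprocess

# Assumption: UCSC-style hg19 chromosome names are used in the BAM.
CHROMS = [f"chr{n}" for n in range(1, 23)] + ["chrX", "chrY"]


def build_command(bam, chrom):
    """Assemble the l1seq.py argument list for a single chromosome."""
    return [
        "./l1seq.py",
        "-b", bam,
        "-m", "ref/hsMap50bp.bed.gz",
        "--ref", "ref/hg19.primate.L1.bed.gz",
        "--nonref", "ref/hg19.nonref.L1.bed.gz",
        "-c", chrom,  # assumed to take one chromosome name
    ]


def run_per_chromosome(bam, chroms=CHROMS):
    """Run l1seq.py once per chromosome, writing one TSV per run."""
    for chrom in chroms:
        out_path = f"l1seq.results.{chrom}.tsv"  # hypothetical naming scheme
        with open(out_path, "w") as out:
            subprocess.run(build_command(bam, chrom), stdout=out, check=True)
```

Calling `run_per_chromosome("l1seq.alignment.bam")` from the repository root would then produce one results file per chromosome, which can be concatenated afterwards if a single table is needed.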
