zonglab / snapTotal-seq

snapTotal-seq

This repository provides the analysis pipeline for single-cell RNA-seq data generated by snapTotal-seq.

Instructions:

Libraries prepared with snapTotal-seq can be sequenced using either customized primers or Illumina standard sequencing primers.

If the library is sequenced using customized primers, the fastq files of each single cell can be demultiplexed with the Illumina standard demultiplexing pipeline. Users can then proceed directly to mapping (01_snapTotal_mapping.sh) and UMI counting (02_split_exon_intron_reads_customized_run.sh).

If the library is sequenced using Illumina standard sequencing primers, the Illumina standard demultiplexing pipeline first splits the library into several sub-libraries based on the i5 index. For each sub-library, users then perform a second demultiplexing step based on the cell barcode sequenced in read 2, using the 00_demultiplex_for_standard_run.sh script, which generates the fastq files of each single cell. Mapping is then performed with 01_snapTotal_mapping.sh, followed by UMI counting with 02_split_exon_intron_reads_standard_run.sh.

In both cases, the Python scripts in steps 3 and 4 generate the UMI count matrices for exon and intron reads, respectively. The output UMI matrices can be used for downstream analyses.
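To illustrate the idea behind the second demultiplexing step for standard-primer runs, here is a minimal Python sketch that groups read pairs by a cell barcode taken from read 2. It is not the logic of 00_demultiplex_for_standard_run.sh: the barcode position (start of read 2), the barcode length (`BARCODE_LEN`), the exact-match rule, and the function names `read_fastq` and `demultiplex` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import islice

# ASSUMPTION (hypothetical, not taken from the pipeline): the cell barcode
# occupies the first BARCODE_LEN bases of read 2.
BARCODE_LEN = 8


def read_fastq(handle):
    """Yield (header, sequence, quality) tuples from an open FASTQ text handle."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        header, seq, _plus, qual = (line.rstrip("\n") for line in record)
        yield header, seq, qual


def demultiplex(r1_handle, r2_handle, barcode_to_cell, barcode_len=BARCODE_LEN):
    """Group read pairs by the barcode found at the start of read 2.

    Returns a dict mapping cell name -> list of (read1, read2) records.
    Pairs whose barcode is not in barcode_to_cell are stored under
    "unassigned". Only exact barcode matches are accepted in this sketch.
    """
    cells = defaultdict(list)
    for rec1, rec2 in zip(read_fastq(r1_handle), read_fastq(r2_handle)):
        barcode = rec2[1][:barcode_len]
        cell = barcode_to_cell.get(barcode, "unassigned")
        cells[cell].append((rec1, rec2))
    return dict(cells)
```

For gzipped fastq files, the handles can be opened with `gzip.open(path, "rt")`. A real run would write each cell's reads to its own fastq file rather than keep them in memory.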

Requirements:

  1. STAR (v2.5.3a)
  2. Cutadapt (v3.4)
  3. Python (v3.8.8)
  4. Pysam (v0.16.0.1)
  5. Htseq-count (v0.13.5)
  6. Samtools (v1.12)
  7. seqtk (v1.3)
  8. Pandas (v1.2.4)

The versions we used are listed in parentheses.
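Before running the pipeline, it can help to confirm that the tools listed above are reachable. The following is a minimal sketch, not part of the repository; the executable names (`STAR`, `cutadapt`, `samtools`, `seqtk`, `htseq-count`) and the import names (`pysam`, `pandas`, `HTSeq`) are assumptions about a typical installation, and the check does not verify versions.

```python
import shutil
from importlib.util import find_spec

# ASSUMPTION: these are the executable names under which the tools are on PATH.
REQUIRED_TOOLS = ["STAR", "cutadapt", "samtools", "seqtk", "htseq-count"]
# ASSUMPTION: top-level import names of the required Python packages.
REQUIRED_MODULES = ["pysam", "pandas", "HTSeq"]


def missing_requirements(tools=REQUIRED_TOOLS, modules=REQUIRED_MODULES):
    """Return (missing_tools, missing_modules) as two lists of names."""
    # shutil.which returns None when an executable is not found on PATH.
    missing_tools = [t for t in tools if shutil.which(t) is None]
    # find_spec returns None when a top-level module cannot be located.
    missing_modules = [m for m in modules if find_spec(m) is None]
    return missing_tools, missing_modules


if __name__ == "__main__":
    tools, modules = missing_requirements()
    print("Missing executables:", tools or "none")
    print("Missing Python modules:", modules or "none")
```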

Installation:

  1. STAR: https://github.com/alexdobin/STAR
  2. Cutadapt: https://cutadapt.readthedocs.io/en/stable/installation.html
  3. Samtools: http://www.htslib.org/download/
  4. seqtk: https://github.com/lh3/seqtk
  5. Python and related packages (i.e., Htseq-count, Pysam, Pandas) can be installed through Conda.

Note: please change the directories in the scripts before use. The Python scripts and files used in this pipeline are included in the 'scripts' folder.

Demo:

We provide an example dataset sequenced on the NovaSeq platform (2x150 bp). Since it has already been demultiplexed, users can proceed directly to mapping and UMI counting using 01_snapTotal_mapping.sh and 02_split_exon_intron_reads_standard_run.sh, respectively. The expected output is also provided in the 'Example' folder.
