Xinglab / intron-retention-paper

intron retention project

The scripts used in the manuscript entitled 'Tracking pre-mRNA maturation across subcellular compartments identifies developmental gene regulation through intron retention and nuclear anchoring' are provided in these folders so that the results of the manuscript are reproducible and transparent.

The scripts are organized into folders numbered in the order of the analysis (_1 to _6).

Please download the datasets from https://doi.org/10.5281/zenodo.4540589 before running the analysis.

The code is provided without warranty. Parts of the analysis code were rewritten for readability and may therefore produce errors when run. If you encounter any problems or have questions, please report them to the author (zcpan1016@gmail.com).

download data

  • Download from Zenodo: https://doi.org/10.5281/zenodo.4540589
    • cd data
    • curl -L -o Intron.feature.annotation https://zenodo.org/record/4540589/files/Intron.feature.annotation?download=1
    • curl -L -o intron_FI_combined_results.txt https://zenodo.org/record/4540589/files/intron_FI_combined_results.txt?download=1
    • curl -L -o Intron_transcript.txt https://zenodo.org/record/4540589/files/Intron_transcript.txt?download=1
    • curl -L -o mm10.fa.gz https://zenodo.org/record/4540589/files/mm10.fa.gz?download=1
    • curl -L -o Mus_musculus.GRCm38.91.chr.gtf.gz https://zenodo.org/record/4540589/files/Mus_musculus.GRCm38.91.chr.gtf.gz?download=1
  • Download the Ensembl release 91 mouse cDNA sequences (used for the kallisto index):
    • curl -L -O ftp://ftp.ensembl.org/pub/release-91/fasta/mus_musculus/cdna/Mus_musculus.GRCm38.cdna.all.fa.gz
  • Unzip
    • gunzip -c ./mm10.fa.gz > mm10.fa
    • gunzip -c ./Mus_musculus.GRCm38.91.chr.gtf.gz > ./Mus_musculus.GRCm38.91.chr.gtf
    • gunzip -c ./Mus_musculus.GRCm38.cdna.all.fa.gz > ./Mus_musculus.GRCm38.cdna.all.fa
    • cd ..
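The download and decompression steps above can also be scripted. The following is a minimal Python 3 sketch using only the standard library; it is not part of the repository. The file names and URLs are copied from the commands above, while the helper names `zenodo_url`, `gunzip_to` and `download_all` are hypothetical.

```python
import gzip
import os
import shutil
import urllib.request

ZENODO_RECORD = "4540589"
# Files fetched from Zenodo by the curl commands above.
ZENODO_FILES = [
    "Intron.feature.annotation",
    "intron_FI_combined_results.txt",
    "Intron_transcript.txt",
    "mm10.fa.gz",
    "Mus_musculus.GRCm38.91.chr.gtf.gz",
]
ENSEMBL_CDNA = ("ftp://ftp.ensembl.org/pub/release-91/fasta/mus_musculus/cdna/"
                "Mus_musculus.GRCm38.cdna.all.fa.gz")


def zenodo_url(record, name):
    """Build the direct-download URL used by the curl commands above."""
    return "https://zenodo.org/record/{}/files/{}?download=1".format(record, name)


def gunzip_to(src, dst):
    """Stream-decompress a .gz file (same result as `gunzip -c src > dst`)."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def download_all(dest_dir="data"):
    """Download every input file into dest_dir, then decompress the .gz files."""
    os.makedirs(dest_dir, exist_ok=True)
    sources = [(zenodo_url(ZENODO_RECORD, n), n) for n in ZENODO_FILES]
    sources.append((ENSEMBL_CDNA, os.path.basename(ENSEMBL_CDNA)))
    for url, name in sources:
        target = os.path.join(dest_dir, name)
        if not os.path.exists(target):  # skip files that are already present
            urllib.request.urlretrieve(url, target)
        if name.endswith(".gz"):
            gunzip_to(target, target[:-3])  # strip the ".gz" suffix
```

Calling `download_all()` from the repository root should leave the same files in `./data` as the manual commands, and rerunning it skips files that were already downloaded.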

conda environment

  • conda create --prefix ./conda_env_py2
  • conda activate ./conda_env_py2
  • conda install -c conda-forge -c bioconda python=2 r-base=4 sra-tools star kallisto samtools=1.11 pysam=0.16.0 numpy=1.15.4 tqdm keras scikit-learn h5py r-rtsne r-ggplot2 r-ggthemes r-scales r-ggpubr
  • pip install Pillow
  • pip install pyclustering
  • pip install -U matplotlib
  • R
  • > repos <- "http://cran.us.r-project.org"
  • > install.packages("circlize", repos=repos)
  • > quit()
  • conda deactivate
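Before moving on, it can help to confirm that the command-line tools installed above are visible on `PATH`. The sketch below is a hypothetical helper, not part of the repository. It needs Python 3 (`shutil.which` does not exist in Python 2, which is what this environment provides), so run it with a separate Python 3 interpreter while the environment is active. The tool list is an assumption based on the executables used later in this README.

```python
import shutil

# Executables used later in this README (from sra-tools, STAR, kallisto, samtools, R).
REQUIRED_TOOLS = ["prefetch", "fastq-dump", "STAR", "kallisto", "samtools", "Rscript"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


def report(tools=REQUIRED_TOOLS):
    """Print a short summary and return True when every tool was found."""
    missing = missing_tools(tools)
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools found.")
    return not missing
```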

quantification_intronRetention_1

Output: One example of output should look like
./results/mESC/Chr/Aligned.sortedByCoord.out.bam (RNA-seq alignments)
./results/mESC/Chr/kallisto/abundance.tsv (Gene expression)
./results/mESC/Chr/siri_out (Quantification of the levels of intron retentions)

Download fastq files

  • conda activate ./conda_env_py2
  • cd quantification_intronRetention_1
  • prefetch SRR12883492
  • fastq-dump --split-files ./SRR12883492/SRR12883492.sra
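For a paired-end run, `fastq-dump --split-files` writes the two mates to `SRR12883492_1.fastq` and `SRR12883492_2.fastq`. A quick sanity check is that both files contain the same number of reads. This is a minimal Python 3 sketch with hypothetical helper names; it assumes uncompressed FASTQ with exactly four lines per record, which is the layout `fastq-dump` writes.

```python
def count_fastq_records(path):
    """Count reads in an uncompressed FASTQ file (4 lines per record)."""
    with open(path) as fh:
        n_lines = sum(1 for _ in fh)
    if n_lines % 4:
        raise ValueError("{} has {} lines, not a multiple of 4".format(path, n_lines))
    return n_lines // 4


def check_pair(r1, r2):
    """Return the read count if both mate files have the same number of records."""
    n1, n2 = count_fastq_records(r1), count_fastq_records(r2)
    if n1 != n2:
        raise ValueError("mate files differ: {} vs {} reads".format(n1, n2))
    return n1
```

For example, `check_pair("SRR12883492_1.fastq", "SRR12883492_2.fastq")` returns the number of read pairs, or raises if the split looks wrong.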

Create indices

  • STAR --runMode genomeGenerate --runThreadN 4 --genomeDir ./star_index --genomeFastaFiles ../data/mm10.fa --sjdbGTFfile ../data/Mus_musculus.GRCm38.91.chr.gtf
  • kallisto index -i Mus_musculus.GRCm38.cdna.all.fa.idx ../data/Mus_musculus.GRCm38.cdna.all.fa
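The two index commands above can be wrapped so that they are printed (dry run) before being executed. This is a minimal Python 3 sketch, not part of the repository; the function names are hypothetical, and the default paths and thread count simply mirror the commands above.

```python
import os
import subprocess


def index_commands(genome_fa="../data/mm10.fa",
                   gtf="../data/Mus_musculus.GRCm38.91.chr.gtf",
                   cdna_fa="../data/Mus_musculus.GRCm38.cdna.all.fa",
                   star_dir="./star_index",
                   kallisto_idx="Mus_musculus.GRCm38.cdna.all.fa.idx",
                   threads=4):
    """Return the STAR and kallisto index commands as argument lists."""
    star = ["STAR", "--runMode", "genomeGenerate", "--runThreadN", str(threads),
            "--genomeDir", star_dir, "--genomeFastaFiles", genome_fa,
            "--sjdbGTFfile", gtf]
    kallisto = ["kallisto", "index", "-i", kallisto_idx, cdna_fa]
    return [star, kallisto]


def build_indices(dry_run=True, star_dir="./star_index", **kwargs):
    """Print each command; when dry_run is False, also run it and stop on failure."""
    if not dry_run:
        os.makedirs(star_dir, exist_ok=True)  # make sure the STAR output folder exists
    for cmd in index_commands(star_dir=star_dir, **kwargs):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Running `build_indices()` only prints the commands; `build_indices(dry_run=False)` executes them in order.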

Setup for run

  • mkdir -p results/mESC/Chr
  • mkdir -p data/mESC/fastq
  • mv ./SRR12883492_1.fastq data/mESC/fastq/Chr_1.fastq
  • mv ./SRR12883492_2.fastq data/mESC/fastq/Chr_2.fastq
  • Edit the PARENT_DIR variable at the top of RNAseq_Analysis_Pipeline.py to match your local paths
  • chmod +x SIRI/bin/SIRI

Run

  • With 8 threads and 64 GB of RAM: python RNAseq_Analysis_Pipeline.py
  • conda deactivate
  • cd ..
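After the pipeline finishes, a quick check is that the example outputs listed at the start of this section exist. The Python 3 sketch below is a hypothetical helper, not part of the repository. `read_tpm` relies on the standard kallisto `abundance.tsv` layout (tab-separated columns `target_id`, `length`, `eff_length`, `est_counts`, `tpm`), which holds transcript-level estimates.

```python
import csv
import os

# Example outputs listed above for the mESC / Chr sample.
EXPECTED_OUTPUTS = [
    "Aligned.sortedByCoord.out.bam",            # RNA-seq alignments
    os.path.join("kallisto", "abundance.tsv"),  # kallisto expression estimates
    "siri_out",                                 # intron-retention quantification
]


def missing_outputs(sample_dir="./results/mESC/Chr"):
    """Return expected output paths under sample_dir that do not exist yet."""
    paths = [os.path.join(sample_dir, p) for p in EXPECTED_OUTPUTS]
    return [p for p in paths if not os.path.exists(p)]


def read_tpm(abundance_tsv):
    """Map target_id -> TPM from a kallisto abundance.tsv file."""
    with open(abundance_tsv) as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return {row["target_id"]: float(row["tpm"]) for row in reader}
```

Run from `quantification_intronRetention_1`, an empty list from `missing_outputs()` means all three example outputs were produced.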

xmeans_cluster_U_intron_2

Output: cluster.png should match the figure shown in the .ipynb notebook

  • conda activate ./conda_env_py2
  • cd xmeans_cluster_U_intron_2
  • python run_scripts.py
  • conda deactivate
  • cd ..

ptc_analysis_3

Output: PTC_figure.png should match the figure shown in the .ipynb notebook

  • conda activate ./conda_env_py2
  • cd ptc_analysis_3
  • python run_scripts.py
  • conda deactivate
  • cd ..

deep_learning_4

Output: ./results/performance.png should match the figure shown in the .ipynb notebook

  • conda activate ./conda_env_py2
  • cd deep_learning_4
  • With 8 threads and 32 GB of RAM: python run_scripts.py
  • conda deactivate
  • cd ..

tsne_analysis_5

Output: tsne_plot.pdf should match the copy included in the repository

  • conda activate ./conda_env_py2
  • cd tsne_analysis_5
  • Rscript tsne_plot.R
  • conda deactivate
  • cd ..

circos_plot_6

Output: circos.pdf should match the copy included in the repository

  • conda activate ./conda_env_py2
  • cd circos_plot_6
  • Rscript circos_plot.R
  • conda deactivate
  • cd ..
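Steps 2 to 6 follow the same pattern: enter a folder, run one script, and compare one output file. The Python 3 sketch below runs them in order and reports which expected outputs were not produced. It is hypothetical and not part of the repository. It assumes each output is written relative to its stage folder, and it should be launched with a separate Python 3 interpreter while `./conda_env_py2` is active, so that `python` and `Rscript` resolve to the environment's copies.

```python
import os
import subprocess

# (folder, command, expected output) taken from steps 2-6 above.
STAGES = [
    ("xmeans_cluster_U_intron_2", ["python", "run_scripts.py"], "cluster.png"),
    ("ptc_analysis_3", ["python", "run_scripts.py"], "PTC_figure.png"),
    ("deep_learning_4", ["python", "run_scripts.py"],
     os.path.join("results", "performance.png")),
    ("tsne_analysis_5", ["Rscript", "tsne_plot.R"], "tsne_plot.pdf"),
    ("circos_plot_6", ["Rscript", "circos_plot.R"], "circos.pdf"),
]


def run_stages(root=".", stages=STAGES, runner=subprocess.run):
    """Run each stage inside its folder; return expected outputs that are missing."""
    missing = []
    for folder, cmd, output in stages:
        workdir = os.path.join(root, folder)
        runner(cmd, cwd=workdir, check=True)  # raises on the first failing stage
        if not os.path.exists(os.path.join(workdir, output)):
            missing.append(os.path.join(folder, output))
    return missing
```

The `runner` parameter defaults to `subprocess.run`; passing a different callable makes it easy to dry-run the sequence without executing the analysis scripts.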

About

License: MIT License


Languages

Jupyter Notebook 81.0%, Python 16.2%, R 2.8%