lhartmanis / compound-screen

Code and analysis scripts for analyzing newly transcribed RNA in large-scale compound screen experiments

compound-screen

This repository contains code and analysis scripts for inferring the fraction of new RNA from NASC-seq2-style bulk experiments and for analyzing gene expression patterns in newly transcribed RNA. All expression data and raw sequencing files related to this project are available at ArrayExpress under accession code E-MTAB-13091.

Data processing for a typical experiment follows the steps below. Further information about how to utilize the tools can be found in the respective subfolders.

1) Infer fraction new RNA

The fraction of new RNA is inferred gene-wise for each treatment condition using a Bayesian Markov chain Monte Carlo (MCMC) inference engine, as described in the publication (publication link missing).
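To give a feel for what gene-wise MCMC inference of a new-RNA fraction can look like, here is a minimal sketch for a single gene and condition. It is not the engine used in this repository: it assumes a two-component binomial mixture over per-read conversion counts, a flat prior, fixed conversion and error rates (`p_new`, `p_old`), and a plain random-walk Metropolis sampler. All function names, parameter values, and the simulated data are hypothetical.

```python
import math
import random


def binom_pmf(k, n, p):
    """Probability of k conversions among n convertible positions."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def simulate_reads(n_reads, n_sites, frac_new, p_new, p_old, rng):
    """Simulate (conversions, sites) per read for one gene and condition."""
    reads = []
    for _ in range(n_reads):
        p = p_new if rng.random() < frac_new else p_old
        k = sum(rng.random() < p for _ in range(n_sites))
        reads.append((k, n_sites))
    return reads


def sample_fraction_new(reads, p_new, p_old, n_steps=2000, burn_in=500,
                        step=0.05, seed=1):
    """Random-walk Metropolis sampler for the fraction of new reads."""
    rng = random.Random(seed)
    # Per-read likelihoods under "new" and "old" do not change, so compute once.
    lik = [(binom_pmf(k, n, p_new), binom_pmf(k, n, p_old)) for k, n in reads]

    def log_post(pi):
        # Flat prior on (0, 1): the log-posterior equals the log-likelihood.
        return sum(math.log(pi * a + (1 - pi) * b) for a, b in lik)

    pi = 0.5
    current = log_post(pi)
    samples = []
    for i in range(n_steps):
        proposal = pi + rng.gauss(0.0, step)
        # Proposals outside (0, 1) have zero posterior mass and are rejected.
        if 0.0 < proposal < 1.0:
            candidate = log_post(proposal)
            if rng.random() < math.exp(min(0.0, candidate - current)):
                pi, current = proposal, candidate
        if i >= burn_in:
            samples.append(pi)
    return samples


# Hypothetical example: 500 reads, 25 convertible sites each, 30% new RNA.
rng = random.Random(0)
reads = simulate_reads(500, 25, frac_new=0.3, p_new=0.05, p_old=0.001, rng=rng)
samples = sample_fraction_new(reads, p_new=0.05, p_old=0.001)
posterior_mean = sum(samples) / len(samples)
print(f"posterior mean fraction new: {posterior_mean:.2f}")
```

The samples retained after burn-in approximate the posterior of the new-RNA fraction, so a credible interval can be read off their quantiles. A real analysis would typically also estimate the conversion and error rates rather than fixing them.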

A Snakemake pipeline for inferring the fraction of new RNA is provided under new_RNA_inference. It takes as input a gene- and condition-tagged BAM file (as generated by, for example, zUMIs). The pipeline uses any available GPUs to speed up the computationally heavy MCMC inference; if no GPUs are found, all computations run on the CPU.
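The GPU-or-CPU fallback can be pictured with a small sketch. How the pipeline actually detects GPUs is not stated here, so the approach below is an assumption: it asks the `nvidia-smi` command-line tool to list devices and falls back to the CPU when the tool is missing or fails. The function name `detect_device` is hypothetical.

```python
import shutil
import subprocess


def detect_device():
    """Return "gpu" if nvidia-smi lists at least one device, else "cpu"."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA driver tools on PATH: assume CPU-only.
        return "cpu"
    try:
        # `nvidia-smi -L` prints one line per GPU, each starting with "GPU".
        out = subprocess.run([exe, "-L"], capture_output=True, text=True,
                             timeout=10, check=True)
    except (subprocess.SubprocessError, OSError):
        return "cpu"
    has_gpu = any(line.startswith("GPU") for line in out.stdout.splitlines())
    return "gpu" if has_gpu else "cpu"


device = detect_device()
print(f"running inference on: {device}")
```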

Code and examples of how to run the inference pipeline can be found in the new_RNA_inference folder.

2) Analyze differentially expressed genes

This repository contains a stand-alone script for analyzing differentially expressed genes. Differential expression is tested using a t-test with a variance adjustment that corrects for the artificially low between-sample variance caused by low replicate numbers.
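The sketch below shows why a variance adjustment matters when replicates are few. It is not the adjustment used by the script in this repository: as an assumption, it shrinks each gene's pooled variance toward the median pooled variance across genes before computing a two-sample t statistic. The function names, the `weight` parameter, and the toy data are hypothetical.

```python
import math
import statistics


def pooled_variance(a, b):
    """Pooled sample variance of two replicate groups."""
    na, nb = len(a), len(b)
    return ((na - 1) * statistics.variance(a)
            + (nb - 1) * statistics.variance(b)) / (na + nb - 2)


def adjusted_t(control, treated, weight=0.5):
    """t statistic per gene with variance shrunk toward the across-gene median.

    control/treated map gene -> list of replicate values (e.g. log expression).
    weight=0 gives the ordinary pooled-variance t statistic.
    """
    pooled = {g: pooled_variance(control[g], treated[g]) for g in control}
    prior = statistics.median(pooled.values())
    stats = {}
    for g in control:
        na, nb = len(control[g]), len(treated[g])
        var = (1 - weight) * pooled[g] + weight * prior
        se = math.sqrt(var * (1 / na + 1 / nb))
        diff = statistics.mean(treated[g]) - statistics.mean(control[g])
        stats[g] = diff / se
    return stats


# Toy data: geneA has an implausibly small variance across three replicates.
control = {"geneA": [5.0, 5.0, 5.01], "geneB": [3.0, 3.4, 2.7], "geneC": [7.1, 6.8, 7.4]}
treated = {"geneA": [5.1, 5.1, 5.11], "geneB": [4.5, 4.9, 4.1], "geneC": [7.0, 7.3, 6.9]}

plain = adjusted_t(control, treated, weight=0.0)
adjusted = adjusted_t(control, treated, weight=0.5)
for gene in control:
    print(gene, round(plain[gene], 2), round(adjusted[gene], 2))
```

In the toy data, geneA's tiny effect looks highly significant under the ordinary t statistic only because its three replicates happen to agree closely; the adjusted statistic is much more modest, while geneB's larger effect remains clearly positive.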

Code and examples of how to analyze differentially expressed genes can be found in the differential_expression folder.

3) Transcription factor binding enrichment

Enrichment or depletion of transcription factor binding in promoters of differentially expressed genes can provide insights into the biological processes that coordinate the transcriptional response.
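One common way to quantify such enrichment or depletion is a hypergeometric test on gene counts. The sketch below is an illustration under that assumption, not a description of the method in the tf_enrichment folder; the function names and the example counts are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, n_background, n_bound, n_de):
    """P(exactly k of the DE genes have the TF bound in their promoter)."""
    return (comb(n_bound, k) * comb(n_background - n_bound, n_de - k)
            / comb(n_background, n_de))


def tf_enrichment(n_background, n_bound, n_de, n_de_bound):
    """One-sided hypergeometric p-values for enrichment and depletion.

    n_background: all genes considered
    n_bound:      background genes with the TF bound in their promoter
    n_de:         differentially expressed genes
    n_de_bound:   DE genes with the TF bound in their promoter
    """
    lowest = max(0, n_de - (n_background - n_bound))
    highest = min(n_bound, n_de)
    p_enriched = sum(hypergeom_pmf(k, n_background, n_bound, n_de)
                     for k in range(n_de_bound, highest + 1))
    p_depleted = sum(hypergeom_pmf(k, n_background, n_bound, n_de)
                     for k in range(lowest, n_de_bound + 1))
    return p_enriched, p_depleted


# Hypothetical counts: 1000 genes, 100 bound by the TF, 50 DE genes, 15 of them bound.
p_enriched, p_depleted = tf_enrichment(1000, 100, 50, 15)
print(f"enrichment p = {p_enriched:.2e}, depletion p = {p_depleted:.3f}")
```

With 10% of background promoters bound, about 5 of 50 differentially expressed genes would be expected to be bound by chance, so observing 15 gives a small enrichment p-value. When many transcription factors are tested, the p-values would need a multiple-testing correction.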

Code, TF enrichment data, and examples of how to analyze enrichment of transcription factor binding can be found in the tf_enrichment folder.

License:GNU General Public License v3.0


Languages

Language:Python 100.0%