To start from scratch with fastq files, an experiment ID, or a set of SRA accessions that you would like to process into a gene count matrix file:
A) For Smart-seq2 data or other single-cell data that is available on SRA and is not droplet or tag-based.
To start from a gene count matrix file that you would like to normalize and analyze further:
For either pipeline, first clone the repository and then set up the Docker image to work from. These steps assume Docker is already installed on your computer. Open a command line (Terminal or similar) and run the following from the directory that contains the Dockerfile:
$ docker build -t <DESIRED_IMAGE_TAG_HERE> - < Dockerfile
$ docker run -it --rm --mount type=volume,dst=/home/rstudio/kitematic,volume-driver=local,volume-opt=type=none,volume-opt=o=bind,volume-opt=device=<PUT_DESIRED_LOCAL_DIRECTORY_PATH_HERE> -e PASSWORD=<DESIRED_PASSWORD_HERE> -p 8787:8787 <SAME_DESIRED_IMAGE_TAG_AS_ABOVE_HERE>
Run the following command to find your container ID:
$ docker ps
It will be something like "a1b23c45" (a jumble of lowercase letters and numbers). Put it in place of <CONTAINER_ID> here:
$ docker exec -it <CONTAINER_ID> bash
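The lookup and attach steps above can be combined. The following is a minimal sketch and not part of the repository: `container_id_for` and `enter_container` are hypothetical helper names, and the sketch assumes the container was started from the image tag you chose when building. It relies on the `-q` and `--filter ancestor=` options of `docker ps`.

```shell
# Hypothetical helper (not part of this repository): print the ID of the
# first running container that was started from the given image tag.
container_id_for() {
  docker ps -q --filter "ancestor=$1" | head -n 1
}

# Hypothetical helper: open a bash shell inside that container, or report
# that nothing is running from the given image tag.
enter_container() {
  id=$(container_id_for "$1")
  if [ -z "$id" ]; then
    echo "No running container found for image: $1" >&2
    return 1
  fi
  docker exec -it "$id" bash
}
```

After defining both functions in your shell, `enter_container <SAME_DESIRED_IMAGE_TAG_AS_ABOVE_HERE>` should behave like the two manual commands. If several containers are running from the same image, only the first one listed is used.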
To start from scratch with an experiment ID or a set of SRA accessions that you would like to process into a gene count matrix file:
A) For Smart-seq2 data or other single-cell data that is available on SRA and is not droplet or tag-based.
Step A1. Open run_pre-processing_pipeline.sh and change the variables to the dataset you are working with, OR keep them the same and follow this example's dataset.
# Change your directory name, GEO ID, and SRP here. Then run the script.
dir=darmanis_data
GSE=GSE84465
SRP=SRP079058
label=darmanis
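A quick format check on these variables before a long run can catch a mistyped accession. This is a minimal sketch under stated assumptions: `check_accessions` is a hypothetical helper that is not part of the pipeline, and it assumes GEO series IDs look like `GSE` followed by digits and SRA study IDs look like `SRP` followed by digits, as in the example above. Studies registered under other prefixes would need a looser pattern.

```shell
# Hypothetical helper (not part of the pipeline): check that the GEO ID and
# the SRP ID have the expected shape before starting a long download.
check_accessions() {
  gse=$1
  srp=$2
  # Assumption: a GEO series ID is "GSE" followed by one or more digits.
  if ! printf '%s\n' "$gse" | grep -Eq '^GSE[0-9]+$'; then
    echo "Unexpected GEO ID: $gse" >&2
    return 1
  fi
  # Assumption: an SRA study ID is "SRP" followed by one or more digits.
  if ! printf '%s\n' "$srp" | grep -Eq '^SRP[0-9]+$'; then
    echo "Unexpected SRP ID: $srp" >&2
    return 1
  fi
  echo "OK: $gse $srp"
}
```

For example, `check_accessions "$GSE" "$SRP"` could be run right after setting the variables. It only checks the format and does not confirm that the accession exists.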
Depending on how many samples are in the dataset, this will take anywhere from an hour to several days (if you have thousands of samples).
To open RStudio in Docker, go to your internet browser and enter: localhost:8787
Follow the example in darmanis_data_prep.Rmd to set up the data.
$ cd <PATH_TO_THE_CLONED_REPOSITORY>
$ bash run_pre-processing_pipeline.sh
Step B1. Open run_tag-based_pre-processing_pipeline.sh and change the URL and variables to the dataset you are working with, OR keep them the same and follow this example's dataset.
# Change your directory name and label here.
dir=pbmc_data
label=pbmc_1k_v2
Change this line to the URL of the fastq files for the dataset you want to work with, or keep it as is and follow the example.
cd ${dir}
curl -O http://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v2/pbmc_1k_v2_fastqs.tar
tar -xvf ${label}_fastqs.tar
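`curl -O` saves the download under the last component of the URL path, so the name given to `tar` has to match that file name. The following is a minimal sketch of deriving the archive name from the URL instead of typing it twice. `archive_name` and `fetch_and_extract` are hypothetical helpers that are not part of the pipeline, and the sketch assumes the URL has no query string.

```shell
# Hypothetical helper (not part of the pipeline): print the file name that
# `curl -O` would save for a URL, which is the last component of the URL path.
# Assumption: the URL carries no query string.
archive_name() {
  basename "$1"
}

# Hypothetical helper: download a tar archive into the current directory and
# extract it, using the same derived file name for both steps.
fetch_and_extract() {
  url=$1
  archive=$(archive_name "$url")
  curl -O "$url" && tar -xvf "$archive"
}
```

With the example dataset, `archive_name` applied to the 10x Genomics URL above prints `pbmc_1k_v2_fastqs.tar`, and `fetch_and_extract` with that URL would download and then unpack that same file.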
Depending on how many samples are in the dataset, this will take a few hours or so.
To start from a gene count matrix file that you would like to normalize and analyze further:
To open RStudio in Docker, go to your internet browser and enter: localhost:8787
Follow the example in darmanis_data_prep.Rmd to set up the data.
Open run_post-processing_pipeline.sh and change the variables to the dataset you are working with, OR keep them the same and follow this example's dataset.
dir=darmanis_data
label=darmanis
$ cd <PATH_TO_THE_CLONED_REPOSITORY>
$ bash run_post-processing_pipeline.sh