This repository contains scripts for assessing different methods of choosing the number of principal components (PCs) to retain.
The text directory contains LaTeX files for the report, a compiled PDF of which can be found here.
The simulations directory contains scripts for performing the basic simulations:
functions.R, a central R script containing definitions of useful functions for the simulations.
sim_gaussclust.R, a template for simulations of clusters with Gaussian noise.
sim_trajectory.R, a template for simulations of trajectories between multiple nodes.
submitter.sh, a Bash script for SLURM job submission of the simulations.
plot_results.R, an R script to generate the plots.
simulate_noise.R, an R script examining the effect of removing biological noise.
The real directory contains scripts for performing the real data-based simulations:
proc_kolod.R, an R script for pre-processing the mESC data set.
proc_pbmc4k.R, an R script for pre-processing the PBMC data set.
run_kolod.R, a template for performing simulations based on the mESC data set.
run_pbmc4k.R, a template for performing simulations based on the PBMC data set.
submitter.sh, a Bash script for SLURM job submission of the simulations.
plot_results.R, an R script to generate the plots.
In addition, batching/batching.Rmd contains an example of how batch removal in the presence of zeroes can distort the PCA results.
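
As a rough illustration of one way such a distortion can arise, the sketch below applies a deliberately naive batch removal (subtracting each batch's mean) to a single hypothetical gene. It is not taken from batching.Rmd: the expression values, the batch labels and the `center_by_batch` helper are all assumptions made for this example, and it is written in Python rather than R so that it depends only on the standard library. The cells with zero expression are identical before correction, but are shifted by a different amount in each batch, so they pick up batch-associated variance that a PCA could then capture.

```python
from statistics import mean, pvariance


def center_by_batch(values, batches):
    """Subtract each batch's mean from its own cells (naive batch removal)."""
    batch_means = {
        b: mean([v for v, vb in zip(values, batches) if vb == b])
        for b in set(batches)
    }
    return [v - batch_means[b] for v, b in zip(values, batches)]


# Hypothetical log-expression of one gene in 8 cells from two batches.
# Zeroes are the same in both batches; only the expressing cells differ.
expr = [0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 6.0, 6.0]
batch = [1, 1, 1, 1, 2, 2, 2, 2]

corrected = center_by_batch(expr, batch)

# Track the cells that had zero expression before correction.
zero_before = [v for v in expr if v == 0.0]
zero_after = [c for v, c in zip(expr, corrected) if v == 0.0]

print("corrected values:", corrected)
print("variance among zero cells before:", pvariance(zero_before))
print("variance among zero cells after: ", pvariance(zero_after))
```

In this toy case the zero cells have no variance before correction and a non-zero variance afterwards, entirely due to the batch-specific shifts.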