AlexsLemonade / compendium-processing

A series of analyses related to refine.bio species compendia

Exploring refine.bio species compendia

Once refine.bio reaches production, we will periodically release compendia composed of all the samples from a species that we were able to process. We refer to these as species compendia and envision that these collections will be useful for extracting features from a diverse set of biological conditions. Creating a species compendium pipeline required us to tackle several problems, such as selecting a method for imputing missing values. This repository holds a series of analyses related to refine.bio species compendia, divided into related modules. See the README files in the individual directories for more information.

Modules

  • select_imputation_method - A series of experiments/evaluations for selecting a method for imputing missing values.
  • human_missingness - Genes that are measured in fewer than 30% of samples are typically removed before imputing missing values in gene expression data. How many genes would remain in the human compendium using this cutoff?
  • impute_requirements - How long does it take to run KNN impute? (Too long for our use case.)
  • quality_check - Exploring test zebrafish compendia.
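
To make the 30% cutoff in human_missingness concrete, here is a minimal sketch of that kind of filter in Python. It is not the code used in this repository: the function names, the dictionary layout (gene name mapped to per-sample values, with None marking a missing measurement), and the toy data are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical layout: gene name -> one value per sample, None if not measured.
Expression = Dict[str, List[Optional[float]]]


def fraction_measured(values: List[Optional[float]]) -> float:
    """Return the fraction of samples in which a gene has a value."""
    return sum(v is not None for v in values) / len(values)


def filter_genes_by_coverage(
    expression: Expression, min_fraction: float = 0.30
) -> Expression:
    """Keep genes measured in at least `min_fraction` of samples.

    Genes measured in fewer than `min_fraction` of samples are dropped,
    which mirrors the cutoff described above (assumed default: 30%).
    """
    return {
        gene: values
        for gene, values in expression.items()
        if values and fraction_measured(values) >= min_fraction
    }


# Toy data, invented for this example (not from any compendium).
toy: Expression = {
    "GENE_A": [1.2, 0.8, None, 2.1, 1.9],     # measured in 4 of 5 samples
    "GENE_B": [None, None, None, 0.5, None],  # measured in 1 of 5 samples
    "GENE_C": [None, 3.3, None, None, 2.7],   # measured in 2 of 5 samples
}

kept = filter_genes_by_coverage(toy)
print(sorted(kept))  # ['GENE_A', 'GENE_C']
```

Counting the entries in the returned dictionary answers the question the module asks: how many genes are left after the cutoff. On a real compendium the same idea would more likely be applied to a genes-by-samples matrix in R or with a data frame library.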

About

License: BSD 3-Clause "New" or "Revised" License


Languages

  • HTML 99.6%
  • R 0.2%
  • Jupyter Notebook 0.1%
  • Shell 0.1%
  • Python 0.0%
  • Dockerfile 0.0%