AlexsLemonade / refinebio-examples

Example workflows for refine.bio data

Home Page: https://www.refine.bio

RNA-seq examples updates: Move filtering before DESeqDataSet creation

cansavvy opened this issue

Background

If we filter counts before we make a DESeqDataSet, we can use tidyverse and it's clearer.

See #416

Problem

The filtering would be much clearer and more tidyverse-style if we did it before creating the DESeqDataSet.
Filtering the ddset itself is neither pretty nor clear. @jashapiro mentioned something about this somewhere, but I don't think I ever turned it into an issue.

What are the recommended next steps?

Here's what that step can look like if we move it to right before we create the DESeqDataSet:

# Define a minimum counts cutoff and filter the data to include
# only rows (genes) that have total counts at or above the cutoff
filtered_expression_df <- expression_df %>%
  dplyr::filter(rowSums(.) >= 10)

This is relevant to all the places we do a counts filter in RNA-seq.
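
For anyone who wants to try the proposed ordering, here is a rough sketch on toy data. The counts, sample names, and `metadata` data frame are made up for illustration and are not from any of the examples. The `design = ~group` formula is also just an assumption. The DESeq2 step is guarded so the filtering part runs even without DESeq2 installed.

```r
library(dplyr) # also provides the %>% pipe

# Toy counts (made-up values): genes as row names, samples as columns
expression_df <- data.frame(
  sample_1 = c(0, 12, 3, 250),
  sample_2 = c(1, 8, 2, 300),
  sample_3 = c(0, 15, 4, 275),
  row.names = c("gene_a", "gene_b", "gene_c", "gene_d")
)

# Define a minimum counts cutoff and filter the data to include
# only rows (genes) that have total counts at or above the cutoff
filtered_expression_df <- expression_df %>%
  dplyr::filter(rowSums(.) >= 10)

# Hypothetical sample metadata; row names match the count columns
metadata <- data.frame(
  group = factor(c("control", "treated", "control")),
  row.names = colnames(filtered_expression_df)
)

# Only now create the DESeqDataSet, from the already filtered counts
# (skipped when DESeq2 is not installed)
if (requireNamespace("DESeq2", quietly = TRUE)) {
  ddset <- DESeq2::DESeqDataSetFromMatrix(
    countData = round(as.matrix(filtered_expression_df)),
    colData = metadata,
    design = ~group
  )
}
```

This assumes every column of `expression_df` is numeric (gene IDs stored as row names rather than in a column), since `rowSums(.)` would fail on a character column.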

From a quick project-wide search for "Define a minimum counts cutoff", it looks like these are the examples that could use this change:

  • clustering_rnaseq_01_heatmap
  • differential-expression_rnaseq_01.html
  • dimension-reduction_rnaseq_01_pca.html
  • dimension-reduction_rnaseq_02_umap.html