AlexsLemonade / refinebio-examples

Example workflows for refine.bio data

Home Page: https://www.refine.bio

Finalize pathway analyses text, including introductory paragraphs

cansavvy opened this issue

Background

Based on implementing option 2: #340 (comment), we need to draft a paragraph with the essential info on how to decide which pathway analysis to use.

Problem

Currently the pathway analyses don't have background info about how to choose a pathway analysis based on your question and dataset. The original plan was to have an intro doc, but we discussed it on #340 and each pathway analysis will have its own brief intro info paragraph.

What are the recommended next steps?

The info will include the same kinds of concepts as are covered in the training slides but of course tailored for a self-teaching context as well as being more brief: https://docs.google.com/presentation/d/1WPuuN6KviEswaWVU0yBRe4XvFjPE0mT3Epd1Xr6TjMQ/edit#slide=id.p
Basically give the users guidance about which analysis they will want to use based on their questions and dataset's contents and link out to those other pathway analyses.

  1. Post an outline here. We should try to make it as brief as possible and link out to helpful sources.
  2. Draft a PR but only implement the paragraph in one of the analyses, so the reviewer and author don't have to redundantly review and edit it in each pathway analysis.
  3. In a second PR put that same paragraph in the other pathway analyses.

Is there a particular timeframe for this issue?

Before going live probably.

Here's the outline I'm thinking:

  • About this pathway analysis
    • Contains brief example-specific description
  • What is pathway analysis?
    • General info on what pathway analyses are
    • Link to Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges
  • How to choose a pathway analysis?
    • Table that summarizes the current pathway analysis examples with these columns:
      • Analysis name (with link)
      • What is required for input
      • What output looks like
      • ✅ Pros
      • ⚠️ Cons

Unfortunately, the "How to choose a pathway analysis?" section will require updates whenever a new example is added. But the rest shouldn't have to change much.

Here's a general draft. If it looks decent, I will file a PR for the more detailed feedback.

About this pathway analysis

This example is part of the pathway analysis module set.

<-- Example specific explanation -->

What is pathway analysis?

We use the term pathway analysis for any technique that interprets a high-throughput experiment using predetermined sets of genes that are related or coordinated in their expression in some way (e.g., participate in the same molecular process, are regulated by the same transcription factor).
In the context of refine.bio, we use these techniques to analyze and interpret genome-wide gene expression experiments.
The rationale for performing pathway analysis is that looking at the pathway level may be more biologically meaningful than considering individual genes, especially if a large number of genes are differentially expressed between conditions of interest.
In addition, many relatively small changes in the expression values of genes in the same pathway could lead to a phenotypic outcome, and these small changes may go undetected in differential gene expression analysis.
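
To make the last point concrete, here is a minimal sketch with made-up numbers. It combines gene-level z-scores with Stouffer's method, which is used only as a simple illustration and is not one of the pathway analysis methods in this module. The z-scores are hypothetical, and the method assumes the genes are independent, which real pathway genes usually are not.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_p(z: float) -> float:
    """One-sided p-value for a standard normal z-score."""
    return 1.0 - NormalDist().cdf(z)


def stouffer_z(z_scores: list[float]) -> float:
    """Combine z-scores with Stouffer's method (assumes independent genes)."""
    return sum(z_scores) / sqrt(len(z_scores))


# Hypothetical pathway: 10 genes, each with the same modest shift.
gene_z = [1.2] * 10
gene_p = [upper_tail_p(z) for z in gene_z]

# The pathway-level statistic pools the evidence across all 10 genes.
pathway_z = stouffer_z(gene_z)
pathway_p = upper_tail_p(pathway_z)

print(f"smallest gene-level p-value: {min(gene_p):.3f}")
print(f"pathway-level p-value:       {pathway_p:.2e}")
```

No single gene passes a 0.05 cutoff here, but the pooled pathway-level statistic is far into the tail.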

We highly recommend taking a look at Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges from Khatri et al. for a more comprehensive overview and reading the primary publications and documentation of the methods and sources we will introduce below.

How to choose a pathway analysis?

This table summarizes the pathway analysis examples in this module.

| Analysis | What is required for input | What output looks like | ✅ Pros | ⚠️ Cons |
|---|---|---|---|---|
| ORA (Over-representation Analysis) | A list of gene IDs (no stats needed) | A per-pathway hypergeometric test result | Simple; computationally inexpensive to compute p-values | Requires arbitrary thresholds and ignores any statistics associated with a gene; assumes independence of genes and pathways |
| GSEA (Gene Set Enrichment Analysis) | A list of gene IDs with gene-level summary statistics | A per-pathway enrichment score | Includes all genes (no arbitrary threshold!); attempts to measure coordination of genes | Permutations can be expensive; does not account for pathway overlap; two-group comparisons not always appropriate/feasible |
| GSVA (Gene Set Variation Analysis) | A gene expression matrix (like what you get from refine.bio directly) | Pathway-level scores on a per-sample basis | Does not require two groups to compare upfront; normally distributed scores | Scores are not a good fit for gene sets that contain genes that go up AND down; method doesn't assign statistical significance itself; recommended sample size n > 10 |
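
As a rough illustration of the "per-pathway hypergeometric test result" that ORA produces, here is a minimal Python sketch using only the standard library. The function names, gene IDs, and set sizes are hypothetical and chosen for illustration; the module's actual examples may rely on dedicated packages and apply multiple-testing correction across pathways, which this sketch omits.

```python
from math import comb


def ora_p_value(n_universe: int, n_pathway: int, n_selected: int, n_overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    X is the number of pathway genes found when n_selected genes are drawn
    without replacement from a universe containing n_pathway pathway genes.
    """
    total = comb(n_universe, n_selected)
    max_overlap = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def over_representation(genes_of_interest, pathway_genes, background_genes):
    """Return (overlap size, p-value) for one pathway, restricted to the background."""
    background = set(background_genes)
    selected = set(genes_of_interest) & background
    pathway = set(pathway_genes) & background
    overlap = selected & pathway
    p = ora_p_value(len(background), len(pathway), len(selected), len(overlap))
    return len(overlap), p


# Hypothetical toy data: 20 background genes, a 5-gene pathway,
# and 4 genes of interest, 3 of which fall in the pathway.
background_genes = [f"g{i}" for i in range(20)]
pathway_genes = ["g0", "g1", "g2", "g3", "g4"]
genes_of_interest = ["g0", "g1", "g2", "g10"]

overlap_size, p_value = over_representation(genes_of_interest, pathway_genes, background_genes)
print(overlap_size, round(p_value, 4))
```

Note that the gene list passed in is already thresholded, and each gene is treated as an independent draw, which mirrors the two cons listed for ORA above.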

@jaclyn-taroni edit to remove inadvertent tag.

This intro paragraph is in the ORA example with #349 but needs to be added to the GSEA and GSVA examples: #354 and #371.

Apparently Closes <issue 1> and <issue 2> doesn't close <issue 2>! This was closed via #441.