ngehlenborg / upset

CFDE programs Upset

Home Page:https://acharbonneau.github.io/upset/

UpSet of Tissues in the Common Fund Data Ecosystem

CFDE Program data

I am hosting this instance of UpSet at https://acharbonneau.github.io/upset/. The raw data used for the plot can be viewed (or edited) here. Direct links to the pages I used to compile this dataset are provided as in-line links in the next section.

In addition to the UpSet view, I have also plotted the same data as a heatmap, overlaid with some variables describing general data types. You can view a static version of that plot here: HeatMap of UpSet data.

Data Collection

The dataset is a simple binary code:

  • 0 means 'this tissue is not found in this program's dataset'
  • 1 means 'this tissue is found in this program's dataset'

Very generally, the data were compiled by starting with the GTEx tissue list, checking each other program's data against that list, and adding tissues as needed.
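
As a rough sketch of that compilation step, the matrix could be assembled as below. The program names `ProgramA` and `ProgramB`, the tissue sets, and the helper `build_matrix` are all hypothetical and only illustrate the 0/1 encoding; the actual dataset was compiled by hand from each program's pages.

```python
# Hypothetical inputs: a reference tissue list plus the tissues each program reports.
gtex_tissues = ["Colon", "Kidney - Cortex", "Lung"]
program_tissues = {
    "GTEx": {"Colon", "Kidney - Cortex", "Lung"},
    "ProgramA": {"Colon", "Salivary gland"},
    "ProgramB": {"Lung"},
}


def build_matrix(reference, programs):
    """Return (tissues, rows), where rows[tissue][program] is 0 or 1."""
    # Start from the reference list, then append tissues seen only in other programs.
    tissues = list(reference)
    for found in programs.values():
        for tissue in sorted(found):
            if tissue not in tissues:
                tissues.append(tissue)
    # 1 means the tissue is found in that program's dataset, 0 means it is not.
    rows = {
        tissue: {name: int(tissue in found) for name, found in programs.items()}
        for tissue in tissues
    }
    return tissues, rows


tissues, matrix = build_matrix(gtex_tissues, program_tissues)
```

Here `matrix["Colon"]["ProgramA"]` is 1 and `matrix["Lung"]["ProgramA"]` is 0, and 'Salivary gland' is appended to the tissue list because it is absent from the reference list.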

More specifically:

Data Decisions

The goal of this exercise is to identify where a researcher might find different kinds of samples for their tissue of interest to answer a research question. For example, they may want to find all available information about a particular disease to build a hypothesis, or to determine what new type of analysis would be most likely to find the types of changes they are looking for. To support that kind of question, I made the following tissue generalizations:

  • If a sample was from the diseased version of a tissue, I counted it as that tissue, e.g. colon cancer is colon tissue, kidney cancer is kidney tissue
  • If the sample was a biofluid associated with a tissue, I counted it as that tissue, e.g. saliva is counted as 'salivary gland', a microbiome sample of the colon is counted as 'colon'
  • If the sample was from a cultured cell line, I counted it as its source tissue, e.g. a cultured kidney cell is kidney tissue. EXCEPT:
    • fibroblasts and lymphocytes, which are explicitly listed in the matrix
    • undifferentiated cells, dedifferentiated cells, stem cells, and metastasized cancer cells, which are all excluded
  • I assumed that skin tissue from cell culture or from furry animals (mice, rats, etc.) was not sun exposed
  • GTEx distinguishes between brain tissues by region, whereas most other programs distinguish by cell types that may occur variously throughout the brain. Therefore, in most cases, I can only distinguish between 'Brain', 'Spinal cord' and 'Brain stem'. So, for example, Kids First has a '1' for all brain tissues, but may or may not have samples from all of the GTEx regions. The brain tissues listed could be collapsed into 'Brain', 'Spinal cord' and 'Brain stem' without losing any data; however, I have left them as is to show the variety of data available.
  • In a few other cases, GTEx has more spatial specificity than other programs, notably for Esophagus, Adipose, Heart and Kidney. Since GTEx uses cadavers whereas most other programs' samples come from living donors, I assumed that samples from other programs came from the more accessible/common tissue when it was not specified. Therefore tissue listed only as 'Esophagus' was classified as 'Esophagus - Mucosa', 'Adipose' was assumed to be 'Adipose - Subcutaneous', and 'Heart' was assumed to be 'Heart - Left Ventricle'. In the case of samples labeled only 'Kidney', I assumed that the tissues were not dissected before analysis, and marked both 'Kidney - Cortex' and 'Kidney - Medulla'.
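
Several of the generalizations above can be read as a small lookup procedure. The sketch below is illustrative only: the function `normalize`, its `cell_state` argument, and the table names are hypothetical, and the tables hold just the examples named in the list rather than the full set of decisions made for the dataset.

```python
# Labels without GTEx's spatial detail, mapped to the tissues assumed in the rules above.
UNSPECIFIED_DEFAULTS = {
    "Esophagus": ["Esophagus - Mucosa"],
    "Adipose": ["Adipose - Subcutaneous"],
    "Heart": ["Heart - Left Ventricle"],
    "Kidney": ["Kidney - Cortex", "Kidney - Medulla"],  # assumed not dissected
}

# Biofluids counted as their associated tissue (only the example from the text).
BIOFLUID_TO_TISSUE = {"Saliva": "Salivary gland"}

# Cultured cell states that are excluded from the matrix entirely.
EXCLUDED_CELL_STATES = {"undifferentiated", "dedifferentiated", "stem", "metastasized"}


def normalize(label, cell_state=None):
    """Return the list of matrix tissues that a source label is counted as."""
    if cell_state in EXCLUDED_CELL_STATES:
        return []
    # Map a biofluid to its tissue first, then apply the spatial defaults.
    label = BIOFLUID_TO_TISSUE.get(label, label)
    return UNSPECIFIED_DEFAULTS.get(label, [label])


kidney = normalize("Kidney")
saliva = normalize("Saliva")
stem_cells = normalize("Colon", cell_state="stem")
```

With these tables, a sample labeled only 'Kidney' marks both kidney rows, 'Saliva' is counted as 'Salivary gland', and a stem cell sample contributes nothing.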

About

This site is based on UpSet, an interactive, web-based visualization technique designed to analyze set-based data. UpSet visualizes both set intersections and their properties, and the items (elements) in the dataset. Please see the project website at http://vcg.github.io/upset/about for details about the technique, publications and videos.
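
To make the idea of set intersections concrete, here is a minimal sketch that counts how many tissues fall into each exact combination of programs, given a binary matrix like the one described under Data Collection. It is not UpSet's own code; the rows, the program name `ProgramA`, and the helper `exclusive_intersections` are hypothetical, and it assumes exclusive intersections (each tissue is counted once, under the exact set of programs that contain it).

```python
from collections import Counter

# Hypothetical rows: tissue -> {program: 0 or 1}
matrix = {
    "Colon": {"GTEx": 1, "ProgramA": 1},
    "Lung": {"GTEx": 1, "ProgramA": 0},
    "Salivary gland": {"GTEx": 0, "ProgramA": 1},
    "Kidney - Cortex": {"GTEx": 1, "ProgramA": 1},
}


def exclusive_intersections(rows):
    """Count tissues per exact combination of programs that contain them."""
    counts = Counter()
    for membership in rows.values():
        # The key is the set of programs whose flag is 1 for this tissue.
        key = frozenset(name for name, flag in membership.items() if flag)
        counts[key] += 1
    return counts


sizes = exclusive_intersections(matrix)
```

In this toy matrix, two tissues are shared by both programs and one tissue is unique to each program.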

License:MIT License

