bhklab / ORCESTRA

ORCESTRA is a new web application that enables users to search, request and manage pharmacogenomic datasets (PSets).

Problem with gene symbols in CCLE (and possibly other PSets)

bhaibeka opened this issue

ss1 <- fNames(CCLE, mDataType = "mutation")
ss2 <- featureInfo(CCLE, mDataType = "mutation")[ , "Symbol",drop=TRUE]

table(is.na(ss1) == is.na(ss2))

FALSE  TRUE
   83  1584

These two vectors should be the same, yet 83 entries differ in whether they are NA.

Hi Ben,

The Symbol column is not one of our standardized column names and therefore is not checked when creating a PharmacoSet. The fNames function instead looks at the row.names of the featureInfo data.frame, which should correspond to the standardized geneid column therein.
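To illustrate how the two vectors can diverge, here is a minimal base R sketch. It does not use PharmacoGx; the toy data.frame and its values are invented stand-ins for featureInfo(CCLE, mDataType = "mutation"), and the variable names are hypothetical.

```r
# Toy stand-in for the featureInfo data.frame; all values are invented.
feature_info <- data.frame(
  Symbol = c("TP53", NA, "KRAS"),
  row.names = c("TP53", "BRAF", "KRAS"),
  stringsAsFactors = FALSE
)

# Identifiers taken from the row names (what fNames is described as using).
ids_from_rownames <- rownames(feature_info)

# Identifiers taken from the unchecked, non-standardized Symbol column.
ids_from_symbol <- feature_info[, "Symbol", drop = TRUE]

# data.frame row names cannot be NA, but a free-form column can,
# so the NA patterns of the two vectors need not agree.
na_agreement <- table(is.na(ids_from_rownames) == is.na(ids_from_symbol))

# Features whose Symbol is missing or differs from the row name.
mismatched <- ids_from_rownames[
  is.na(ids_from_symbol) | ids_from_symbol != ids_from_rownames
]
print(na_agreement)
print(mismatched)
```

In this toy case only the feature with the missing Symbol is flagged, which mirrors the kind of disagreement reported above.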

However, there is no geneid column in the mutation data for CCLE. Indeed, our standardized columns are missing from many of our PharmacoSet objects. This is an annotation issue, and not necessarily something we can fix in PharmacoGx.

Thus I am moving this issue to the ORCESTRA repo, where we can update the CCLE script to fix the annotation problems.

Best,
Chris

To clarify, there are a number of NA values in featureInfo(CCLE, "mutation")$Symbol which are not present in the row.names of that object. Additionally, the standardized geneid column is absent from the SummarizedExperiment, which prevents this issue from being caught during PharmacoSet creation.
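A check along the following lines could flag both problems when the CCLE script builds the annotation. This is only a sketch under stated assumptions: check_feature_annotation is a hypothetical helper, not part of PharmacoGx or ORCESTRA, and it runs here on an invented toy data.frame rather than the real object.

```r
# Hypothetical helper (not a PharmacoGx function): returns a character
# vector describing annotation problems found in a featureInfo-like data.frame.
check_feature_annotation <- function(feature_info, id_col = "geneid") {
  problems <- character(0)

  if (!id_col %in% colnames(feature_info)) {
    # The standardized identifier column is absent altogether.
    problems <- c(problems,
                  sprintf("missing standardized column '%s'", id_col))
  } else if (!identical(as.character(feature_info[[id_col]]),
                        rownames(feature_info))) {
    # The column exists but disagrees with the row names.
    problems <- c(problems,
                  sprintf("'%s' does not match row names", id_col))
  }

  if ("Symbol" %in% colnames(feature_info) &&
      anyNA(feature_info[["Symbol"]])) {
    # Count NA entries in the non-standardized Symbol column.
    problems <- c(problems,
                  sprintf("%d NA value(s) in 'Symbol'",
                          sum(is.na(feature_info[["Symbol"]]))))
  }

  problems
}

# Invented toy data: no geneid column and one missing Symbol.
toy <- data.frame(
  Symbol = c("TP53", NA),
  row.names = c("TP53", "BRAF"),
  stringsAsFactors = FALSE
)
issues <- check_feature_annotation(toy)
print(issues)
```

On a real PSet the same function could presumably be applied to the data.frame returned by featureInfo(CCLE, "mutation"), though that has not been verified here.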