Problem with gene symbols in CCLE (and possibly other PSets)
bhaibeka opened this issue · comments
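# Feature names as reported by fNames()
ss1 <- fNames(CCLE, mDataType = "mutation")
# Gene symbols stored in the feature annotations
ss2 <- featureInfo(CCLE, mDataType = "mutation")[, "Symbol", drop = TRUE]
# Tabulate whether the two vectors have missing values in the same positions
table(is.na(ss1) == is.na(ss2))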
FALSE  TRUE
   83  1584
These two vectors should be the same, but their missing values disagree at 83 positions.
Hi Ben,
The `Symbol` column is not one of our standardized column names and therefore is not checked when creating a `PharmacoSet`. The `fNames` function instead looks at the `row.names` of the `featureInfo` data.frame, which should correspond to the standardized `geneid` column therein.
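The difference can be reproduced on a toy table without loading CCLE. The sketch below uses only base R and made-up values; `feature_info` is a hypothetical stand-in for `featureInfo(CCLE, mDataType = "mutation")`, and taking `rownames()` in place of `fNames()` is an assumption based on the description above, not a statement about the PharmacoGx internals.

```r
# Toy stand-in for featureInfo(CCLE, mDataType = "mutation"); values are made up.
feature_info <- data.frame(
  Symbol = c("TP53", NA, "KRAS", NA),
  row.names = c("TP53", "BRAF", "KRAS", "EGFR"),
  stringsAsFactors = FALSE
)

# Assumption: fNames() reflects the row names of the feature annotations.
ss1 <- rownames(feature_info)
ss2 <- feature_info[, "Symbol", drop = TRUE]

# Same comparison as in the original report.
na_agreement <- table(is.na(ss1) == is.na(ss2))
print(na_agreement)

# Row names whose Symbol entry is missing.
missing_symbol <- ss1[is.na(ss2)]
print(missing_symbol)
```

On the real object, the same last two lines would list the features whose row name is present but whose `Symbol` is `NA`.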
However, there is no `geneid` column in the mutation data for CCLE. Indeed, our standardized columns are missing from many of our `PharmacoSet` objects. This is an annotation issue, and not necessarily something we can fix in `PharmacoGx`.
Thus I am moving this issue to the ORCESTRA repo, where we can update the CCLE script to fix the annotation problems.
Best,
Chris
To clarify, there are a number of `NA` values in `featureInfo(CCLE, "mutation")$Symbol` that are not present in the `row.names` of that object. Additionally, the standardized `geneid` column is absent from the `SummarizedExperiment`, which prevents this issue from being caught during `PharmacoSet` creation.
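As a rough illustration of the kind of check that could flag this in an annotation script, here is a minimal base R sketch. The helper `check_feature_annotations` is hypothetical and is not part of PharmacoGx or the ORCESTRA scripts. The toy data are made up, and `geneid` is taken to be the standardized identifier column, as described above.

```r
# Hypothetical helper; not part of PharmacoGx. Returns a character vector
# describing annotation problems (empty if none are found).
check_feature_annotations <- function(feature_info, id_col = "geneid") {
  problems <- character(0)
  if (!id_col %in% colnames(feature_info)) {
    problems <- c(problems, sprintf("missing standardized column '%s'", id_col))
  } else {
    ids <- as.character(feature_info[[id_col]])
    if (anyNA(ids)) {
      problems <- c(problems, sprintf("NA values in '%s'", id_col))
    }
    if (!identical(ids, rownames(feature_info))) {
      problems <- c(problems, sprintf("'%s' does not match the row names", id_col))
    }
  }
  problems
}

# Toy data with made-up values, mirroring the situation described above:
# a Symbol column containing NA and no geneid column.
without_geneid <- data.frame(
  Symbol = c("TP53", NA),
  row.names = c("TP53", "BRAF"),
  stringsAsFactors = FALSE
)

# Same table with a geneid column that matches the row names.
with_geneid <- without_geneid
with_geneid$geneid <- rownames(without_geneid)

problems_without <- check_feature_annotations(without_geneid)
problems_with <- check_feature_annotations(with_geneid)
print(problems_without)
print(problems_with)
```

The helper only inspects the identifier column. Because `Symbol` is not standardized, `NA` values there would still go unreported unless a separate check is added for it.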