cit-bioinfo / consensusMIBC

Transcriptomic classifier for Muscle-Invasive Bladder Cancer

Format of input matrix

sdwien opened this issue

I am trying consensusMIBC for the first time, and it works fine with the example data, but not with my own data. My input matrix looks like this:

head(input)           
             ENSG  MH01001 MH01002  MH01003  MH01004   MH01005   MH01006
1 ENSG00000223972 0.000000       0 0.000000 0.000000 0.0000000 0.0000000
2 ENSG00000227232 1.757685       0 2.744584 2.389094 0.0000000 0.7287099
3 ENSG00000278267 0.000000       0 2.090266 1.824959 0.7022326 0.0000000
4 ENSG00000243485 0.000000       0 0.000000 0.000000 0.0000000 0.0000000
5 ENSG00000284332 0.000000       0 0.000000 0.000000 0.0000000 0.0000000
6 ENSG00000237613 0.000000       0 0.000000 0.000000 0.0000000 0.0000000

My command was:

consensusresults <- getConsensusClass(input,gene_id="ensembl_gene_id")

And I am getting this error:

Error in getConsensusClass(input, gene_id = "ensembl_gene_id") : 
  Empty intersection between profiled genes and the genes used for consensus classification.
 Make sure that gene names correspond to the type of identifiers specified by the gene_id argument

What format does the input matrix need to be in? Does the first column need a specific name (here ENSG)?
Many thanks for your suggestions.
Best, Sophia

I found the cause: my Ensembl IDs carried version numbers (such as ENSG00000223972.5), and after stripping the version number, some genes had duplicate rows for the same ENSG ID. Apparently, that was the problem. After removing the duplicate rows, it worked for me.
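
For anyone hitting the same error, here is a rough sketch of that cleanup in base R. The toy values, the object name `expr`, and the choice to keep the first row of each duplicated ID are illustrative assumptions, not necessarily what was run on the real data. Moving the IDs from the ENSG column into the row names is also an assumption, based on the classifier matching gene identifiers against row names.

```r
# Toy data frame in the same layout as above, but with versioned IDs
# (hypothetical values, for illustration only).
input <- data.frame(
  ENSG    = c("ENSG00000223972.5", "ENSG00000227232.5", "ENSG00000227232.6"),
  MH01001 = c(0, 1.757685, 1.2),
  MH01002 = c(0, 0, 0.5),
  stringsAsFactors = FALSE
)

# Strip the trailing version suffix (e.g. ".5") from each Ensembl gene ID.
ids <- sub("\\.[0-9]+$", "", input$ENSG)

# Keep only the first row for each ID (assumption: one simple way to
# resolve duplicates; averaging or summing the rows are alternatives).
keep <- !duplicated(ids)
expr <- input[keep, setdiff(names(input), "ENSG"), drop = FALSE]

# Use the cleaned IDs as row names so only numeric sample columns remain.
rownames(expr) <- ids[keep]

# Then classify as before (requires the consensusMIBC package):
# consensusresults <- getConsensusClass(expr, gene_id = "ensembl_gene_id")
```

Which duplicate row to keep is a judgment call; if the duplicated rows carry different expression values, it may be worth inspecting them before dropping any.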
Best, Sophia