cit-bioinfo / BLCAsubtyping

Transcriptomic tools to classify bladder tumours according to six published molecular classifications: Baylor, UNC, MDA, Lund, CIT-Curie and TCGA.

Normalization and package usage

Bart-Joosten opened this issue

Hi,

Thanks for creating this package.
It's not yet clear to me whether I should first normalise the raw counts and then log-transform them, or first log-transform and then normalise.

Also, I get the following error with counts that I first normalised and then log-transformed:

cl<-classify(expMat = TPM_normalized_then_log_transformed_TUR_counts,classification.systems = "TCGA")
predicting TCGA subtypes...Error in colnames(tcga.centroids)[-c(1, 2)][apply(cor(Exp[G, ], tcga.centroids[G,  : 
  invalid subscript type 'list'

My expMat object is a matrix. I tried replacing the column names with unlisted column names, but that did not help. My matrix looks something like this:
[Screenshot 2023-09-27 at 16:19:54: the user's expression matrix]

However, I notice that it looks different from the CIT dataset:
[Screenshot 2023-09-27 at 16:20:30: the CIT dataset]

For the normalization: counts should first be normalized for library size and fragment size (e.g. TPM) and then log-transformed.
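As a minimal sketch of that order (normalize, then log-transform), the R function below computes TPM from raw counts and gene lengths and then applies log2. The function name `counts_to_log_tpm`, the inputs `counts` and `gene_length_bp`, the log base 2 and the pseudocount of 1 are all assumptions for illustration; they are not part of the BLCAsubtyping API, so check the package documentation for the exact scale `classify()` expects.

```r
# Hypothetical helper: raw counts -> TPM -> log2(TPM + pseudocount).
# Assumes `counts` is a genes x samples matrix of raw read counts and
# `gene_length_bp` holds gene lengths in base pairs, in row order.
counts_to_log_tpm <- function(counts, gene_length_bp, pseudocount = 1) {
  stopifnot(nrow(counts) == length(gene_length_bp))
  # Step 1: fragment-size (gene length) normalization -> reads per kilobase
  rpk <- counts / (gene_length_bp / 1000)
  # Step 2: library-size normalization -> each sample sums to 1e6
  tpm <- sweep(rpk, 2, colSums(rpk), "/") * 1e6
  # Step 3: log-transform; the (assumed) pseudocount keeps zero TPM finite
  log2(tpm + pseudocount)
}

# Toy example with made-up numbers
counts <- matrix(c(10, 0, 30,
                   5, 20, 0),
                 nrow = 3,
                 dimnames = list(c("GENE_A", "GENE_B", "GENE_C"), c("S1", "S2")))
gene_length_bp <- c(1000, 2000, 500)

log_tpm <- counts_to_log_tpm(counts, gene_length_bp)
```

With real data, the resulting matrix would be passed as `expMat` to `classify()`; the three-gene toy matrix only illustrates the arithmetic.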

About the error message, I'm not sure I can reproduce it. Do you still get it if you convert the matrix to a data.frame first, i.e.:

cl<-classify(expMat = as.data.frame(TPM_normalized_then_log_transformed_TUR_counts), classification.systems = "TCGA")

Hi, thanks for your response.
The issue was that I had -Inf values in my expMat object, because some of my TPM values were zero and log(0) is -Inf. After I replaced the -Inf values in TPM_normalized_then_log_transformed_TUR_counts with 0, the function ran without issues.
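The -Inf values come from taking the log of zero. The small R sketch below, using made-up TPM values, contrasts replacing -Inf with 0 after the log against adding a pseudocount before the log. The pseudocount is a common convention assumed here for illustration, not something the package prescribes.

```r
# Made-up TPM values: in sample S1, GENE_A is 0 and GENE_B is 0.5
tpm <- matrix(c(0, 0.5,
                12, 3),
              nrow = 2,
              dimnames = list(c("GENE_A", "GENE_B"), c("S1", "S2")))

# Logging zeros gives -Inf, the kind of value that triggered the error here
log_raw <- log2(tpm)
has_nonfinite <- any(!is.finite(log_raw))  # TRUE

# Fix used in this thread: replace -Inf with 0 after the log
log_replaced <- log_raw
log_replaced[is.infinite(log_replaced)] <- 0

# Alternative (assumed convention): add a pseudocount of 1 before the log
log_pseudo <- log2(tpm + 1)

log_replaced["GENE_A", "S1"]  # 0  (TPM 0)
log_replaced["GENE_B", "S1"]  # -1 (TPM 0.5), now ranked below the zero
log_pseudo["GENE_A", "S1"]    # 0
log_pseudo["GENE_B", "S1"]    # about 0.585, ordering preserved
```

One side effect of replacing -Inf with 0 is that a zero TPM ends up above any TPM between 0 and 1 (log2(0.5) = -1), whereas log2(TPM + 1) keeps zeros at the bottom of the range. The thread does not say which choice the package authors intend.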

Hi, just to be clear: DESeq and edgeR normalizations are not fragment-size normalized, correct? Would FPKM work, though?
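FPKM and TPM both divide by gene length, whereas library-size methods such as DESeq2 size factors or edgeR TMM factors rescale each sample as a whole. The hypothetical R helpers below (`fpkm`, `tpm`, with assumed inputs `counts` and `gene_length_bp`) compute both from the same raw counts to show how they relate: within a sample, TPM is FPKM rescaled so that the sample sums to one million. Whether the classifiers give identical calls on log-FPKM input is not settled in this thread.

```r
# Hypothetical helpers; `counts` is a genes x samples matrix of raw counts and
# `gene_length_bp` gives gene lengths in base pairs (both assumed inputs).
fpkm <- function(counts, gene_length_bp) {
  # Scale by library size first (per million fragments), then by length (per kb)
  per_million <- sweep(counts, 2, colSums(counts), "/") * 1e6
  per_million / (gene_length_bp / 1000)
}

tpm <- function(counts, gene_length_bp) {
  # Scale by length first (per kb), then rescale each sample to sum to 1e6
  rpk <- counts / (gene_length_bp / 1000)
  sweep(rpk, 2, colSums(rpk), "/") * 1e6
}

# Toy data with made-up numbers
counts <- matrix(c(10, 0, 30,
                   5, 20, 0),
                 nrow = 3,
                 dimnames = list(c("GENE_A", "GENE_B", "GENE_C"), c("S1", "S2")))
gene_length_bp <- c(1000, 2000, 500)

fpkm_vals <- fpkm(counts, gene_length_bp)
tpm_vals <- tpm(counts, gene_length_bp)

colSums(tpm_vals)   # 1e6 in every sample (up to floating-point rounding)
colSums(fpkm_vals)  # differs between samples: 1,750,000 and 600,000 here

# TPM is FPKM rescaled per sample
tpm_from_fpkm <- sweep(fpkm_vals, 2, colSums(fpkm_vals), "/") * 1e6
all.equal(tpm_vals, tpm_from_fpkm)  # TRUE
```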