Normalization and package usage

Question

Bart-Joosten opened this issue a year ago

Hi,

Thanks for creating this package.
It's not yet clear to me whether I should normalise the raw counts first and then log-transform them, or log-transform first and then normalise.

Also, I receive the following error with counts that I first normalised and then log-transformed:

cl<-classify(expMat = TPM_normalized_then_log_transformed_TUR_counts,classification.systems = "TCGA")
predicting TCGA subtypes...Error in colnames(tcga.centroids)[-c(1, 2)][apply(cor(Exp[G, ], tcga.centroids[G,  : 
  invalid subscript type 'list'

My expMat object is a matrix. I tried to replace column names with unlisted column names but this does not work. My matrix looks something like this:

However, I notice that this looks different from the CIT dataset.

Clarice Groeneveld · Answer 1 · Wed Sep 27 2023 22:36:38 GMT+0800 (China Standard Time)

For the normalization: counts should be library and fragment-size normalized first (e.g. TPM) and then log-transformed.
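
As a rough illustration of that order of operations, the sketch below converts raw counts to TPM and only then applies the log transform. The toy `counts` matrix, the `gene_length_kb` vector and the `counts_to_tpm` helper are hypothetical names made up for this example, and the `+ 1` pseudocount is one common convention, not something the package is known to require.

```r
# Toy raw counts: genes in rows, samples in columns (hypothetical data).
counts <- matrix(
  c(100, 200,
      0,  50,
    300, 150),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C"), c("S1", "S2"))
)

# Hypothetical gene lengths in kilobases, in the same order as the rows.
gene_length_kb <- c(2, 1, 3)

# Hypothetical helper: length-normalize first, then scale each sample to 1e6.
counts_to_tpm <- function(counts, length_kb) {
  rate <- counts / length_kb            # reads per kilobase, row by row
  t(t(rate) / colSums(rate)) * 1e6      # per-sample scaling to one million
}

tpm <- counts_to_tpm(counts, gene_length_kb)

# Log-transform only after normalization; the pseudocount keeps zeros finite.
log_tpm <- log2(tpm + 1)
```

Each column of `tpm` sums to one million, so samples are comparable before the log is taken.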

About the error message, I'm not sure I can reproduce it. Do you still get it if you transform into data.frame first? i.e.

cl<-classify(expMat = as.data.frame(TPM_normalized_then_log_transformed_TUR_counts), classification.systems = "TCGA")

Bart-Joosten · Answer 2 · Wed Sep 27 2023 23:27:03 GMT+0800 (China Standard Time)

Hi, thanks for your response.
The issue was that I had -Inf values in my expMat object (because some of my TPM values were zero, and the log of zero is -Inf). I replaced the -Inf values in TPM_normalized_then_log_transformed_TUR_counts with 0, and the function then ran without issues.
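
For anyone hitting the same error, here is a minimal sketch of the check and the fix described above. The `tpm` matrix is toy data invented for the example. Replacing `-Inf` with 0 mirrors what was done in this thread; adding a pseudocount before the log is a common alternative that avoids `-Inf` in the first place, shown here as an assumption and not as the package's recommendation.

```r
# Toy TPM matrix with a zero entry (hypothetical data).
tpm <- matrix(
  c(10,  0,
     5, 20),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B"), c("S1", "S2"))
)

# log2(0) is -Inf, which is how -Inf ends up in the expression matrix.
log_tpm <- log2(tpm)
n_bad <- sum(!is.finite(log_tpm))   # count of non-finite entries

# Fix used in this thread: set the -Inf entries to 0.
log_tpm_fixed <- log_tpm
log_tpm_fixed[is.infinite(log_tpm_fixed)] <- 0

# Common alternative: add a pseudocount so zeros map to log2(1) = 0.
log_tpm_pseudo <- log2(tpm + 1)
```

One caveat with the replacement approach: without a pseudocount, TPM values below 1 have negative logs, so a zero-expression gene set to 0 would rank above them.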

Bart-Joosten · Answer 3 · Wed Oct 11 2023 15:47:00 GMT+0800 (China Standard Time)

Hi, just to be clear: DESeq and edgeR normalization are not fragment-size normalized, correct? But would FPKM work?