Normalization and package usage
Bart-Joosten opened this issue · comments
Hi,
Thanks for creating this package.
It's not yet clear to me whether I should first normalise the raw counts and then log-transform them, or first log-transform and then normalise.
Also, I receive the following error with counts I first normalised and then log transformed.
cl <- classify(expMat = TPM_normalized_then_log_transformed_TUR_counts, classification.systems = "TCGA")
predicting TCGA subtypes...Error in colnames(tcga.centroids)[-c(1, 2)][apply(cor(Exp[G, ], tcga.centroids[G, :
invalid subscript type 'list'
My expMat object is a matrix. I tried to replace column names with unlisted column names but this does not work. My matrix looks something like this:
However, I notice that this looks different from the CIT dataset
For the normalization: counts should be library and fragment-size normalized first (e.g. TPM) and then log-transformed.
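To make that order concrete, here is a minimal sketch in base R. The toy counts, the gene lengths and the helper `counts_to_tpm` are hypothetical, and the pseudocount of 1 is an assumption on my side (it keeps zero TPM values from turning into -Inf), not something the package requires.

```r
# Hypothetical toy data: raw counts (genes x samples) and gene lengths in bp.
counts <- matrix(c(10, 0, 30, 20, 5, 0), nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
gene_length_bp <- c(geneA = 1000, geneB = 2000, geneC = 500)

# Hypothetical helper: length-normalise first, then scale each sample to 1e6.
counts_to_tpm <- function(counts, lengths_bp) {
  rate <- counts / (lengths_bp / 1000)  # reads per kilobase; lengths recycle down the rows
  t(t(rate) / colSums(rate)) * 1e6      # divide each column (sample) by its total
}

tpm <- counts_to_tpm(counts, gene_length_bp)

# Log-transform only after normalising; the +1 pseudocount is an assumption.
log_tpm <- log2(tpm + 1)
```

Each column of `tpm` sums to one million, and `log_tpm` stays finite even where the count is zero.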
About the error message, I'm not sure I can reproduce it. Do you still get it if you convert the matrix to a data.frame first? i.e.
cl <- classify(expMat = as.data.frame(TPM_normalized_then_log_transformed_TUR_counts), classification.systems = "TCGA")
Hi, thanks for your response.
The issue was that I had -Inf values in my expMat object (because some of my TPM values were zero). I replaced the -Inf values in TPM_normalized_then_log_transformed_TUR_counts with 0, and then the function ran without issues.
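For anyone hitting the same error, a rough sketch of that check and fix in base R follows. The small matrix is made up for illustration; only the replacement of -Inf with 0 reflects what was described above.

```r
# Hypothetical TPM matrix with one zero entry.
tpm <- matrix(c(4, 0, 16, 8), nrow = 2,
              dimnames = list(c("geneA", "geneB"), c("s1", "s2")))

log_tpm <- log2(tpm)  # log2(0) evaluates to -Inf

# Count entries that are not finite (-Inf, Inf, NA, NaN) before classifying.
n_bad <- sum(!is.finite(log_tpm))

# Replace -Inf with 0, as described above.
log_tpm[is.infinite(log_tpm) & log_tpm < 0] <- 0
```

Checking `n_bad` before calling `classify` shows whether the expression matrix contains values that a correlation cannot handle.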
Hi, just to be clear: DESeq and edgeR normalization are not fragment-size normalized, correct? But would FPKM work?