Normalization - Log transformation
semer94 opened this issue
I am working with a lipidomics dataset exported from MS-DIAL. It consists of peak areas that have been normalized with the LOESS algorithm. Several lipids showed significant results in the univariate analysis in MS-DIAL, but I cannot reproduce these results. I would like to ask:
1) Which variant of the t-test is performed in de_analysis(), and which method is used to adjust the p-values?
2) Which group is considered the reference in de_analysis(lpd, vitE - vitE_SPL, measure = "Area", group_col = "Group")?
3) What is the base of the logFC reported in the results (I assumed e)?
4) Do you have any suggestions for modifying the data, e.g. log transformation or some other type of normalization?
5) How do the functions set_logged and set_normalized work, i.e. what values does the argument "val" take?
With respect
Hi @semer94,
Thanks for submitting your questions as an issue.
- lipidr uses the limma moderated t-test, which is very popular in gene expression analysis. The data should be a) normally distributed and b) normalised.
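The idea behind a moderated t-test can be sketched in base R for a single lipid compared between two groups. This is a simplified illustration, not lipidr's or limma's code: limma estimates the prior degrees of freedom (`d0`) and prior variance (`s0_sq`) from all lipids at once by empirical Bayes, whereas here they are passed in by hand as hypothetical values. The helper name `moderated_t()` and the abundances are also hypothetical.

```r
# Simplified moderated t-statistic for one lipid in two groups.
# moderated_t() is a hypothetical helper: limma estimates d0 and s0_sq
# from all lipids (empirical Bayes); here they are supplied by hand.
moderated_t <- function(x, y, d0, s0_sq) {
  n1 <- length(x)
  n2 <- length(y)
  d <- n1 + n2 - 2  # residual degrees of freedom
  # Pooled within-group variance, as in an ordinary two-sample t-test
  s_sq <- (sum((x - mean(x))^2) + sum((y - mean(y))^2)) / d
  # Shrink this lipid's variance towards the prior variance s0_sq
  s_tilde_sq <- (d0 * s0_sq + d * s_sq) / (d0 + d)
  t_mod <- (mean(x) - mean(y)) / sqrt(s_tilde_sq * (1 / n1 + 1 / n2))
  # The moderated t-statistic has d0 + d degrees of freedom
  p <- 2 * pt(-abs(t_mod), df = d0 + d)
  list(t = t_mod, df = d0 + d, p.value = p)
}

# Hypothetical log2 abundances for one lipid
treated <- c(10.2, 10.8, 11.1)
control <- c(9.9, 10.1, 10.0)
moderated_t(treated, control, d0 = 4, s0_sq = 0.05)
```

Setting `d0 = 0` recovers the ordinary pooled two-sample t-test. A larger `d0` pulls each lipid's variance towards the common value, which stabilises the test when there are only a few replicates per group.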
Raw peak areas from MS need to be log-transformed to make them (approximately) normally distributed. Normalisation can be done with various methods, each with its own requirements and assumptions.
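The effect of the log transform on skewed peak areas can be illustrated with simulated data. This is a sketch under stated assumptions: log-normal values stand in for raw MS peak areas, all areas are assumed to be positive (log2(0) gives -Inf), and `skewness()` is a hypothetical helper defined here, not a lipidr function.

```r
# Simulated data only: log-normal values stand in for raw MS peak areas.
set.seed(1)
raw_area <- rlnorm(1000, meanlog = 12, sdlog = 1)

# Sample skewness (third standardised moment); about 0 for symmetric data
skewness <- function(v) {
  z <- (v - mean(v)) / sd(v)
  mean(z^3)
}

log_area <- log2(raw_area)
# Raw areas are strongly right-skewed; log2 areas are roughly symmetric
c(raw = skewness(raw_area), log2 = skewness(log_area))
```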
Depending on your input data, you can skip some of these steps. Log-transformation is not needed if the data are already scaled, pre-logged, or otherwise follow a normal distribution. Similarly, you don't need to re-normalise data that have already been normalised.
So in your case, assuming you exported a numerical matrix from MS-DIAL:
- When you import the data into lipidr with as_lipidomics_experiment(), you can set logged = TRUE / FALSE and normalized = TRUE / FALSE as appropriate.
- If the data is not normalised, you can use normalize_pqn().
- Nothing prevents you from using other normalisation methods, for example:
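library(lipidr)                # set_logged(), set_normalized()
library(SummarizedExperiment)  # assay() getter and setter
# Log2-transform the data (skip these two lines if the data is already logged!)
assay(d, "Area") <- log2(assay(d, "Area"))
d <- set_logged(d, "Area", TRUE)
# Cyclic LOESS normalisation from limma, applied to the log-scale values
assay(d, "Area") <- limma::normalizeCyclicLoess(assay(d, "Area"))
d <- set_normalized(d, "Area", TRUE)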
Note the use of set_logged and set_normalized (with val = TRUE or FALSE) to record that the "Area" measure is now logged and normalised. Also, LOESS-based normalisation generally requires (approximately) normally distributed data, so the data needs to be logged first.
- The data should now be ready for de_analysis.
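Why LOESS normalisation is applied to logged data can be seen from a simplified two-sample version of cyclic loess, written in base R with `stats::lowess()`. This is an illustrative sketch, not limma's `normalizeCyclicLoess()` (which handles any number of samples, iterates, and uses its own fitting routine). The helper name `loess_pair_normalize()` and the simulated abundances are hypothetical. The curve is fitted to M (log-ratio) against A (average log-intensity), which only has that meaning when the inputs are on the log scale.

```r
# Simplified cyclic loess for two samples (a single pass).
# loess_pair_normalize() is a hypothetical helper, not a limma function.
# x1, x2: log2 abundances of the same lipids in two samples.
loess_pair_normalize <- function(x1, x2, f = 2/3) {
  M <- x1 - x2          # log-ratio between the two samples
  A <- (x1 + x2) / 2    # average log-intensity
  fit <- lowess(A, M, f = f)
  # lowess() returns fitted values in increasing-A order; map them back
  o <- order(A)
  M_fit <- numeric(length(M))
  M_fit[o] <- fit$y
  # Split the intensity-dependent bias equally between the two samples
  list(x1 = x1 - M_fit / 2, x2 = x2 + M_fit / 2)
}

set.seed(42)
x1 <- runif(200, min = 10, max = 20)  # hypothetical log2 abundances
x2 <- x1 - 0.5 - 0.05 * x1            # simulated intensity-dependent bias
res <- loess_pair_normalize(x1, x2)
summary(res$x1 - res$x2)              # log-ratios after normalisation
```

Because the correction is additive in log space, it corresponds to a multiplicative (scaling) correction of the raw areas, which is the kind of bias LOESS normalisation is meant to remove.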
- The general convention is de_analysis(treatment - control) (treatment minus control), since you're usually interested in changes in the treated group relative to control; subtracting the control accomplishes this. In your call, de_analysis(lpd, vitE - vitE_SPL, ...), vitE_SPL is therefore the reference group, and a positive logFC means higher abundance in vitE.
- The logFC is (roughly) the difference between group means (mean abundance in treatment minus mean abundance in control). Since the data is in log space, it's called the log-fold change.
- Answered above. In general I trust the moderated t-test, since it has been shown to be more robust than an ordinary t-test, particularly with few replicates per group. Obviously, nothing supersedes validated results.
- Answered above.
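A small base-R sketch of the sign convention and of the base of logFC, using hypothetical peak areas rather than lipidr output. It assumes the data were log2-transformed as in the example above: the base of logFC follows whichever logarithm was applied, so it is back-transformed with 2^logFC after log2 and with exp(logFC) after the natural log.

```r
# Hypothetical raw peak areas for one lipid; treatment is exactly 2x control.
control   <- c(1000, 2000, 4000)
treatment <- 2 * control

# logFC as a difference of group means on the log2 scale
# (treatment minus control, matching de_analysis(treatment - control)).
logFC <- mean(log2(treatment)) - mean(log2(control))
logFC        # prints 1

# Back-transform with the same base that was used for the log transform.
fold_change <- 2^logFC
fold_change  # prints 2

# With the natural log instead, the back-transform is exp().
logFC_e <- mean(log(treatment)) - mean(log(control))
exp(logFC_e) # prints 2
```

Note that a fold change obtained this way is a ratio of geometric means of the raw areas, not of arithmetic means.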
Hope this helps. Let me know if you have other questions. Otherwise feel free to close the issue.
Thank you for your quick response. Another question that occurred to me: why log2 and not the natural log? I ask because the results are labelled logFC and not log2FC. So if I want to calculate the fold change, is this done with exp(logFC)? Finally, a question regarding lipid names: how should SM 16:1;O2/24:1 and SM 18:2;O2/22:0 be renamed so that lipidr can parse them?