Normalization - Log transformation

Question

semer94 opened this issue 2 years ago

I am dealing with a lipidomics dataset exported from MS-DIAL that consists of peak area data normalized using the LOESS algorithm. Several lipids showed significant results in MS-DIAL's univariate analysis, but I cannot reproduce these results. I would like to ask:
1) Which variant of the t-test is performed, and which method is used to adjust p-values, in the function de_analysis()?
2) Which group is considered the reference in de_analysis(lpd, vitE - vitE_SPL, measure = "Area", group_col = "Group")?
3) What is the base of the logFC obtained in the results (I assumed e)?
4) Do you have any suggestions on modifying the data, e.g. log transformation or some other type of normalization?
5) How do the functions set_logged and set_normalized work, i.e. what values does the argument "val" need?

With respect

Ahmed Mohamed · Answer 1 · Fri May 12 2023 20:04:36 GMT+0800 (China Standard Time)

Hi @semer94,

Thanks for submitting your questions as an issue.

1) lipidr uses the limma moderated t-test, which is very popular in gene expression analysis. The data should be a) normally distributed and b) normalised.

Raw peak areas from MS need to be log-transformed to make them approximately normally distributed. Normalisation can be done with various methods as you wish, and each has its own requirements / assumptions.

Depending on your input data, you can skip some of these steps. Log-transformation is not needed if the data are already scaled, pre-logged, or otherwise follow a normal distribution. Similarly, you don't need to re-normalise your data if that has already been done.
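
As a rough illustration of why logging matters, the sketch below simulates hypothetical, log-normally distributed peak areas (not real MS-DIAL output) and compares a simple skewness estimate before and after log2. The skewness() helper is defined here for illustration only and is not part of lipidr.

```r
# Simulated, hypothetical peak areas (log-normal, as raw MS intensities often are)
set.seed(1)
area <- rlnorm(1000, meanlog = 10, sdlog = 1)

# Simple moment-based skewness; close to 0 for a symmetric distribution
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew_raw <- skewness(area)        # raw scale: strongly right-skewed
skew_log <- skewness(log2(area))  # log2 scale: roughly symmetric
print(c(raw = skew_raw, log2 = skew_log))
```

If the skewness of your own data is already close to zero, that is a hint the values may have been logged or scaled upstream, in which case logging again would not be appropriate.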

So in your case, I assume you exported a numerical matrix from MS-DIAL. Then:

You used as_lipidomics_experiment() to import them into lipidr. You can set logged = TRUE / FALSE, normalized = TRUE / FALSE as appropriate.
If the data is not normalised, you can use normalize_pqn().
Nothing is preventing you from using other normalisation methods. An example below:

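library(lipidr)
library(SummarizedExperiment)  # provides assay()

# Log-transform the data if it is not already logged.
# Skip if already logged!
assay(d, "Area") <- log2(assay(d, "Area"))
d <- set_logged(d, "Area", TRUE)  # assign the result; R does not modify d in place

# Cyclic LOESS normalisation (expects log-scale data)
assay(d, "Area") <- limma::normalizeCyclicLoess(assay(d, "Area"))
d <- set_normalized(d, "Area", TRUE)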

Note the use of set_logged and set_normalized to indicate that the "Area" is now logged and normalised. Also, LOESS-based normalisation generally assumes roughly normally distributed data, so the data need to be logged first.

The data should now be ready for de_analysis().

2) The general convention is de_analysis(treatment - control) (treatment minus control), since you're usually interested in changes in the treated group relative to control. Subtracting the control accomplishes this, so the group after the minus sign acts as the reference.
3) The logFC is (roughly) the difference between group means (mean abundance in treatment minus mean abundance in control). Since the data are in log space, it's called the log fold change.
4) Answered above. In general, I trust the moderated t-test, since it has been shown to be more robust. Obviously, nothing supersedes validated results.
5) Answered above.
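
To make the log fold change concrete, here is a minimal base-R sketch with made-up numbers (not taken from any real dataset). It assumes the values were log2-transformed, as in the snippet above, so the logFC is on the log2 scale and is back-transformed with 2^logFC. If the data had been logged with the natural log instead, exp(logFC) would be the matching back-transform.

```r
# Hypothetical log2-transformed abundances for one lipid
control   <- log2(c(100, 120, 110))
treatment <- log2(c(400, 480, 440))  # each value is 4x its control counterpart

# logFC = difference of group means on the log2 scale
logFC <- mean(treatment) - mean(control)

# Back-transform with the same base that was used for logging (2 here, not e)
fold_change <- 2^logFC
print(c(logFC = logFC, fold_change = fold_change))
```

Here the logFC comes out as 2 and the fold change as 4, up to floating-point rounding. Note that this is a ratio of geometric means, not of arithmetic means on the raw scale.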

Hope this helps. Let me know if you have other questions. Otherwise feel free to close the issue.

semer94 · Answer 2 · Fri May 12 2023 21:21:33 GMT+0800 (China Standard Time)

Thank you for your immediate response. Another question that occurred to me is: why log2 transform and not log transform? I ask because the results are labelled logFC and not log2FC. So if I want to calculate the fold change, is this done by exp(logFC)? Finally, a question regarding lipid names: how should SM 16:1;O2/24:1 and SM 18:2;O2/22:0 be renamed in order to be parsed by lipidr?