Different data processing on metabolomics, I get different R2.

Question

Chenjiani1112 opened this issue 4 years ago · comments

Hi.
I have three multi-omics datasets: RNA-seq (vst normalization), DNA methylation (beta values) and plasma metabolomics.
I normalized my metabolite data by the total sum of all detected ions and removed unstable metabolites using QC. I then removed outliers from the retained metabolites using the IQR, normalized samples by the median, and applied Pareto scaling to the plasma metabolites.
Finally, I used the RNA-seq, DNA methylation and plasma metabolite data as input to run MOFA.
However, the results showed that all latent factors explain about 0% of the variance in plasma metabolomics.
Then I log-transformed my plasma metabolite data and applied Pareto scaling. This MOFA result (plasma metabolites with log transform) was dramatically different from the previous one (plasma metabolites without log transform): all latent factors explain about 10% of the variance in plasma metabolomics.
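
For readers following along, here is a minimal sketch of the second preprocessing variant described above (log transform, then Pareto scaling). It is not the poster's actual pipeline: the toy matrix and the helper name `pareto_scale` are made up for illustration, and it assumes strictly positive intensities and no constant features.

```python
import math
import statistics

def pareto_scale(feature):
    """Mean-center one feature and divide by the square root of its standard deviation."""
    mean = statistics.fmean(feature)
    sd = statistics.stdev(feature)  # assumes the feature is not constant (sd > 0)
    return [(v - mean) / math.sqrt(sd) for v in feature]

# Hypothetical toy matrix: rows are samples, columns are metabolites.
raw = [
    [1200.0, 30.0, 5.0],
    [2500.0, 45.0, 9.0],
    [800.0, 20.0, 2.0],
    [15000.0, 60.0, 40.0],
]

# Step 1: log2-transform every intensity (requires values > 0).
logged = [[math.log2(v) for v in row] for row in raw]

# Step 2: Pareto-scale each metabolite (column) across samples.
columns = [list(col) for col in zip(*logged)]
scaled_columns = [pareto_scale(col) for col in columns]

# Back to samples x metabolites layout.
scaled = [list(row) for row in zip(*scaled_columns)]
```

Pareto scaling divides by the square root of the standard deviation rather than the standard deviation itself, so high-variance metabolites are damped but not forced to unit variance.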

I am confused about the data input on metabolomics.
Thanks.

Ricard Argelaguet · Answer 1 · Thu Oct 29 2020 19:15:31 GMT+0800 (China Standard Time)

Hi @Chenjiani1112 ,
you have to use the log transformed values for the plasma metabolites. MOFA needs the data to be normal-ish distributed.

P.S. This MOFA version is deprecated. Please move to MOFA v2 (https://biofam.github.io/MOFA2/)
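
One rough way to check whether a feature is "normal-ish" is to compare its skewness before and after the log transform. The sketch below uses invented values and a hand-written `skewness` helper; it is only an illustration, not something MOFA itself provides.

```python
import math
import statistics

def skewness(values):
    """Population skewness (third standardized moment); close to 0 for symmetric data."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Hypothetical right-skewed metabolite intensities.
raw = [2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
logged = [math.log2(v) for v in raw]

skew_raw = skewness(raw)     # clearly positive: long right tail
skew_log = skewness(logged)  # symmetric after the log transform

print(f"skewness raw: {skew_raw:.2f}, log2: {skew_log:.2f}")
```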

CarrieJianiChen · Answer 2 · Thu Oct 29 2020 19:26:03 GMT+0800 (China Standard Time)

Thanks for your help!

CarrieJianiChen · Answer 3 · Sat Nov 14 2020 13:05:28 GMT+0800 (China Standard Time)

Hi. Thanks for resolving my doubts. Now I have another problem. When I log-transformed my metabolomics data, a number of values < 0 were produced. I think this could have a large influence on my MOFA result.

Thanks

Nicolas Vallet · Answer 4 · Sat Nov 14 2020 21:59:03 GMT+0800 (China Standard Time)

Hi @Chenjiani1112,
This may be caused by values between 0 and 1, whose logarithm is negative. If this is the case, you may want to normalize with another transformation, or modify the values between 0 and 1 depending on the original distribution of your data (e.g. defining the minimum as 1).
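
A small sketch of where the negative values come from and two possible workarounds. The intensities are invented, and reading "defining the minimum as 1" as dividing every value by the smallest one is an assumption, not necessarily what was meant.

```python
import math

# Hypothetical intensities, some of them between 0 and 1.
raw = [0.05, 0.4, 1.0, 3.0, 12.0]

# Plain log2: every value below 1 becomes negative.
plain_log = [math.log2(v) for v in raw]

# Option A: add a pseudocount of 1 before the log, so results are >= 0.
pseudo_log = [math.log2(v + 1.0) for v in raw]

# Option B (assumed reading of "defining the minimum as 1"):
# rescale so the smallest value is 1, which maps it to 0 on the log scale.
min_val = min(raw)
rescaled_log = [math.log2(v / min_val) for v in raw]

n_negative = sum(1 for v in plain_log if v < 0)
print(n_negative, min(pseudo_log), min(rescaled_log))
```

Option B only shifts all log values by a constant, so any later mean-centering removes that shift again. Option A changes the shape of the distribution for small values.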

CarrieJianiChen · Answer 5 · Sat Nov 14 2020 22:03:29 GMT+0800 (China Standard Time)

Hi

Thanks!

CarrieJianiChen · Answer 6 · Mon Nov 30 2020 12:38:07 GMT+0800 (China Standard Time)

Hi @rargelaguet,
Thanks for helping me resolve my earlier confusion. I appreciate your published article about MOFA and your MOFA-related documents and tutorials. However, I now have another question about running MOFA. As I mentioned earlier, I have three multi-omics datasets: RNA-seq, DNA methylation and plasma metabolomics. I know you used vst data for RNA-seq and M-values for DNA methylation. Because of my research design, I want to use log2FPKM data for RNA-seq, beta values for DNA methylation, and quantile-normalized, log2-transformed and Pareto-scaled data for plasma metabolomics. Can I use log2FPKM RNA-seq data as input to run MOFA? I ask because your MOFA tutorials recommend log-normalised RNA-seq data and M-values for bulk methylation data.

Looking forward to your reply.
Thanks!

Best,
Chen.

Ricard Argelaguet · Answer 7 · Mon Nov 30 2020 15:30:11 GMT+0800 (China Standard Time)

Hi Chen,
the important requirement for MOFA is that the data needs to be continuous. Also, the closer it looks to a Gaussian distribution the better, but this is not strictly necessary. Can you attach here a histogram of your matrices before and after normalisation? Then it will be easier to provide guidance.
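
A quick way to produce such a before/after comparison without any plotting library is a text histogram. This is only a sketch with invented values; the `histogram` helper is hypothetical, and for real matrices a proper plotting tool would be the usual choice.

```python
import math

def histogram(values, bins=5):
    """Count values in equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins  # assumes hi > lo
    counts = [0] * bins
    for v in values:
        # The maximum value would land in bin index `bins`; clamp it to the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical intensities for one metabolite across 12 samples.
raw = [1, 2, 2, 3, 4, 5, 8, 12, 20, 45, 110, 300]
logged = [math.log2(v) for v in raw]

raw_counts = histogram(raw)
log_counts = histogram(logged)

for label, counts in (("raw ", raw_counts), ("log2", log_counts)):
    print(label, " ".join("#" * c if c else "." for c in counts))
```

In the raw data almost all samples pile up in the first bin, while the log-transformed values spread more evenly across bins.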