bioFAM / MOFA

Multi-Omics Factor Analysis

Different data processing on metabolomics, I get different R2.

Chenjiani1112 opened this issue

Hi.
I have three omics datasets: RNA-seq (vst-normalized), DNA methylation (beta values) and plasma metabolomics.
I normalized my metabolite data by the total sum of all detected ions, removed unstable metabolites using QC, and removed outliers among the retained metabolites using the IQR. I then normalized samples by the median and applied Pareto scaling to the plasma metabolites.
Finally, I used the RNA-seq, DNA methylation and plasma metabolite data as input to run MOFA.
However, the results showed that all latent factors can explain about 0% of the variance in plasma metabolomics.
I then log-transformed the plasma metabolite data before applying Pareto scaling. This MOFA result (plasma metabolites with log transform) differed dramatically from the previous one (plasma metabolites without log transform): all latent factors can now explain about 10% of the variance in plasma metabolomics.

I am confused about how the metabolomics data should be prepared as input.
Thanks.

Hi @Chenjiani1112,
you have to use the log-transformed values for the plasma metabolites. MOFA needs the data to be normal-ish distributed.

P.S. This MOFA version is deprecated. Please move to MOFA v2 (https://biofam.github.io/MOFA2/)
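
For illustration, here is a minimal sketch of the log-then-Pareto step discussed above. It is plain Python with made-up intensities for a single metabolite, not MOFA code; the helper names `skewness` and `pareto_scale` are hypothetical, and Pareto scaling is taken in its usual form (mean-center, then divide by the square root of the standard deviation). Skewness is used only as a rough indicator of how "normal-ish" the values look.

```python
import math
import statistics


def skewness(values):
    """Third standardized moment (population form); 0 for a symmetric sample."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def pareto_scale(values):
    """Mean-center, then divide by the square root of the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / math.sqrt(sd) for v in values]


# Hypothetical intensities for one metabolite across eight samples (made-up numbers).
raw = [120.0, 150.0, 180.0, 200.0, 260.0, 400.0, 900.0, 5200.0]

# Log-transform first, then Pareto-scale the logged values.
logged = [math.log2(v) for v in raw]
scaled_log = pareto_scale(logged)

print("skewness, raw :", round(skewness(raw), 2))
print("skewness, log2:", round(skewness(logged), 2))
```

On right-skewed data like this, the logged values should show a smaller skewness than the raw ones. Pareto scaling only recenters and rescales each metabolite, so it does not change the shape of the distribution by itself.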

Thanks for your help!

Hi. Thanks for resolving my doubts. I now have another problem: when I log-transformed my metabolomics data, a number of values below 0 were produced. I think this could have a large influence on my MOFA result.

Thanks

Hi @Chenjiani1112,
This is probably caused by values between 0 and 1, whose logarithm is negative. If this is the case, you may want to normalize with another transformation, or modify the values between 0 and 1 depending on the original distribution of your data (e.g. by defining the minimum as 1).
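
As a rough sketch of the two options mentioned above (made-up numbers, plain Python, not MOFA code): option A rescales the data so that the smallest value becomes 1 before taking the log, and option B is one example of "another transformation", here a pseudocount of 1. The choice of pseudocount is an assumption for illustration only.

```python
import math

# Hypothetical normalized abundances; several lie between 0 and 1 (made-up numbers).
# All values must be strictly positive for the log to be defined.
values = [0.05, 0.2, 0.8, 1.0, 3.5, 12.0]

# Plain log2: negative wherever the input is below 1.
plain = [math.log2(v) for v in values]

# Option A: rescale so the smallest value becomes 1, then log-transform.
smallest = min(values)
rescaled = [math.log2(v / smallest) for v in values]

# Option B: add a pseudocount of 1 before the log.
shifted = [math.log2(v + 1.0) for v in values]

for name, series in (("plain", plain), ("rescaled", rescaled), ("shifted", shifted)):
    print(f"{name:9s}", [round(x, 2) for x in series])
```

Option A only shifts every logged value by the same constant, so differences between samples are unchanged. Option B compresses the low end of the range, which changes the shape of the distribution.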

Thanks!

Hi @rargelaguet,
Thanks for helping me resolve my earlier confusion. I appreciate your published article on MOFA and the MOFA-related documents and tutorials. However, I now have another question about running MOFA. As I mentioned earlier, I have three omics datasets: RNA-seq, DNA methylation and plasma metabolomics. I know you used vst data for RNA-seq and M-values for DNA methylation. Because of my research design, I want to use log2FPKM for the RNA-seq data, beta values for DNA methylation, and quantile-normalized, log2-transformed and Pareto-scaled data for plasma metabolomics. Can I use log2FPKM RNA-seq data as input to MOFA? I ask because your MOFA tutorials recommend log-normalised RNA-seq data and M-values for bulk methylation data.

Looking forward to your reply.
Thanks!

Best,
Chen.

Hi Chen,
the important requirement for MOFA is that the data is continuous. The closer it looks to a Gaussian distribution the better, but this is not strictly necessary. Can you attach a histogram of your matrices before and after normalisation? That will make it easier to provide guidance.
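
A minimal sketch of the before/after comparison requested above, assuming you only need a quick look rather than a publication figure. It uses simulated log-normal values as a stand-in for one data matrix (made-up data), and the helpers `text_histogram` and `skewness` are hypothetical names, not part of MOFA. A plotting library would normally be used instead of the text output.

```python
import math
import random
import statistics


def text_histogram(values, bins=8, width=40):
    """Return one line per equal-width bin, with a bar proportional to the count."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are identical
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / step), bins - 1)  # put the maximum in the last bin
        counts[idx] += 1
    peak = max(counts)
    return [
        f"{lo + i * step:10.2f} | {'#' * round(width * c / peak)} {c}"
        for i, c in enumerate(counts)
    ]


def skewness(values):
    """Third standardized moment (population form); 0 for a symmetric sample."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Simulated right-skewed values standing in for one omics matrix (made-up data).
random.seed(0)
before = [random.lognormvariate(0.0, 1.0) for _ in range(500)]
after = [math.log2(v) for v in before]

for label, data in (("before", before), ("after log2", after)):
    print(f"{label} (skewness {skewness(data):.2f})")
    print("\n".join(text_histogram(data)))
```

A roughly symmetric, bell-shaped histogram after normalisation is the pattern to look for; a long tail on one side suggests that a further transformation may be worth trying.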