bioFAM / MOFA

Multi-Omics Factor Analysis

Training and validation set

anso-sertier opened this issue

Hi,

Is it possible to train a model on a set of training samples and then apply it to another set of samples, in order to validate the latent factors (LFs) obtained?
I don't know if I'm being clear, but I would like to compute the Z matrices (samples against LFs) for samples that were not used to train the model. I have the Y matrices for the new samples.

Thanks a lot in advance

Anne-Sophie Sertier

Hello,
apologies for the slow reply.

Mathematically, it is as simple as rearranging the master equation from Y = WZ to Z = inv(W)Y. If you can calculate the inverse of W, then you can project new samples into the latent space.
However, W is not a square matrix, so it has no ordinary inverse and you would have to compute a pseudoinverse of some sort (for example the Moore-Penrose pseudoinverse, which gives the least-squares solution for Z).
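
A minimal NumPy sketch of that projection, assuming the Y = WZ convention used above, with W of shape (features, factors) and Y of shape (features, samples). The function name `project_new_samples` is hypothetical and not part of MOFA. The sketch also assumes that the new data have the same feature order and preprocessing (e.g. centring) as the training data, and that they contain no missing values.

```python
import numpy as np

def project_new_samples(W, Y_new):
    """Least-squares estimate of Z for new samples under Y = W Z.

    W     : (D, K) loadings learned on the training samples
    Y_new : (D, N_new) new data, features in the same order and
            preprocessed the same way as the training data (assumption)
    """
    W_pinv = np.linalg.pinv(W)  # (K, D) Moore-Penrose pseudoinverse
    return W_pinv @ Y_new       # (K, N_new) projected factor values

# Toy check with simulated, noise-free data (not real MOFA output)
rng = np.random.default_rng(0)
D, K, N_new = 50, 3, 10
W = rng.normal(size=(D, K))
Z_true = rng.normal(size=(K, N_new))
Y_new = W @ Z_true

Z_hat = project_new_samples(W, Y_new)
print(np.allclose(Z_hat, Z_true))
```

With several views, one option would be to stack the per-view W and Y matrices along the feature axis before projecting. This ignores the different noise levels of the views, so it should be read as a rough approximation.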

I guess you want to test how well the model generalises, i.e. whether it overfits. Because of the sparsity priors and the linear, unsupervised nature of the model, it is very unlikely to overfit. But if you want to test this, a good approach could be out-of-sample prediction: mask values at random (using all samples), train the model, and check how well the masked values are predicted. Alternatively, you could test how much variance the model explains, or how many factors it recovers, after downsampling the dataset.
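
The masking approach could be sketched as follows. This is only an illustration under stated assumptions: `mask_at_random` and `heldout_mse` are hypothetical helper names, not MOFA functions, and the step that trains the model on `Y_masked` and produces a reconstruction `Y_hat` is left out because it depends on how you run the model.

```python
import numpy as np

def mask_at_random(Y, frac=0.1, seed=0):
    """Set a random fraction of the observed entries of Y to NaN.

    Returns the masked copy and a boolean array marking the held-out entries.
    """
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    observed = np.flatnonzero(~np.isnan(Y))        # flat indices of observed values
    n_mask = int(round(frac * observed.size))
    held_out = rng.choice(observed, size=n_mask, replace=False)
    mask = np.zeros(Y.shape, dtype=bool)
    mask[np.unravel_index(held_out, Y.shape)] = True
    Y_masked = Y.copy()
    Y_masked[mask] = np.nan
    return Y_masked, mask

def heldout_mse(Y, Y_hat, mask):
    """Mean squared error on the held-out entries only."""
    return float(np.mean((Y[mask] - Y_hat[mask]) ** 2))

# Toy data standing in for one view
rng = np.random.default_rng(1)
Y = rng.normal(size=(20, 30))
Y_masked, mask = mask_at_random(Y, frac=0.1)

# Train the model on Y_masked, reconstruct Y_hat from its W and Z,
# then call heldout_mse(Y, Y_hat, mask).
```

Comparing the held-out error with the error on the entries that were kept gives a rough indication of overfitting: a much larger held-out error would suggest that the model does not generalise.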

I hope this was useful, let me know if you have more questions.