bioFAM / MOFA

Multi-Omics Factor Analysis


Truncated feature names after runMOFA

fanli-gcb opened this issue

I noticed that somewhere in the runMOFA function, the feature names are getting truncated, creating non-unique names that break downstream code.

As an example, here are two of the original features:

> rownames(MOFAobject@TrainData[["plasma"]])[779:780]
[1] "[plasma] sulfate of piperine metabolite C16H19NO3 (2)*"
[2] "[plasma] sulfate of piperine metabolite C16H19NO3 (3)*"

After running runMOFA:

> MOFAobject2 <- loadModel(modelFile, MOFAobject) # loading results from runMOFA
> rownames(MOFAobject2@TrainData[["plasma"]])[779:780]
[1] "[plasma] sulfate of piperine metabolite C16H19NO3 "
[2] "[plasma] sulfate of piperine metabolite C16H19NO3 "

Any ideas on where this truncation is happening? I have narrowed it down to the runMOFA call, but not sure where within that function.

Thanks in advance for any help!

That is indeed the case.
The problem occurs when the sample names are saved to the HDF5 file: in the current HDF5 version, strings are restricted to 50 characters, and I couldn't find a way around it.

There should be a warning in prepareMOFA:
if (any(nchar(sampleNames(object))>50)) warning("Due to string size limitations in the HDF5 format, sample names will be trimmed to less than 50 characters")
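The warning above only inspects sample names, while the truncation in this issue hit feature names. A minimal sketch, assuming names are simply cut to their first 50 characters (which matches the example at the top of this thread): the hypothetical helper below, which is not part of MOFA and uses only base R, lists the names that would become duplicates after saving, so they can be shortened before calling runMOFA.

```r
# Hypothetical helper (not part of MOFA): report names that would collide
# once cut to `max_chars` characters, assuming simple prefix truncation.
find_truncation_collisions <- function(names_vec, max_chars = 50) {
  truncated <- substr(names_vec, 1, max_chars)
  # A name collides if its truncated form appears more than once
  clash <- truncated %in% truncated[duplicated(truncated)]
  names_vec[clash]
}

features <- c(
  "[plasma] sulfate of piperine metabolite C16H19NO3 (2)*",
  "[plasma] sulfate of piperine metabolite C16H19NO3 (3)*",
  "[plasma] glucose"
)
find_truncation_collisions(features)
# returns the first two names
```

Since featureNames() returns one character vector per view, every view could be checked with something like lapply(MOFA::featureNames(MOFAobject), find_truncation_collisions).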

However, there is a simple workaround: edit the sample names manually after loading the model:
sampleNames(object) <- sample_names
Make sure that sample_names is in the same order as the samples in the model.
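One way to confirm the order is consistent is to check, before overwriting, that every loaded (truncated) name equals the first 50 characters of the original name at the same position. The sketch below does this in base R; the helper name restore_full_names is hypothetical, and it assumes the same prefix truncation seen in this issue.

```r
# Hypothetical helper (not part of MOFA): return the original, untruncated
# names after confirming they line up with the truncated names in the model.
restore_full_names <- function(loaded, original, max_chars = 50) {
  if (length(loaded) != length(original)) {
    stop("loaded and original names have different lengths")
  }
  # Each loaded name should equal the prefix of the original at the same position
  mismatch <- which(substr(original, 1, max_chars) != loaded)
  if (length(mismatch) > 0) {
    stop("order mismatch at position(s): ", paste(mismatch, collapse = ", "))
  }
  original
}

original <- c("[plasma] sulfate of piperine metabolite C16H19NO3 (2)*", "[plasma] glucose")
loaded <- substr(original, 1, 50)  # simulate the names read back from the HDF5 file
restore_full_names(loaded, original)
```

The assignment above would then become sampleNames(object) <- restore_full_names(sampleNames(object), sample_names), which stops with an error instead of silently attaching names to the wrong samples.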

Thanks for the help! Here's the code I used for the workaround in case it's useful for anyone else (note that it uses featureNames instead of sampleNames):

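# Before runMOFA: store the full feature names (a list with one vector per view)
featurenames <- MOFA::featureNames(MOFAobject)
# After runMOFA or loadModel: restore them, indexing by view name so the views line up
MOFA::featureNames(MOFAobject) <- featurenames[names(MOFA::featureNames(MOFAobject))]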