Installation / R errors?
phdegnan opened this issue
Fang Lab,
Unfortunately, the Singularity installation option wasn't working on our campus cluster or my lab machines. As such, I had to set up a conda environment on our cluster and attempted to leverage the available modules with compatible versions of bwa, R, etc. After fixing file paths in your scripts, things started to look like they were working. However, using your test data sets for both E. coli and the metagenome, I'm hitting two different R errors:
$ nanodisco characterize -p 4 -b Ecoli -d dataset/EC_difference.RDS -o analysis/Ecoli_motifs -m GATC,CCWGG,GCACNNNNNNGTT -t nn -r reference/Ecoli_K12_MG1655_ATCC47076.fasta
[2022-06-13 12:34:39] Load supplied current differences.
[2022-06-13 12:34:46] Check current differences file version.
[2022-06-13 12:34:46] Determine motif signature center.
[2022-06-13 12:34:46] Process GATC.
[2022-06-13 12:34:46] Tag GATC occurrences.
[2022-06-13 12:34:55] Score GATC modified position.
[2022-06-13 12:34:59] Process CCWGG.
[2022-06-13 12:34:59] Tag CCWGG occurrences.
[2022-06-13 12:35:06] Score CCWGG modified position.
[2022-06-13 12:35:09] Process GCACNNNNNNGTT.
[2022-06-13 12:35:09] Tag GCACNNNNNNGTT occurrences.
[2022-06-13 12:35:15] Score GCACNNNNNNGTT modified position.
Error in { : task 1 failed - "Invalid unit"
Calls: find.signature.center -> %do% ->
Execution halted
It doesn't matter how many motifs I provide (1, 3, and all 4 tested); it always fails after the last one.
Is there a way to add verbose reporting to R within the context of your code? The error seems to stem from line 63 of characterize.R, which calls the find.signature.center function at line 2179 of analysis_functions.R.
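One way to get more detail than the truncated Calls: line is to have R print the full call stack when an error occurs. The sketch below is a minimal shell helper that writes an R profile doing this, under two assumptions: that the nanodisco wrapper starts R through Rscript without --vanilla or --no-init-file (otherwise the profile is ignored), and that you are not relying on another user profile, since R reads only one. The function name write_debug_profile is hypothetical and not part of nanodisco. For the characterize error, foreach re-raises the failure from inside %do%, so the printed stack may still stop at the foreach call rather than at the grid function that raised "Invalid unit".

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an R profile that prints the full call stack on error.
# Assumes R is launched via Rscript without --vanilla/--no-init-file, so the
# file named by R_PROFILE_USER is sourced at startup.
write_debug_profile() {
  local profile="$1"
  cat > "$profile" <<'EOF'
options(error = function() {
  # Integer argument: print the current stack, skipping the top 3 frames
  # (traceback's own frames and this handler), i.e. the stack at the error.
  traceback(3)
  # Keep the non-interactive behaviour of stopping with a failing exit code.
  quit(save = "no", status = 1, runLast = FALSE)
})
EOF
}

# Example (placeholders, not a tested invocation):
#   profile="$(mktemp "${TMPDIR:-/tmp}/nanodisco_debug.XXXXXX")"
#   write_debug_profile "$profile"
#   R_PROFILE_USER="$profile" nanodisco characterize <same arguments as before>
```

Because R_PROFILE_USER is an environment variable, it is inherited by any Rscript process the wrapper spawns, so no nanodisco script has to be edited.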
After this failure, I tried the metagenome example. The first two commands ran without error, but the third one failed:
$ nanodisco plot_binning -r reference/metagenome.fasta -u analysis/binning/methylation_binning_MGM1_motif.RDS -b MGM1_motif -o analysis/binning -a reference/motif_binning_annotation.RDS --MGEs_file dataset/list_MGE_contigs.txt
[2022-06-13 14:23:15] Prepare default metagenome annotation.
[2022-06-13 14:23:16] Load additional annotation.
[2022-06-13 14:23:17] Plot binning.
Error in unit(unclass(x), attr(x, "unit"), attr(x, "data")) :
Invalid unit
Calls: plot.tsne.motifs.score ... convertUnit -> upgradeUnit -> upgradeUnit.unit -> unit
Execution halted
Given your intimate familiarity with the code, any suggestions you have would be most welcome.
My best guess is that this is an R package issue, maybe? Since I was using a system install of R 4.1.2, there were some packages that I couldn't overwrite or update. What version of R would you recommend if I install it fresh within the conda environment?
Regards,
Patrick
Thank you for your interest, Patrick. Alan will help when he gets a chance. I just wanted to add that one of the reasons we put the package in Singularity was to avoid errors caused by R/package versions, which happened to us as well. So, in the long term, it is likely still a good idea to get Singularity set up (not sure why it didn't work on your cluster or lab machine, but it's likely fixable), because new errors can occur with new R/package versions.
Oh, I get the rationale for Singularity; code dependencies are a pain in the butt. However, my lab Mac was unable to install VirtualBox (the prereq for Singularity), and our campus cluster's sysadmin doesn't have time to install it ATM, and I don't have time to wait. Looking forward to Alan sharing his insight.
Hey, @phdegnan, have you tried installing the specific package versions manually yourself?
Disclaimer: I am not Alan (nor am I affiliated with the project in any way), but I am in the process of using it for a project myself.
You can see the exact versions you need in the post-installation script. At that point, you can try any of the methods mentioned in this StackOverflow post and see if that works.
At the end of the day, as long as you can install the right package versions somehow (meaning you either build them from source or install pre-compiled binary versions), the program should run.[1]
It's janky, but this solution not only circumvents the VirtualBox installation problem entirely, the analysis will even execute faster because it's not being run indirectly via a hypervisor.
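A minimal sketch of pinning versions that way is shown below, assuming the remotes R package, whose install_version function fetches a specific release from the CRAN archive and by default builds it from source. The input format (one "package version" pair per line), the CRAN_MIRROR variable, and the helper name pinned_install_expr are hypothetical; the actual versions should come from nanodisco's post-installation script, not from this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "package version" pairs on stdin (blank lines and
# lines starting with # are skipped) and print a single R expression that
# installs each pinned version with remotes::install_version().
CRAN_MIRROR="${CRAN_MIRROR:-https://cloud.r-project.org}"

pinned_install_expr() {
  local pkg ver
  local expr="if (!requireNamespace('remotes', quietly = TRUE)) install.packages('remotes', repos = '$CRAN_MIRROR');"
  # '|| [ -n "$pkg" ]' keeps a final line that lacks a trailing newline.
  while read -r pkg ver || [ -n "$pkg" ]; do
    if [ -z "$pkg" ]; then continue; fi
    case "$pkg" in \#*) continue ;; esac
    # upgrade = 'never': do not upgrade dependencies that are already installed.
    expr+=" remotes::install_version('$pkg', version = '$ver', repos = '$CRAN_MIRROR', upgrade = 'never');"
  done
  printf '%s\n' "$expr"
}
```

For example, with a hypothetical pinned_versions.txt: Rscript -e "$(pinned_install_expr < pinned_versions.txt)". Since upgrade = 'never' only protects dependencies that are already present, listing dependencies before the packages that use them is the safer order.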
Hope this helps,
Jose
Footnotes
[1] Of course, this is only true in theory. In practice, theory and practice can differ greatly.
Hello @phdegnan,
Sorry for the late answer, but I think it's worth noting that I was able to install and use singularity with OSX (Mac) during the development of nanodisco. I used the Singularity-Desktop beta version, but it seems to have been discontinued (here). They also offer a solution using docker compose, but I haven't tried it (here).
In your situation, with access to a cluster, I would try to install singularity with conda. You can try running this old version:
conda create --name singularity -c conda-forge singularity
conda activate singularity
singularity --version
# singularity version 3.8.6
Although there are known issues (this), they seem to be fixed in the latest singularity version. Importantly, singularity's development organisation recently changed, and the tool is now maintained under the name apptainer. This should be fully backward compatible. I've successfully run the nanodisco commands from your first message with the following apptainer installation:
conda create --name apptainer -c conda-forge apptainer
conda activate apptainer
apptainer --version
# apptainer version 1.1.5
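Because apptainer is the renamed successor of singularity and is expected to be backward compatible, a wrapper script can accept either binary. A minimal sketch that picks whichever is installed, preferring apptainer; the helper name container_runtime is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of the first container runtime found on
# PATH, preferring apptainer (the renamed project) over the older singularity.
container_runtime() {
  local cmd
  for cmd in apptainer singularity; do
    if command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
      return 0
    fi
  done
  echo "neither apptainer nor singularity found on PATH" >&2
  return 1
}
```

Usage: rt="$(container_runtime)" && "$rt" --version, then use "$rt" wherever the nanodisco instructions call singularity.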
Lastly, apptainer seems to be usable on macOS, but I haven't tested it (here).
Best,
Alan