JGASmits / AnanseSeurat

Single cell ANANSE Gene-regulatory-network analysis from Seurat objects

Activating conda environment

sylestiel opened this issue

@JGASmits @Rebecza @Arts-of-coding
How can I determine whether the program is running or has stalled? "Conda activate" appears to hang.

Is there a way to check this without interrupting the run, in case it is actually still active?

Activating conda environment: .snakemake/conda/3f88efe941f72bcdb4d5867b0d6db92f_
Activating conda environment: .snakemake/conda/3f88efe941f72bcdb4d5867b0d6db92f_
[Fri Sep 8 16:50:32 2023]
Finished job 5.
1 of 24 steps (4%) done
Select jobs to execute...

[Fri Sep 8 16:50:32 2023]
rule pfmscorefile:
input: /Users/pediatrics/Desktop/R_projects/scANANSE/analysis/Peak_Counts.tsv, /Users/pediatrics/Desktop/R_projects/scANANSE/analysis/gimme/mm10.gimme.vertebrate.v5.0.pfm, /Users/pediatrics/Desktop/R_projects/scANANSE/data/mm10
output: /Users/pediatrics/Desktop/R_projects/scANANSE/analysis/gimme/pfmscorefile.tsv
log: /Users/pediatrics/Desktop/R_projects/scANANSE/analysis/gimme/log_mm10_pfmscorefile.txt
jobid: 6
reason: Missing output files: /Users/pediatrics/Desktop/R_projects/scANANSE/analysis/gimme/pfmscorefile.tsv; Input files updated by another job: /Users/pediatrics/Desktop/R_projects/scANANSE/analysis/gimme/mm10.gimme.vertebrate.v5.0.pfm
threads: 12
resources: tmpdir=/var/folders/2c/zzjsgs_53vqflzjl28hf1x7r0000gn/T

Activating conda environment: .snakemake/conda/3f88efe941f72bcdb4d5867b0d6db92f_

Could use some suggestions.
Thanks!

Good afternoon @sylestiel,

There are a number of options to check this without disrupting the run.

After or during certain steps

From the working directory in which you started anansnake, you can navigate to the folder: .snakemake/log/
The log file in this folder records the same messages that are shown on the command line as the anansnake steps are completed.
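
As a rough sketch of that check (this is not part of anansnake; the function names are made up for illustration, and it assumes the log files in that folder end in ".log"), a few lines of Python can report how long ago the newest log was written and show its last lines:

```python
from pathlib import Path
import time


def newest_log(log_dir):
    """Return the most recently modified *.log file in log_dir, or None."""
    log_dir = Path(log_dir)
    if not log_dir.is_dir():
        return None
    logs = [p for p in log_dir.glob("*.log") if p.is_file()]
    return max(logs, key=lambda p: p.stat().st_mtime, default=None)


def log_status(log_dir, n_lines=10):
    """Return (path, seconds since last write, last n_lines lines), or None."""
    log = newest_log(log_dir)
    if log is None:
        return None
    age = time.time() - log.stat().st_mtime
    lines = log.read_text(errors="replace").splitlines()
    return log, age, lines[-n_lines:]


if __name__ == "__main__":
    # Assumption: run from the directory in which anansnake was started.
    status = log_status(".snakemake/log")
    if status is None:
        print("No log files found in .snakemake/log")
    else:
        log, age, tail = status
        print(f"{log} was last written {age / 60:.1f} minutes ago")
        print("\n".join(tail))
```

A log that has not been written to for a long time is not proof of a hang, since a single step can run for hours, but a log that keeps growing shows the run is alive.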

As an alternative, you can move to your tmpdir, in your case "/var/folders/2c/zzjsgs_53vqflzjl28hf1x7r0000gn/T", and refresh the directory listing from time to time to see whether, and when, files have been changed.

To determine this for the conda environment for anansnake directly, you can move to the folder specified after "Activating conda environment", which in your case is: ".snakemake/conda/3f88efe941f72bcdb4d5867b0d6db92f_". Similarly, you can refresh and see if and when files have been changed.
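
Instead of refreshing by hand, the same idea can be sketched in Python. This is only an illustration under my own assumptions: the function name and the 10-minute window are arbitrary choices, and the directory is whichever one you want to watch (the tmpdir or the conda environment folder).

```python
from pathlib import Path
import time


def recently_modified(directory, within_seconds=600):
    """Return (path, age in seconds) for files changed within the window, newest first."""
    now = time.time()
    hits = []
    for path in Path(directory).rglob("*"):
        try:
            if not path.is_file():
                continue
            age = now - path.stat().st_mtime
        except OSError:
            continue  # file vanished or is unreadable; skip it
        if age <= within_seconds:
            hits.append((path, age))
    return sorted(hits, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical usage: replace "." with the directory you want to inspect.
    for path, age in recently_modified(".")[:20]:
        print(f"{age:7.0f} s ago  {path}")
```

If nothing in either directory has changed for a long time, that is a hint (not proof) that the step is waiting rather than working.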

Realtime

I suppose the two options above may not fully answer your question, since you might want to check this in real time. To determine whether anansnake is actually running, you can run "htop" from your command line (Linux). In htop, find the anansnake command under the "Command" column and check the process state under the "S" column. The letter "D" means uninterruptible sleep, which usually indicates that the process is waiting on disk I/O; a process that stays in "D" for a very long time may be stuck. The letter "Z" marks a zombie, i.e. a process that has already exited (in that case you are better off rerunning). The letter "S" stands for sleeping, which usually means the process is waiting for something (for example a child process, or CPU time while other processes have a higher priority) and could resume soon. The letter "R" means running or runnable, and this is the one you want, since it indicates that your process is actually being executed.
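
If htop is not available, the state column can also be read from "ps". The sketch below is my own illustration, not something shipped with anansnake: the function names are hypothetical, the state descriptions are paraphrased from the Linux and macOS ps manual pages, and it assumes the process can be found by the text "snakemake" in its command line.

```python
import subprocess

# First letter of the ps "stat" column, paraphrased from the ps manual pages.
STATE_MEANINGS = {
    "R": "running or runnable",
    "S": "sleeping (waiting for an event)",
    "I": "idle",
    "D": "uninterruptible sleep, usually waiting on disk I/O (Linux)",
    "U": "uninterruptible wait (macOS)",
    "T": "stopped",
    "Z": "zombie (exited, not yet reaped by its parent)",
}


def describe_state(stat):
    """Translate a ps state string such as 'R+' or 'Ss' into words."""
    return STATE_MEANINGS.get(stat[:1], "unknown state")


def parse_ps(output, pattern):
    """Return (pid, stat, command) for ps lines whose command contains pattern."""
    rows = []
    for line in output.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and pattern in parts[2]:
            rows.append((int(parts[0]), parts[1], parts[2]))
    return rows


def find_processes(pattern):
    """Run ps for all processes and keep those matching pattern."""
    result = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "stat=", "-o", "command="],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout, pattern)


# Example (assumes the run shows up with "snakemake" in its command line):
# for pid, stat, command in find_processes("snakemake"):
#     print(pid, stat, describe_state(stat), command[:80])
```

The child processes started by the current rule are often more informative than the parent process, which spends most of its time sleeping while it waits for them.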

The package Glances could be an attractive alternative if you are running our software on Windows. Another option here is to go to Task Manager and try to find the process.

Other note

Some of the anansnake steps (and the activation of some of its environments) can take up to several hours to finish, so it might be best to leave anansnake running in the background.

I hope this answers your question. If you have any other anansnake-related questions, feel free to ask them here or directly on the anansnake repository.

Hi @Arts-of-coding,

Thanks for your wonderful suggestions.
Unfortunately, htop shows a 'question mark' under the 'S' Column. I tried to disable the 'System Integrity Protection' with a view to determining if the ? represents sleep or running or dead. However, that did not address the issue.
The log file seems to indicate that it is a hang.
Is there a way to actually get scANANSE working? I even reduced the number of clusters under cluster_id to 2, with 2 contrasts.
The machine has 64 GB of memory and 16 cores, yet the run still does not get through.

Is there a way to circumvent this issue?

Thank you for your time in advance!

Hi @sylestiel,

I would suggest trying your installed version of our package with the sample data that is supplied with the manuscript (see our Zenodo). If that works, then it is probably an issue with the anansnake configuration for your input data.

Alternatively, you can run the example from the anansnake package to test this.

If the sample data runs fine, you need to check whether your config.yaml, rna_samples.tsv and atac_samples.tsv make sense (e.g. check whether the correct genome is specified). Additionally, you can check whether all your input files use the same kind of gene names (uppercase/lowercase, or gene symbols versus Ensembl gene IDs). For anansnake to run, the gene names need to be identical across the files.
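
To make the gene-name check concrete, here is a minimal sketch. It rests on assumptions that may not match your files: that each file is tab-separated with a header line and the gene names in the first column. The function names are hypothetical and not part of anansnake.

```python
import csv
import re

# Ensembl gene IDs look like ENSG00000139618 or ENSMUSG00000017167.
ENSEMBL_GENE = re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")


def first_column(path, delimiter="\t", has_header=True):
    """Return the set of values in the first column of a delimited text file."""
    with open(path, newline="") as handle:
        rows = csv.reader(handle, delimiter=delimiter)
        if has_header:
            next(rows, None)
        return {row[0] for row in rows if row and row[0]}


def compare_gene_names(path_a, path_b):
    """Summarise how well the gene names of two files agree."""
    a, b = first_column(path_a), first_column(path_b)
    shared = a & b
    # Names that match only when case is ignored point to a case mismatch.
    lower_b = {name.lower() for name in b}
    case_only = sorted(n for n in a - shared if n.lower() in lower_b)
    ensembl = [n for n in a | b if ENSEMBL_GENE.match(n)]
    return {
        "shared": len(shared),
        "only_in_a": len(a - b),
        "only_in_b": len(b - a),
        "case_only_matches": case_only,
        "ensembl_like": len(ensembl),
    }
```

A large "only_in_a" or "only_in_b" count together with many case-only matches or Ensembl-like IDs would suggest that the files use different naming schemes.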

Otherwise, if your currently installed package is not able to run our sample data, I would recommend reinstalling anansnake in a separate conda environment and trying again.

Of note, the recommended way to install genomes is through genomepy, so make sure that the genome you are using was installed with it.

I hope one of these solutions can resolve your issue.

Thanks!