splicebox / MntJULiP

Comprehensive and scalable differential splicing analyses

Run time issue

Maveriyke opened this issue · comments

The run.py execution is taking far too long. I have 200 samples; is this expected?
Also, how much RAM should be allocated to each thread? The program seems to exceed its memory allocation when I give each thread less than 2 GB.

As a point of reference, I'm trying to run the latest version of MntJULiP (run.py --version reports MntJULiP v2.0) on 64 samples (21 conditions) using 20 threads (all pegged at 100% CPU usage), and it's been cooking for almost 3 days now.

It has finished successfully on smaller subsets of the same data in shorter time, but I wanted to run the samples all at once to make downstream analysis of splice events (group_id's) easier ... let's see if/when this finishes ... 🤞

Thank you for using MntJULiP! To improve speed, please consider using the "--raw-counts-only" flag if estimated counts and PSI values are not required. Additionally, the "--group-filter" flag can filter out groups in which all samples have counts below a threshold (for example, 15).
Memory usage depends on the size of the input data. In general, I suggest using a smaller batch size and fewer threads to reduce it.
Let me know if you have any further questions or need assistance!
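
To make the filtering criterion above concrete, here is a minimal Python sketch of the idea: a group is dropped only when every sample's count falls below the threshold. This is an illustration of the rule as described in this comment, not MntJULiP's actual implementation; the function name `filter_groups`, the dictionary layout, and the group IDs are hypothetical.

```python
from typing import Dict, List


def filter_groups(counts: Dict[str, List[int]], min_count: int = 15) -> Dict[str, List[int]]:
    """Keep a group if at least one sample has a count >= min_count.

    `counts` maps a (hypothetical) group ID to its per-sample read counts.
    """
    return {
        group_id: sample_counts
        for group_id, sample_counts in counts.items()
        if any(c >= min_count for c in sample_counts)
    }


# Made-up example data: three groups, three samples each.
counts = {
    "g000001": [0, 3, 14],   # all samples below 15, so the group is dropped
    "g000002": [2, 40, 7],   # one sample reaches 15 or more, so it is kept
    "g000003": [15, 0, 0],   # exactly at the threshold, so it is kept
}

kept = filter_groups(counts)
print(sorted(kept))  # ['g000002', 'g000003']
```

Dropping low-count groups before model fitting reduces the number of groups that need to be tested, which is presumably where the speedup comes from.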

Thanks for these detailed suggestions, @edwwlui!

I ended up killing the large run -- it consisted of dose response data from several compounds.

I broke up the datasets into smaller ones, batched by compound ... maybe ~4 doses, with 25 samples in each batch.

These runs finished within 45 minutes or so. I may go back to debug the larger run at some point, but this is good enough for me for now.
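
For anyone wanting to do the same, the per-compound batching described above could be sketched roughly as follows. This is only an illustration under assumptions: the sample tuples, file paths, compound names, and the `compound_dose` condition label are all made up, and the sketch does not write or assume any particular MntJULiP input file format.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sample record: (path to alignment file, compound, dose)
Sample = Tuple[str, str, str]


def batch_by_compound(samples: List[Sample]) -> Dict[str, List[Tuple[str, str]]]:
    """Group samples by compound; each entry becomes (path, condition label)."""
    batches: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for path, compound, dose in samples:
        # The condition label combines compound and dose (an assumed convention).
        batches[compound].append((path, f"{compound}_{dose}"))
    return dict(batches)


# Made-up example data.
samples = [
    ("s1.bam", "cmpdA", "low"),
    ("s2.bam", "cmpdA", "high"),
    ("s3.bam", "cmpdB", "low"),
]

for compound, batch in batch_by_compound(samples).items():
    print(compound, len(batch))
```

Each batch can then be turned into its own sample list and run separately, at the cost of having to reconcile splice events across runs afterwards.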