MemoryError exception
jstutters opened this issue
Hi,
We're attempting to use nicMSlesions on data comprising T1, FLAIR and T2, all 1x1x1mm isotropic. I'm not sure if the image size is a contributing factor, but we're getting a MemoryError in base.py:load_test_patches (log below). There is some commented-out code suggesting that load_test_patches could yield smaller data structures instead of one large one - could that approach help?
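For context, this is roughly what I had in mind - a sketch only, with illustrative names (`iter_test_patches`) and border handling omitted, not the actual base.py signatures:

```python
import numpy as np

def iter_test_patches(image, centers, patch_size=11, batch_size=10000):
    """Yield batches of patches rather than stacking them all at once.

    `image` is one 3D modality volume and `centers` an (N, 3) array of
    candidate voxel coordinates, assumed far enough from the borders.
    """
    half = patch_size // 2
    for start in range(0, len(centers), batch_size):
        batch = centers[start:start + batch_size]
        # Stacking one small batch bounds peak memory by batch_size
        # patches instead of the full candidate set.
        yield np.stack([image[x - half:x + half + 1,
                              y - half:y + half + 1,
                              z - half:z + half + 1]
                        for x, y, z in batch], axis=0)
```

Prediction could then run per batch, so only the (much smaller) outputs would need to be concatenated.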
Hi @jstutters,
Thank you for the feedback.
In the example, it looks like there are four modalities instead of three - is that correct? Also, can you confirm whether it's a GPU or RAM memory problem?
Hi @sergivalverde thanks for the quick response.
The problem occurs using both tensorflow and tensorflow-gpu, and the traceback indicates that the MemoryError is triggered by a call to numpy's stack function, so I'd surmise that it's a RAM problem. The system used to run nicMSlesions has 32GB of RAM fitted (plus additional swap space).
Unfortunately an error was made during training that meant MOD3 and MOD4 contain identical data. We're currently retraining with 3 channels, which will presumably help with the memory usage. Nevertheless, I wouldn't expect usage to exceed 32GB of RAM given that the input .nii.gz files total under 30MB.
Hi again,
Ok, this can definitely be a problem of memory limitations. The model takes a set of hyper-intense voxels from the FLAIR image and builds 11^3 patches around their centers. If the number of hyper-intense voxels is large enough, we may be exhausting the available RAM.
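As a back-of-envelope check (the candidate count below is purely hypothetical), the stacked patch array grows as candidates × modalities × 11^3 × 4 bytes for float32:

```python
n_candidates = 2_000_000  # hypothetical count of hyper-intense FLAIR voxels
n_modalities = 4          # as in the failing run
patch_voxels = 11 ** 3    # 1331 voxels per 11x11x11 patch
bytes_per_voxel = 4       # float32

total = n_candidates * n_modalities * patch_voxels * bytes_per_voxel
print(f"{total / 2**30:.1f} GiB")  # ~39.7 GiB for the stacked array alone
```

And since np.stack copies its inputs into a new contiguous array, peak usage can be nearly double that while the individual patches are still alive as a list.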
Can you try to pre-load the baseline model and perform the inference just with FLAIR + T1w, checking the RAM load?
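Something like this can check the RAM load around the suspect call (a sketch using psutil; the call sites are hypothetical):

```python
import os
import psutil

def log_ram(tag):
    """Print the current process's resident set size in GiB."""
    rss = psutil.Process(os.getpid()).memory_info().rss
    print(f"[{tag}] RSS: {rss / 2**30:.2f} GiB")

# e.g. around the suspect step:
# log_ram("before load_test_patches")
# patches = load_test_patches(...)  # hypothetical call site
# log_ram("after load_test_patches")
```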
Using the baseline model I get memory usage of up to 40GB - inference does complete in that case, so my problem is partly down to using more modalities, but that level of memory usage was still causing a lot of swapping to disk.
I've sent a pull request that may help: #5
I will look at it as soon as possible.