hardmaru / WorldModelsExperiments

World Models Experiments

MemoryError in vae_train.py

Chazzz opened this issue

Running python vae_train.py raises a MemoryError on my system. I felt bad about this at first, but after running the numbers, it turns out vae_train.py needs to allocate roughly 123 GB for this one array!

>>> import numpy as np
>>> M = 1000
>>> N = 10000
>>> data = np.zeros((M*N, 64, 64, 3), dtype=np.uint8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
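The figure checks out: the array is M*N frames of 64x64x3 uint8, one byte per element.

>>> (1000 * 10000 * 64 * 64 * 3) / 1e9  # frames x height x width x channels, 1 byte each
122.88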

Hmm, this looks like #19. I am trying the solution suggested there.

Thanks for crunching the numbers; I had a measly 16 GB when it happened to me.

@zuoanqh Not tremendously surprising that the memory limitation shows up in both experiments. A more dynamic loading scheme would probably fix both issues, as sketched below.
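A minimal sketch of what that could look like for vae_train.py, assuming the extract step wrote one .npz file per episode with the frames under an 'obs' key (both the directory name and the key are assumptions about the data layout):

import os
import numpy as np

def episode_batches(record_dir, batch_size=100):
    # Stream batches one episode at a time instead of preallocating a
    # single (M*N, 64, 64, 3) array for the whole dataset.
    for fname in sorted(os.listdir(record_dir)):
        obs = np.load(os.path.join(record_dir, fname))['obs']
        for i in range(0, len(obs), batch_size):
            yield obs[i:i + batch_size].astype(np.float32) / 255.0

The training loop would then iterate over episode_batches('record') instead of indexing into one giant array, so peak memory stays at roughly one episode plus one batch.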

@zuoanqh Not sure how far you got on this, but I have memory-free loading (not including training) down to 1.25 hours (8 minutes per epoch) in my fork's atari/vae_train.py. I convert the episodes into uncompressed (10x speedup), individual images (100x speedup), which are then loaded in parallel (another 10x) before being fed into TensorFlow. Working in black and white (Atari only) is a further 3x improvement, but that doesn't carry over to Doom or CarRacing. The only faster alternative I can think of is to convert the frames to BMP and have TensorFlow manage the entire batching process with parallel prefetching, as sketched below.
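A rough sketch of that BMP-plus-prefetching idea using the tf.data API (this assumes a reasonably recent TensorFlow; the frames/*.bmp pattern, the 64x64 frame size, and the batch size of 100 are placeholders, not anything from the fork):

import tensorflow as tf

AUTOTUNE = tf.data.experimental.AUTOTUNE

def decode_frame(path):
    # Decoding uncompressed BMPs is nearly free compared to PNG/JPEG,
    # which is the point of converting the episodes first.
    img = tf.io.decode_bmp(tf.io.read_file(path), channels=3)
    return tf.cast(img, tf.float32) / 255.0

dataset = (tf.data.Dataset.list_files('frames/*.bmp')
           .map(decode_frame, num_parallel_calls=AUTOTUNE)  # parallel decode
           .batch(100)
           .prefetch(AUTOTUNE))  # overlap file I/O with training

Each element is then a (100, 64, 64, 3) float batch, with TensorFlow doing the batching and prefetching in background threads.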

Note that 10M uncompressed frames come to about 80 GB for single-channel and 240 GB for tri-channel images, and the conversion takes several hours. VAE training (not including loading) takes about 5 hours on my system.

@Chazzz My experiment requires transitions rather than frames, so it's taking a bit more work to upgrade without doubling disk/memory usage; I got it to work with about 1k episodes though...
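One way to get transitions without doubling storage is to keep each episode's frames as a single array and build (frame_t, action_t, frame_t+1) tuples by indexing adjacent rows; a sketch, with hypothetical per-episode obs/actions arrays:

import numpy as np

def transition_batches(obs, actions, batch_size=100):
    # Each frame is stored once; transitions are just paired slices of
    # adjacent rows, so disk/memory usage does not double.
    last = len(obs) - 1  # the final frame has no successor
    for i in range(0, last, batch_size):
        j = np.arange(i, min(i + batch_size, last))
        yield obs[j], actions[j], obs[j + 1]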

@zuoanqh Yikes, that's a lot of channels. Then again, you don't really need 10k episodes unless you're creating a baseline.