ucbrise / actnn

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training


How to avoid memory fragmentation in ActNN?

Jack47 opened this issue

May I know how you implemented the defragmentation in ActNN?
[screenshot]

In my model-training experience, a smaller MAX_SPLIT_SIZE gives worse performance, while a bigger MAX_SPLIT_SIZE eventually results in an OOM.
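
For reference, if MAX_SPLIT_SIZE here refers to PyTorch's `max_split_size_mb` knob (an assumption; the comment above does not name the exact setting), it is tuned through the `PYTORCH_CUDA_ALLOC_CONF` environment variable:

```python
import os

# Assumption: MAX_SPLIT_SIZE above means PyTorch's `max_split_size_mb`.
# Blocks larger than this value (in MB) are not split by the caching
# allocator; a smaller value reduces fragmentation at the cost of more
# raw cudaMalloc calls. Must be set before the first CUDA allocation.
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:128'
```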

The corresponding code is here:

```python
elif level == 'L5':  # 2-bit + swap + defragmentation
    config.swap = True
    os.environ['PYTORCH_CACHE_THRESHOLD'] = '256000000'
    warnings.warn("The defragmentation at L5 requires modification of the c++ "
                  "code of PyTorch. You need to compile this special fork of "
                  "PyTorch: https://github.com/merrymercy/pytorch/tree/actnn_exp")
```

Great, thanks, got it: use malloc instead of the caching allocator for large allocations.
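
For completeness, a hedged usage sketch, assuming the snippet above comes from `actnn.set_optimization_level` (per the optimization levels described in the ActNN README):

```python
import actnn

# Enable the highest optimization level (2-bit + swap + defragmentation).
# Per the warning above, this requires the patched PyTorch fork.
actnn.set_optimization_level("L5")
```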