[Bug]CUDA error: out of memory

Question

zihaozhang9 opened this issue 5 years ago · comments

python 3.6.5
torch 0.4.1

I use:
python -u train_search.py \
    --tmp_data_dir /path/to/your/data \
    --save log_path \
    --add_layers 6 \
    --add_layers 12 \
    --dropout_rate 0.1 \
    --dropout_rate 0.4 \
    --dropout_rate 0.7 \
    --note note_of_this_run | tee trainlog.log
log:
Experiment dir : log_pathsearch-note_of_this_run-20190615-024812
06/15 02:48:13 AM GPU device = 0
06/15 02:48:13 AM args = Namespace(add_layers=['0', '6', '12'], add_width=['0'], arch_learning_rate=0.0006, arch_weight_decay=0.001, batch_size=96, cifar100=False, cutout=False, cutout_length=16, drop_path_prob=0.3, dropout_rate=['0.1', '0.4', '0.7'], epochs=25, gpu=0, grad_clip=5, init_channels=16, layers=5, learning_rate=0.025, learning_rate_min=0.0, momentum=0.9, note='note_of_this_run', report_freq=50, save='log_pathsearch-note_of_this_run-20190615-024812', seed=2, tmp_data_dir='/path/to/your/data', train_portion=0.5, weight_decay=0.0003, workers=2)
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to /path/to/your/data/cifar-10-python.tar.gz
06/15 02:49:29 AM param size = 1.275834MB
06/15 02:49:29 AM Epoch: 0 lr: 2.500000e-02
Traceback (most recent call last):
  File "train_search.py", line 468, in <module>
    main()
  File "train_search.py", line 158, in main
    train_acc, train_obj = train(train_queue, valid_queue, model, network_params, criterion, optimizer, optimizer_a, lr, train_arch=False)
  File "train_search.py", line 295, in train
    logits = model(input)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/pdarts/model_search.py", line 139, in forward
    s0, s1 = s1, cell(s0, s1, weights)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/pdarts/model_search.py", line 70, in forward
    s = sum(self.cell_ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states))
  File "/pdarts/model_search.py", line 70, in <genexpr>
    s = sum(self.cell_ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states))
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/pdarts/model_search.py", line 33, in forward
    return sum(w * op(x) for w, op in zip(weights, self.m_ops))
  File "/pdarts/model_search.py", line 33, in <genexpr>
    return sum(w * op(x) for w, op in zip(weights, self.m_ops))
RuntimeError: CUDA error: out of memory

zihaozhang9 · Answer 1 · Sat Jun 15 2019 11:16:30 GMT+0800 (China Standard Time)

It looks like this was a problem with the GPU, not the code. Training has started now.
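
The thread does not say what the GPU problem actually was. For anyone hitting the same error on the first forward pass, one thing worth ruling out is that another process already holds most of the device memory. Below is a minimal sketch for checking that before launching train_search.py, assuming `nvidia-smi` is on the PATH. The helper names `parse_gpu_memory`, `query_gpu_memory`, and `print_free_memory` are made up for this example and are not part of the repository.

```python
import subprocess

# Query fields understood by `nvidia-smi --query-gpu`.
QUERY_FIELDS = "index,memory.used,memory.total"


def parse_gpu_memory(csv_text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts (values in MiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        gpus.append({
            "index": int(index),
            "used_mib": int(used),
            "total_mib": int(total),
        })
    return gpus


def query_gpu_memory():
    """Run nvidia-smi and return per-GPU memory usage. Requires an NVIDIA driver install."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + QUERY_FIELDS,
         "--format=csv,noheader,nounits"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # works on Python 3.6, as used in this report
        check=True,
    )
    return parse_gpu_memory(result.stdout)


def print_free_memory():
    """Print how much memory is still free on each visible GPU."""
    for gpu in query_gpu_memory():
        free_mib = gpu["total_mib"] - gpu["used_mib"]
        print("GPU {}: {} MiB free of {} MiB".format(
            gpu["index"], free_mib, gpu["total_mib"]))
```

Calling `print_free_memory()` right before the run shows whether GPU 0 (the device selected in the log above) is already nearly full. If it is, freeing the device or pointing the `gpu` argument at an idle one is likely a quicker fix than changing the model.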