d2l-ai / d2l-en

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

Home Page:https://D2L.ai

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

nn.dataparallel has issue for mac (mps device)

rnb007 opened this issue · comments

This is the error i get when I use the below function

def try_all_gpus(): #@save
"""Return all available GPUs, or [cpu(),] if no GPU exists."""
devices = [torch.device(f'cuda:{i}')
for i in range(torch.cuda.device_count())]
return devices if devices else [torch.device('cpu')]

trainer = torch.optim.Adam(net.parameters(), lr=lr)
3 loss = nn.CrossEntropyLoss(reduction="none")
----> 4 d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs)

File ~/anaconda3/envs/dl_env/lib/python3.10/site-packages/d2l/torch.py:1507, in train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, devices)
1504 timer, num_batches = d2l.Timer(), len(train_iter)
1505 animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0, 1],
1506 legend=['train loss', 'train acc', 'test acc'])
-> 1507 net = nn.DataParallel(net, device_ids=devices).to(devices[0])
1508 for epoch in range(num_epochs):
1509 # Sum of training loss, sum of training accuracy, no. of examples,
1510 # no. of predictions
1511 metric = d2l.Accumulator(4)

IndexError: list index out of range

This happens as mac does not have cuda support.

by tweaking the above function by just changing cpu to mps, kernel always dies
def try_all_gpus(): #@save
"""Return all available GPUs, or [cpu(),] if no GPU exists."""
devices = [torch.device(f'cuda:{i}')
for i in range(torch.cuda.device_count())]
return devices if devices else [torch.device('mps')]

How can i run the nn.dataparallel or for that matter chapter 16 d2l.train_ch13 function from 16.2 section ?