When using FMix, my dataloader is unable to mix the data in the training loop
IamSparky opened this issue
I have used the following training loop for my plant image dataset:
```python
def train_loop_fn(data_loader, model, optimizer, device, scheduler=None):
    running_loss = 0.0
    running_corrects = 0
    model.train()
    alpha, decay_power = 1.0, 3.0
    for batch_index, dataset in enumerate(data_loader):
        image = dataset["image"]
        label = dataset["label"]
        image = image.to(device, dtype=torch.float)
        label = label.to(device, dtype=torch.float)
        # FMix: mix the batch, returning the permutation and mixing coefficient
        image, perm, lambda_value = sample_and_apply(image, alpha, decay_power, (224, 224))
        optimizer.zero_grad()
        outputs = model(image)
        # mix the loss between the original and permuted labels
        loss = loss_fn(outputs, label) * lambda_value + loss_fn(outputs, label[perm]) * (1 - lambda_value)
        loss.backward()
        xm.optimizer_step(optimizer)
        running_loss += loss.item()
    scheduler.step()
    train_loss = running_loss / float(len(train_dataset))
    xm.master_print('training Loss: {:.4f} '.format(train_loss))
```
and my dataset class looks like this:
```python
import cv2
import numpy as np
import torch
from torch.utils.data import Dataset
from torchvision import transforms
import albumentations
from PIL import Image

class leaf_classification(Dataset):
    def __init__(self, ids, image_id, label, mean, std, is_valid):
        self.ids = ids
        self.image_id = image_id
        self.label = label
        self.is_valid = is_valid
        if self.is_valid == 1:  # transforms for validation images
            self.aug = albumentations.Compose([
                albumentations.Normalize(mean, std, always_apply=True)
            ])
        else:  # transforms for training images
            self.aug = albumentations.Compose([
                albumentations.Normalize(mean, std, always_apply=True),
                albumentations.ShiftScaleRotate(shift_limit=0.0625,
                                                scale_limit=0.1,
                                                rotate_limit=5,
                                                p=0.9)
            ])

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, index):
        # convert the jpg image to a numpy array
        img = np.array(Image.open('../input/cassava-leaf-disease-classification/train_images/' + self.image_id[index]))
        img = cv2.resize(img, dsize=(224, 224), interpolation=cv2.INTER_CUBIC)
        img = self.aug(image=img)['image']
        # (2, 0, 1) because PyTorch expects the image channels first
        img = np.transpose(img, (2, 0, 1)).astype(np.float32)
        return {
            'image': torch.tensor(img, dtype=torch.float),
            'label': torch.tensor(self.label[index], dtype=torch.float)
        }
```
And while training it's generating an error. Please help me resolve it. Here's the link to my notebook
Hi, looks like you're on the right lines. I've created a copy of your notebook and made a few changes here: https://www.kaggle.com/ethanwharris/fmix-cassava-leaf-disease-classification
Changes made:
1. Used `from FMix.fmix import ...` instead of `cd FMix` (which caused the file error, since after the `cd` everything was one level lower)
2. Used `sample_mask` instead of `sample_and_apply`, which used a mask from numpy and didn't seem to work with the XLA device (a minimal sketch of the new pattern follows this list)
3. Moved `model.to(device)`, as the `torchsummary` package was moving the model to CPU
4. Added `train_loss` to the `scheduler.step` call (although this should probably be a val loss)
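For reference, here's a minimal sketch of that `sample_mask` pattern, pulled out into a helper (`fmix_forward` is just an illustrative name, not part of FMix; the `sample_mask` call matches the one used in the notebook):

```python
import torch
from FMix.fmix import sample_mask

def fmix_forward(model, loss_fn, image, label, device,
                 alpha=1.0, decay_power=3.0, shape=(224, 224)):
    """One FMix forward pass: mix the batch with a sampled mask, then mix the loss."""
    # sample a low-frequency mask on the host (it comes back as a numpy array)
    lam, mask = sample_mask(alpha, decay_power, shape, 0.0, False)
    mask = torch.from_numpy(mask).to(dtype=image.dtype)

    # pair each image with a random other image from the batch; mix on the host,
    # then move everything to the XLA device in one go
    perm = torch.randperm(image.size(0))
    mixed = (image * mask + image[perm] * (1 - mask)).to(device, dtype=torch.float)
    label, shuffled_label = label.to(device, dtype=torch.float), label[perm].to(device, dtype=torch.float)

    outputs = model(mixed)
    # weight the loss between the original and permuted targets by lambda
    return loss_fn(outputs, label) * lam + loss_fn(outputs, shuffled_label) * (1 - lam)
```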
There are still some errors in `eval_loop_fn`, but these aren't related to FMix.
Hope that helps!
Thanks brother, really appreciate your work as well as your help; it started working after I made the necessary changes. But I just want to know: am I going wrong with regard to your point number 4?
Hello Ethan,
I have again started facing an error on this line: `x1, x2 = image * mask, image[perm] * (1 - mask)` in the function defining the training loop:
```python
def train_loop_fn(data_loader, model, optimizer, device, scheduler=None):
    running_loss = 0.0
    running_corrects = 0
    model.train()
    alpha, decay_power = 1.0, 3.0
    for batch_index, dataset in enumerate(data_loader):
        image = dataset["image"]
        label = dataset["label"]
        # FMix: sample the mask on the host, move it to the device, then mix
        lambda_value, mask = sample_mask(alpha, decay_power, (224, 224), 0.0, False)
        mask = torch.from_numpy(mask).to(device)
        perm = torch.randperm(image.size(0))
        x1, x2 = image * mask, image[perm] * (1 - mask)
        image = x1 + x2
        image = image.to(device, dtype=torch.float)
        label = label.to(device, dtype=torch.float)
        optimizer.zero_grad()
        outputs = model(image)
        loss = loss_fn(outputs, label) * lambda_value + loss_fn(outputs, label[perm]) * (1 - lambda_value)
        # loss = loss_fn(outputs, label)
        loss.backward()
        xm.optimizer_step(optimizer)
        running_loss += loss.item()
    train_loss = running_loss / float(len(train_data))
    scheduler.step(train_loss)
    return train_loss
```
Don't know why; it was working fine earlier.
Hi, sorry I missed this.
Not sure what the error was here; it looks like the tensors are the wrong sizes, so you might need to squeeze / unsqueeze in places to get it to work, as in the sketch below. Closing this issue as it looks like it's not a bug in our code.
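For example, something like this would make the dtype and rank of the mask explicit before the multiply (an untested sketch; `apply_fmix_mask` is just an illustrative name, and it assumes `image` is a (B, C, H, W) batch still on the CPU and `mask` is the numpy array returned by `sample_mask`):

```python
import torch

def apply_fmix_mask(image, mask, perm, device):
    """Mix a batch on the host, then move the mixed result to the device."""
    # match the mask's dtype to the images so the multiply doesn't promote to float64
    mask = torch.from_numpy(mask).to(dtype=image.dtype)
    # pad the mask's rank, e.g. (H, W) or (1, H, W) -> (1, 1, H, W), so it broadcasts
    while mask.dim() < image.dim():
        mask = mask.unsqueeze(0)
    x1, x2 = image * mask, image[perm] * (1 - mask)
    return (x1 + x2).to(device, dtype=torch.float)
```

Mixing on the host and moving the mixed batch across afterwards also avoids multiplying a CPU tensor by one that's already on the XLA device.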