ain-soph / trojanzoo

When I run "python train.py --color --verbose 1 --dataset gtsrb --lr_scheduler --cutout --grad_clip 5.0 --save --download --epoch 50", I get the following error:

Traceback (most recent call last):
  File "/home/itl/Documents/xrh/backdoor/trojanzoo/examples/train.py", line 30, in <module>
    dataset = trojanvision.datasets.create(**kwargs)
  File "/home/itl/Documents/xrh/backdoor/trojanzoo/examples/../trojanvision/datasets/__init__.py", line 78, in create
    return trojanzoo.datasets.create(dataset_name=dataset_name, dataset=dataset,
  File "/home/itl/Documents/xrh/backdoor/trojanzoo/examples/../trojanzoo/datasets.py", line 534, in create
    return DatasetType(**result)
  File "/home/itl/Documents/xrh/backdoor/trojanzoo/examples/../trojanvision/datasets/folder/gtsrb.py", line 40, in __init__
    return super().__init__(norm_par=norm_par, loss_weights=loss_weights, **kwargs)
  File "/home/itl/Documents/xrh/backdoor/trojanzoo/examples/../trojanvision/datasets/imagefolder.py", line 79, in __init__
    super().__init__(**kwargs)
  File "/home/itl/Documents/xrh/backdoor/trojanzoo/examples/../trojanvision/datasets/imageset.py", line 150, in __init__
    super().__init__(default_model=default_model, **kwargs)
  File "/home/itl/Documents/xrh/backdoor/trojanzoo/examples/../trojanzoo/datasets.py", line 163, in __init__
    loss_weights = self.get_loss_weights() if loss_weights else None
  File "/home/itl/Documents/xrh/backdoor/trojanzoo/examples/../trojanzoo/datasets.py", line 450, in get_loss_weights
    _, targets = dataset_to_tensor(dataset)
  File "/home/itl/Documents/xrh/backdoor/trojanzoo/examples/../trojanzoo/utils/data.py", line 87, in dataset_to_tensor
    return torch.stack(data), torch.as_tensor(targets, dtype=torch.long)
TypeError: expected Tensor as element 0 in argument 0, but got Image

That means your dataset elements are PIL.Image objects rather than torch.Tensor, most likely because the dataset transform is None instead of PILToTensor. That is unexpected.

Update: This seems to be caused by transform=None:

trojanzoo/trojanzoo/datasets.py

Lines 449 to 450 in 51d728b

    dataset = self.get_dataset('train', transform=None)
    _, targets = dataset_to_tensor(dataset)

Would you mind reviewing PR #150?
I haven't checked whether it works yet, but I believe it should solve the issue.

It works, but a new problem occurs:
Traceback (most recent call last):
  File "/home/itl/Documents/xrh/backdoor/trojanzoo/examples/train.py", line 36, in <module>
    model._train(**trainer)
  File "/home/itl/Documents/xrh/backdoor/trojanzoo/examples/../trojanvision/models/imagemodel.py", line 560, in _train
    return super()._train(epochs=epochs, optimizer=optimizer, lr_scheduler=lr_scheduler,
  File "/home/itl/Documents/xrh/backdoor/trojanzoo/examples/../trojanzoo/models.py", line 1015, in _train
    return train(module=self._model, num_classes=self.num_classes,
  File "/home/itl/Documents/xrh/backdoor/trojanzoo/examples/../trojanzoo/utils/train.py", line 63, in train
    best_validate_result = validate_fn(loader=loader_valid, get_data_fn=get_data_fn,
  File "/home/itl/Documents/xrh/backdoor/trojanzoo/examples/../trojanvision/models/imagemodel.py", line 481, in _validate
    return super()._validate(**kwargs)
  File "/home/itl/Documents/xrh/backdoor/trojanzoo/examples/../trojanzoo/models.py", line 1056, in _validate
    return validate(module=module, num_classes=num_classes, loader=loader,
  File "/home/itl/Documents/xrh/backdoor/trojanzoo/examples/../trojanzoo/utils/train.py", line 246, in validate
    loss = float(loss_fn(_input, _label, _output=_output, **kwargs))
  File "/home/itl/Documents/xrh/backdoor/trojanzoo/examples/../trojanzoo/models.py", line 660, in loss
    return criterion(_output, _label)
  File "/home/itl/anaconda3/envs/trojanzoo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/itl/anaconda3/envs/trojanzoo/lib/python3.10/site-packages/torch/nn/modules/loss.py", line 1163, in forward
    return F.cross_entropy(input, target, weight=self.weight,
  File "/home/itl/anaconda3/envs/trojanzoo/lib/python3.10/site-packages/torch/nn/functional.py", line 2996, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: expected scalar type Float but found Long

That doesn't seem to be an obvious issue. Let me debug it.
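
The thread does not state the root cause of this RuntimeError. One plausible trigger, assumed here purely for illustration, is a class-weight tensor with an integer dtype being passed to cross_entropy. The sketch below uses made-up logits and weights and shows that casting the weights to the logits' dtype lets the loss compute.

```python
import torch
import torch.nn.functional as F

# Made-up batch: 4 samples, 3 classes.
logits = torch.randn(4, 3)           # float32 model outputs
labels = torch.tensor([0, 1, 2, 1])  # int64 class indices, as cross_entropy expects

# Assumption: the weights were built from integer counts and kept dtype int64.
int_weights = torch.tensor([2, 1, 1])

# Record whether this PyTorch version rejects integer weights.
try:
    F.cross_entropy(logits, labels, weight=int_weights)
    int_weights_rejected = False
except (RuntimeError, TypeError):
    int_weights_rejected = True

# Casting the weights to the dtype of the logits avoids the mismatch.
float_weights = int_weights.to(logits.dtype)
loss = F.cross_entropy(logits, labels, weight=float_weights)
```

If `int_weights_rejected` is True on your install, checking the dtype of the weight tensor given to the criterion is a reasonable first debugging step.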

Thanks for finding the bug! Does the previous PR solve your issue?

Yes, that solved it. Thank you!

But now the loss is NaN.

Remove your previously saved loss_weights.npy and regenerate it. It is saved in your dataset folder.
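
A minimal sketch of the cleanup step, assuming the cache file sits directly in the dataset folder under the name loss_weights.npy as mentioned above. The helper names and the inverse-frequency formula are hypothetical illustrations, not trojanzoo's implementation. The formula is included only to show how regenerated weights can be sanity-checked for finite values.

```python
import tempfile
from pathlib import Path

import torch


def remove_cached_weights(dataset_dir) -> bool:
    """Delete a stale loss_weights.npy in dataset_dir; return True if one was removed."""
    cache = Path(dataset_dir) / "loss_weights.npy"
    if cache.exists():
        cache.unlink()
        return True
    return False


def inverse_frequency_weights(targets: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Hypothetical weighting: rarer classes get larger weights."""
    counts = torch.bincount(targets, minlength=num_classes).float()
    # clamp(min=1) avoids division by zero for classes absent from `targets`.
    return counts.sum() / (num_classes * counts.clamp(min=1))


# Demonstrate on a temporary folder instead of a real dataset directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "loss_weights.npy").write_bytes(b"stale")
    removed = remove_cached_weights(tmp)
    removed_again = remove_cached_weights(tmp)

weights = inverse_frequency_weights(torch.tensor([0, 0, 1, 2]), num_classes=3)
weights_are_finite = bool(torch.isfinite(weights).all())
```

Checking `weights_are_finite` before training is a cheap guard, since a non-finite class weight propagates straight into the loss.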

Does it work? @rahulxie

yes, it works! Thanks!

I want to know which models are used to train on the gtsrb dataset. Is resnet18_comp OK with the same config as cifar10 (i.e., the "Quick Start" of "Train a model" in README.md)?

Yes, I think I used resnet18_comp in the IMC paper.

May I add your WeChat?

ain-soph-aur

OK, I have sent a friend request.

The problem of gtsrb dataset