diux-dev / cluster

train on AWS

'BatchTransformDataLoader' object has no attribute 'batch_sampler'

yaroslavvb opened this issue

One of my runs crashed with this error. Have you seen something like this? @bearpelican

Batch size changed: 256
  File "train_imagenet_nv.py", line 460, in <module>
    main()
  File "train_imagenet_nv.py", line 254, in main
    dm.set_epoch(epoch)
  File "train_imagenet_nv.py", line 86, in set_epoch
    if cur_phase: self.set_data(cur_phase)
  File "train_imagenet_nv.py", line 98, in set_data
    self.trn_dl.batch_sampler.batch_size = phase['bs']
'BatchTransformDataLoader' object has no attribute 'batch_sampler'

@yaroslavvb
Did an old file somehow get run? I refactored the dataloader so that line is no longer called:
afb8bfb#diff-75be79d6640e3bb96c7683235830b933L307
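
For anyone landing here with the same message, here is a minimal sketch of how this kind of error can arise, assuming the loader is a thin wrapper that does not forward attributes to the loader it wraps. `InnerLoader`, `WrapperLoader`, and `try_set_batch_size` are hypothetical stand-ins for illustration, not the classes in train_imagenet_nv.py.

```python
from types import SimpleNamespace


class InnerLoader:
    """Hypothetical loader that owns a batch_sampler (stand-in, not PyTorch)."""

    def __init__(self, batch_size):
        self.batch_sampler = SimpleNamespace(batch_size=batch_size)


class WrapperLoader:
    """Hypothetical wrapper that keeps the inner loader but forwards nothing."""

    def __init__(self, loader):
        self.loader = loader


def try_set_batch_size(dl, bs):
    """Return None on success, or the AttributeError message on failure."""
    try:
        dl.batch_sampler.batch_size = bs
        return None
    except AttributeError as e:
        return str(e)


wrapped = WrapperLoader(InnerLoader(128))

# The wrapper has no batch_sampler of its own, so this fails.
msg = try_set_batch_size(wrapped, 256)
print(msg)

# Reaching through to the wrapped loader works.
print(try_set_batch_size(wrapped.loader, 256))
```

Old code that still pokes at `batch_sampler` on the outer object will break this way as soon as the loader is wrapped, which is consistent with a stale script being run against the refactored loader.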

Aha, training is baked into the pytorch.v7 AMI, and my task.upload(training) fails silently when the directory is already present, so we end up with the old version.

gonna fix the upload

I'll try to make sure to delete all the training stuff on the next AMI version

False alarm, upload works as expected. The issue was that my refactored version used the script from ~ instead of ~/training.
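
One way to catch this kind of mix-up early is to check at startup which copy of the script is running. A minimal sketch, assuming the script is expected to live directly inside a known directory; `script_in_dir` and the example paths are hypothetical and not part of this repo.

```python
import os


def script_in_dir(script_path, expected_dir):
    """Return True if script_path sits directly inside expected_dir.

    Both paths have ~ expanded and symlinks resolved before comparing.
    """
    script = os.path.realpath(os.path.expanduser(script_path))
    expected = os.path.realpath(os.path.expanduser(expected_dir))
    return os.path.dirname(script) == expected


# Hypothetical paths, for illustration only.
ok = script_in_dir("/home/ubuntu/training/train_imagenet_nv.py",
                   "/home/ubuntu/training")
stale = script_in_dir("/home/ubuntu/train_imagenet_nv.py",
                      "/home/ubuntu/training")
print(ok, stale)
```

In a real run the first argument could be `sys.argv[0]`, with the script logging a warning or exiting when the check fails.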