Is it possible to resume training from a .pkl file at the same kimg with a new dataset of pictures?
nicolai256 opened this issue
Describe the bug
I tried doing this, but it gives me an error (see below).
When I resume at the same kimg with the original dataset of images, I don't get this error.
I have checked whether all the images are 1024px, and they are.
It seems to start training but fails after the first tick.
Input command
python train.py --cfg=stylegan3-t --data=C:\deepdream-test\stylegan3-fun\dataset22\images\1024.zip --aug=ada --augpipe=bg --target=0.7 --gpus=1 --batch=8 --batch-gpu=8 --mbstd-group=8 --gamma=6.6 --mirror=1 --kimg=25000 --snap=1 --metrics=none --resume=C:\deepdream-test\stylegan3-fun\training-runs\network-snapshot-005832.pkl --resume-kimg=5832
Error output
Setting up augmentation...
Distributing across 1 GPUs...
Setting up training phases...
Exporting sample images...
Initializing logs...
Training for 25000 kimg...
tick 0 kimg 5832.0 time 1m 34s sec/tick 20.5 sec/kimg 2557.87 maintenance 73.5 cpumem 4.52 gpumem 16.10 reserved 19.92 augment 0.000
Traceback (most recent call last):
File "c:\deepdream-test\stylegan3-fun\train.py", line 324, in <module>
main() # pylint: disable=no-value-for-parameter
File "C:\Users\Gebruiker\anaconda3\lib\site-packages\click\core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "C:\Users\Gebruiker\anaconda3\lib\site-packages\click\core.py", line 1055, in main
rv = self.invoke(ctx)
File "C:\Users\Gebruiker\anaconda3\lib\site-packages\click\core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Users\Gebruiker\anaconda3\lib\site-packages\click\core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "c:\deepdream-test\stylegan3-fun\train.py", line 317, in main
launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
File "c:\deepdream-test\stylegan3-fun\train.py", line 95, in launch_training
subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
File "c:\deepdream-test\stylegan3-fun\train.py", line 50, in subprocess_fn
training_loop.training_loop(rank=rank, **c)
File "c:\deepdream-test\stylegan3-fun\training\training_loop.py", line 260, in training_loop
phase_real_img, phase_real_c = next(training_set_iterator)
File "C:\Users\Gebruiker\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 521, in __next__
data = self._next_data()
File "C:\Users\Gebruiker\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "C:\Users\Gebruiker\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 1229, in _process_data
data.reraise()
File "C:\Users\Gebruiker\anaconda3\lib\site-packages\torch\_utils.py", line 425, in reraise
raise self.exc_type(msg)
AssertionError: Caught AssertionError in DataLoader worker process 1.
Original Traceback (most recent call last):
File "C:\Users\Gebruiker\anaconda3\lib\site-packages\torch\utils\data\_utils\worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "C:\Users\Gebruiker\anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "C:\Users\Gebruiker\anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "c:\deepdream-test\stylegan3-fun\training\dataset.py", line 99, in __getitem__
assert list(image.shape) == self.image_shape
AssertionError
Are you trying to start from your previous model? If I recall correctly, that one was 512x512, so you won't be able to do that (yet; it can be done, but it requires a bit of time to fix). Basically, you'll need to start from a 1024 model if your dataset is 1024x1024.
I upscaled all the images. I thought they were all 512px and upscaled them to 1024px so I could resume training on my 1024 model, but when I checked them all, some had already been 1024px and were upscaled to 2048px. That was the cause of the error.
It seems to be running fine now. Sorry for all the bother.
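A quick way to catch this kind of mismatch before launching a run is to scan the dataset zip for images that aren't the expected size. As far as I can tell from stylegan3's training/dataset.py, the expected image_shape is taken from the first image in the dataset, so a single image of a different size is enough to trigger `assert list(image.shape) == self.image_shape` once the data loader happens to sample it. The sketch below assumes the zip was built with dataset_tool.py, which stores images as PNG, and reads only each PNG header to get width and height. The script name check_sizes.py and the function names are hypothetical; any entry that isn't a readable PNG is reported with size None so it can be checked by hand.

```python
import struct
import sys
import zipfile
from typing import List, Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) from the first 24 bytes of a PNG, or None if not a PNG."""
    # Layout: 8-byte signature, 4-byte IHDR length, 4-byte type b"IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if len(header) < 24 or not header.startswith(PNG_SIGNATURE) or header[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def find_mismatched(zip_path: str, expected: int = 1024) -> List[Tuple[str, Optional[Tuple[int, int]]]]:
    """List (name, size) for every entry in the zip that is not expected x expected."""
    bad = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            # Skip folders and the optional labels file (dataset.json).
            if info.is_dir() or info.filename.lower().endswith(".json"):
                continue
            with zf.open(info) as f:
                size = png_size(f.read(24))  # only the header is needed
            if size != (expected, expected):
                bad.append((info.filename, size))
    return bad


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python check_sizes.py DATASET.zip [RESOLUTION]")
    else:
        res = int(sys.argv[2]) if len(sys.argv) > 2 else 1024
        mismatched = find_mismatched(sys.argv[1], res)
        for name, size in mismatched:
            print(f"{name}: {size}")
        print(f"{len(mismatched)} image(s) do not match {res}x{res}")
```

For the dataset in the command above, that would be `python check_sizes.py C:\deepdream-test\stylegan3-fun\dataset22\images\1024.zip 1024`; any 2048px images would be listed with size (2048, 2048).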