Training cannot resume automatically after being interrupted by a full disk
amd5 opened this issue · comments
amd5 commented
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
2023-07-26 09:59:13.215 | INFO | __main__:__init__:12 -
Hello baby~
2023-07-26 09:59:13.216 | INFO | __main__:train:26 -
Start Train ----> images98
2023-07-26 09:59:13.221 | INFO | utils.train:__init__:41 -
Taget:
min_Accuracy: 0.97
min_Epoch: 20
max_Loss: 0.05
2023-07-26 09:59:13.221 | INFO | utils.train:__init__:45 -
USE GPU ----> 0
2023-07-26 09:59:13.221 | INFO | utils.train:__init__:52 -
Search for history checkpoints...
Traceback (most recent call last):
  File "/www/wwwroot/dddd_trainer/app.py", line 33, in <module>
    fire.Fire(App)
  File "/www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/fire/core.py", line 480, in _Fire
    target=component.__name__)
  File "/www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/www/wwwroot/dddd_trainer/app.py", line 27, in train
    trainer = train.Train(project_name)
  File "/www/wwwroot/dddd_trainer/utils/train.py", line 63, in __init__
    os.path.join(self.checkpoints_path, newer_checkpoint), self.device)
  File "/www/wwwroot/dddd_trainer/nets/__init__.py", line 223, in load_checkpoint
    param = torch.load(path, map_location=device)
  File "/www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/torch/serialization.py", line 600, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/www/wwwroot/dddd_trainer/11ddbaf3386aea1f2974eee984542152_venv/lib/python3.6/site-packages/torch/serialization.py", line 242, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
amd5 commented
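Deleting the checkpoint archive with an abnormal file size from /www/wwwroot/dddd_trainer/projects/images98/checkpoints lets training resume. The disk filled up while that checkpoint was being written, so the file is truncated and torch.load cannot find the zip central directory.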
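For anyone who would rather not compare file sizes by eye, here is a rough sketch of how the truncated checkpoint could be located. It assumes the checkpoints were saved in PyTorch's zip-based format (which the traceback suggests, since torch.load went through _open_zipfile_reader) and that the checkpoints folder contains only checkpoint files. The function name find_corrupt_checkpoints is made up for this example and is not part of dddd_trainer. It only uses the standard-library zipfile module, so it runs without torch.

```python
import zipfile
from pathlib import Path


def find_corrupt_checkpoints(checkpoints_dir):
    """Return files in checkpoints_dir that are not readable zip archives.

    Hypothetical helper, not part of dddd_trainer. Assumes every file in
    the folder is a checkpoint saved in PyTorch's zip-based format.
    """
    corrupt = []
    for path in sorted(Path(checkpoints_dir).iterdir()):
        if not path.is_file():
            continue
        # A file cut short by a full disk has no end-of-central-directory
        # record, so is_zipfile() returns False for it.
        if not zipfile.is_zipfile(str(path)):
            corrupt.append(path)
    return corrupt


if __name__ == "__main__":
    # Path taken from the comment above; adjust it to your project.
    folder = "/www/wwwroot/dddd_trainer/projects/images98/checkpoints"
    if Path(folder).is_dir():
        for bad in find_corrupt_checkpoints(folder):
            print("corrupt checkpoint:", bad)
```

The script only lists suspicious files and leaves deletion to you, since a checkpoint saved in the older non-zip format would also be reported here even though it may load fine.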