agent_dqn.py torch.load can't find zip files under checkpoints/DDQAgent

Question


zhang112596 opened this issue 4 years ago.

The first step of your program ran successfully.

However, I got an error when running the following command in the second step:

python2 run.py --agt 9 --usr 1 --max_turn 40 \
	      --movie_kb_path ./deep_dialog/data/movie_kb.1k.p \
	      --dqn_hidden_size 80 \
	      --experience_replay_pool_size 1000 \
	      --episodes 300 \
	      --simulation_epoch_size 100 \
	      --write_model_dir ./deep_dialog/checkpoints/DDQAgent/ \
	      --slot_err_prob 0.00 \
	      --intent_err_prob 0.00 \
	      --batch_size 16 \
	      --goal_file_path ./deep_dialog/data/user_goals_first_turn_template.part.movie.v1.p \
	      --trained_model_path ./deep_dialog/checkpoints/DDQAgent/TRAINED_MODEL \
	      --run_mode 3

Here is the error output:

[zyh@zyh-virtualbox src]$ python2 run.py --agt 9 --usr 1 --max_turn 40 \
>       --movie_kb_path ./deep_dialog/data/movie_kb.1k.p \
>       --dqn_hidden_size 80 \
>       --experience_replay_pool_size 1000 \
>       --episodes 300 \
>       --simulation_epoch_size 100 \
>       --write_model_dir ./deep_dialog/checkpoints/DDQAgent/ \
>       --slot_err_prob 0.00 \
>       --intent_err_prob 0.00 \
>       --batch_size 16 \
>       --goal_file_path ./deep_dialog/data/user_goals_first_turn_template.part.movie.v1.p \
>       --trained_model_path ./deep_dialog/checkpoints/DDQAgent/TRAINED_MODEL \
>       --run_mode 3
Dialog Parameters: 
{
  "simulation_epoch_size": 100, 
  "slot_err_mode": 0, 
  "diaact_nl_pairs": "./deep_dialog/data/dia_act_nl_pairs.v6.json", 
  "save_check_point": 10, 
  "episodes": 300, 
  "predict_mode": false, 
  "planning_steps": 4, 
  "cmd_input_mode": 0, 
  "goal_file_path": "./deep_dialog/data/user_goals_first_turn_template.part.movie.v1.p", 
  "max_turn": 40, 
  "grounded": 0, 
  "experience_replay_pool_size": 1000, 
  "write_model_dir": "./deep_dialog/checkpoints/DDQAgent/", 
  "usr": 1, 
  "boosted": 1, 
  "auto_suggest": 0, 
  "torch_seed": 100, 
  "run_mode": 3, 
  "trained_model_path": "./deep_dialog/checkpoints/DDQAgent/TRAINED_MODEL", 
  "success_rate_threshold": 0.6, 
  "nlu_model_path": "./deep_dialog/models/nlu/lstm_[1468447442.91]_39_80_0.921.p", 
  "train_world_model": 1, 
  "epsilon": 0, 
  "batch_size": 16, 
  "learning_phase": "all", 
  "nlg_model_path": "./deep_dialog/models/nlg/lstm_tanh_relu_[1468202263.38]_2_0.610.p", 
  "act_set": "./deep_dialog/data/dia_acts.txt", 
  "movie_kb_path": "./deep_dialog/data/movie_kb.1k.p", 
  "slot_err_prob": 0.0, 
  "warm_start": 1, 
  "warm_start_epochs": 100, 
  "dict_path": "./deep_dialog/data/dicts.v3.p", 
  "intent_err_prob": 0.0, 
  "slot_set": "./deep_dialog/data/slot_set.txt", 
  "act_level": 0, 
  "dqn_hidden_size": 80, 
  "agt": 9, 
  "split_fold": 5, 
  "gamma": 0.9
}
Traceback (most recent call last):
  File "run.py", line 209, in <module>
    agent = AgentDQN(movie_kb, act_set, slot_set, agent_params)
  File "/home/zyh/Desktop/DDQ/src/deep_dialog/agents/agent_dqn.py", line 71, in __init__
    self.load(params['trained_model_path'])
  File "/home/zyh/Desktop/DDQ/src/deep_dialog/agents/agent_dqn.py", line 384, in load
    self.dqn.load_state_dict(torch.load(filename))
  File "/home/zyh/.local/lib/python2.7/site-packages/torch/serialization.py", line 527, in load
    with _open_zipfile_reader(f) as opened_zipfile:
  File "/home/zyh/.local/lib/python2.7/site-packages/torch/serialization.py", line 224, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: [enforce fail at inline_container.cc:143] . PytorchStreamReader failed reading zip archive: not a ZIP archive
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x47 (0x7feb2b02cd37 in /home/zyh/.local/lib/python2.7/site-packages/torch/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::valid(char const*, char const*) + 0x72 (0x7feb2e1afb62 in /home/zyh/.local/lib/python2.7/site-packages/torch/lib/libtorch.so)
frame #2: caffe2::serialize::PyTorchStreamReader::init() + 0xa1 (0x7feb2e1b2b31 in /home/zyh/.local/lib/python2.7/site-packages/torch/lib/libtorch.so)
frame #3: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::string const&) + 0x64 (0x7feb2e1b5c04 in /home/zyh/.local/lib/python2.7/site-packages/torch/lib/libtorch.so)
frame #4: <unknown function> + 0x6c8b56 (0x7feb762dfb56 in /home/zyh/.local/lib/python2.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x29660b (0x7feb75ead60b in /home/zyh/.local/lib/python2.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #40: __libc_start_main + 0xf2 (0x7feb7df24152 in /usr/lib/libc.so.6)
frame #41: _start + 0x2e (0x56240dbfa04e in python2)

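The RuntimeError comes from PyTorch's zip reader and means that the file handed to torch.load is not a zip archive it can read. A quick way to see what is actually sitting at the --trained_model_path location is to look at its first four bytes: every zip archive begins with the local-file-header signature b"PK\x03\x04". Below is a minimal sketch of such a check. The function name sniff_checkpoint is hypothetical and not part of this repository. Note that checkpoints written with the legacy torch.save format (the default before PyTorch 1.6) are not zip archives, so a "not-zip" result does not by itself mean the file is broken.

```python
import os

# Signature of a zip local file header; every zip archive starts with it.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_checkpoint(path):
    """Hypothetical helper: report what kind of file sits at `path`.

    Returns one of "missing", "directory", "empty", "zip" or "not-zip",
    based only on whether the path exists and on its first 4 bytes.
    """
    if not os.path.exists(path):
        return "missing"
    if os.path.isdir(path):
        return "directory"
    with open(path, "rb") as f:
        head = f.read(4)
    if not head:
        return "empty"
    return "zip" if head == ZIP_MAGIC else "not-zip"


if __name__ == "__main__":
    import sys

    # Usage: python sniff_checkpoint.py PATH [PATH ...]
    for p in sys.argv[1:]:
        print("%s: %s" % (p, sniff_checkpoint(p)))
```

Running it on the path from the command above, for example `python sniff_checkpoint.py ./deep_dialog/checkpoints/DDQAgent/TRAINED_MODEL`, shows whether that path is missing, empty, a directory, or a file in some format other than zip.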
zhang112596 commented on Wed Jan 13 2021 16:34:37 GMT+0800 (China Standard Time):

Problem solved: I forgot to replace TRAINED_MODEL in the --trained_model_path option with the actual trained model file. Thanks!
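Because a wrong --trained_model_path only surfaces as a C++ stack trace deep inside torch.load, a small pre-flight check on the path can make this mistake obvious. The sketch below assumes that TRAINED_MODEL in the example command is a placeholder that must be replaced by a real model file name. The function check_trained_model_path is hypothetical; the repository does not provide it.

```python
import os


def check_trained_model_path(path, placeholder="TRAINED_MODEL"):
    """Hypothetical pre-flight check for --trained_model_path.

    Raises a readable error before torch.load() is called. `placeholder`
    is assumed to be the literal name from the example command.
    """
    # normpath strips a trailing slash so ".../TRAINED_MODEL/" is caught too.
    if os.path.basename(os.path.normpath(path)) == placeholder:
        raise ValueError(
            "--trained_model_path still ends in the placeholder %r; "
            "replace it with the name of a trained model file" % placeholder)
    if os.path.isdir(path):
        raise ValueError("%s is a directory, not a model file" % path)
    if not os.path.isfile(path):
        raise IOError("trained model file not found: %s" % path)
    if os.path.getsize(path) == 0:
        raise ValueError("trained model file is empty: %s" % path)
    return path
```

One way to wire it in (again hypothetical) would be to call `check_trained_model_path(filename)` in `AgentDQN.load` just before `torch.load(filename)`, so the run stops with a plain message naming the offending path.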