agent_dqn.py torch.load can't find zip files under checkpoints/DDQAgent
zhang112596 opened this issue · comments
The first step of your program ran successfully.
However, I got an error when executing the following command in the second step:
python2 run.py --agt 9 --usr 1 --max_turn 40 \
--movie_kb_path ./deep_dialog/data/movie_kb.1k.p \
--dqn_hidden_size 80 \
--experience_replay_pool_size 1000 \
--episodes 300 \
--simulation_epoch_size 100 \
--write_model_dir ./deep_dialog/checkpoints/DDQAgent/ \
--slot_err_prob 0.00 \
--intent_err_prob 0.00 \
--batch_size 16 \
--goal_file_path ./deep_dialog/data/user_goals_first_turn_template.part.movie.v1.p \
--trained_model_path ./deep_dialog/checkpoints/DDQAgent/TRAINED_MODEL \
--run_mode 3
Here is the error output:
[zyh@zyh-virtualbox src]$ python2 run.py --agt 9 --usr 1 --max_turn 40 \
> --movie_kb_path ./deep_dialog/data/movie_kb.1k.p \
> --dqn_hidden_size 80 \
> --experience_replay_pool_size 1000 \
> --episodes 300 \
> --simulation_epoch_size 100 \
> --write_model_dir ./deep_dialog/checkpoints/DDQAgent/ \
> --slot_err_prob 0.00 \
> --intent_err_prob 0.00 \
> --batch_size 16 \
> --goal_file_path ./deep_dialog/data/user_goals_first_turn_template.part.movie.v1.p \
> --trained_model_path ./deep_dialog/checkpoints/DDQAgent/TRAINED_MODEL \
> --run_mode 3
Dialog Parameters:
{
"simulation_epoch_size": 100,
"slot_err_mode": 0,
"diaact_nl_pairs": "./deep_dialog/data/dia_act_nl_pairs.v6.json",
"save_check_point": 10,
"episodes": 300,
"predict_mode": false,
"planning_steps": 4,
"cmd_input_mode": 0,
"goal_file_path": "./deep_dialog/data/user_goals_first_turn_template.part.movie.v1.p",
"max_turn": 40,
"grounded": 0,
"experience_replay_pool_size": 1000,
"write_model_dir": "./deep_dialog/checkpoints/DDQAgent/",
"usr": 1,
"boosted": 1,
"auto_suggest": 0,
"torch_seed": 100,
"run_mode": 3,
"trained_model_path": "./deep_dialog/checkpoints/DDQAgent/TRAINED_MODEL",
"success_rate_threshold": 0.6,
"nlu_model_path": "./deep_dialog/models/nlu/lstm_[1468447442.91]_39_80_0.921.p",
"train_world_model": 1,
"epsilon": 0,
"batch_size": 16,
"learning_phase": "all",
"nlg_model_path": "./deep_dialog/models/nlg/lstm_tanh_relu_[1468202263.38]_2_0.610.p",
"act_set": "./deep_dialog/data/dia_acts.txt",
"movie_kb_path": "./deep_dialog/data/movie_kb.1k.p",
"slot_err_prob": 0.0,
"warm_start": 1,
"warm_start_epochs": 100,
"dict_path": "./deep_dialog/data/dicts.v3.p",
"intent_err_prob": 0.0,
"slot_set": "./deep_dialog/data/slot_set.txt",
"act_level": 0,
"dqn_hidden_size": 80,
"agt": 9,
"split_fold": 5,
"gamma": 0.9
}
Traceback (most recent call last):
File "run.py", line 209, in <module>
agent = AgentDQN(movie_kb, act_set, slot_set, agent_params)
File "/home/zyh/Desktop/DDQ/src/deep_dialog/agents/agent_dqn.py", line 71, in __init__
self.load(params['trained_model_path'])
File "/home/zyh/Desktop/DDQ/src/deep_dialog/agents/agent_dqn.py", line 384, in load
self.dqn.load_state_dict(torch.load(filename))
File "/home/zyh/.local/lib/python2.7/site-packages/torch/serialization.py", line 527, in load
with _open_zipfile_reader(f) as opened_zipfile:
File "/home/zyh/.local/lib/python2.7/site-packages/torch/serialization.py", line 224, in __init__
super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: [enforce fail at inline_container.cc:143] . PytorchStreamReader failed reading zip archive: not a ZIP archive
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x47 (0x7feb2b02cd37 in /home/zyh/.local/lib/python2.7/site-packages/torch/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::valid(char const*, char const*) + 0x72 (0x7feb2e1afb62 in /home/zyh/.local/lib/python2.7/site-packages/torch/lib/libtorch.so)
frame #2: caffe2::serialize::PyTorchStreamReader::init() + 0xa1 (0x7feb2e1b2b31 in /home/zyh/.local/lib/python2.7/site-packages/torch/lib/libtorch.so)
frame #3: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::string const&) + 0x64 (0x7feb2e1b5c04 in /home/zyh/.local/lib/python2.7/site-packages/torch/lib/libtorch.so)
frame #4: <unknown function> + 0x6c8b56 (0x7feb762dfb56 in /home/zyh/.local/lib/python2.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x29660b (0x7feb75ead60b in /home/zyh/.local/lib/python2.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #40: __libc_start_main + 0xf2 (0x7feb7df24152 in /usr/lib/libc.so.6)
frame #41: _start + 0x2e (0x56240dbfa04e in python2)
Problem solved: I forgot to specify the trained model file in the command-line option --trained_model_path. Thanks!
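For anyone hitting the same traceback, a small pre-flight check can make this mistake fail with a clearer message before torch.load is reached. The sketch below assumes the cause is the one described above, that --trained_model_path did not name an actual checkpoint file. The helper name check_trained_model_path is hypothetical and not part of the DDQ code. It only verifies that the path is a non-empty regular file; it does not validate the checkpoint contents.

```python
import os


def check_trained_model_path(path):
    """Return `path` if it names an existing, non-empty regular file.

    Hypothetical helper (not part of the DDQ repository). It rejects
    directories, missing paths, and empty files, but does not inspect
    whether the file is a valid PyTorch checkpoint.
    """
    if not os.path.isfile(path):
        raise ValueError(
            "--trained_model_path must point to a checkpoint file, got: %r" % path
        )
    if os.path.getsize(path) == 0:
        raise ValueError("checkpoint file is empty: %r" % path)
    return path
```

One possible use would be to call it on params['trained_model_path'] just before the self.load(...) call shown in the traceback, so that a wrong path is reported as a ValueError naming the offending argument instead of a low-level "not a ZIP archive" error.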