tensorflow / lattice

Lattice methods in TensorFlow

Error in running the example of lattice models

arrowx123 opened this issue

I was running uci_census.py with the create_calibrated_lattice function.
When the lattice_size parameter is set to 2, the program runs successfully.
However, when it is set to 3 (and possibly 4 or other values, which I have not tested yet), the program crashes with the following error:

2018-06-17 19:54:25.814852: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX AVX2 FMA
Traceback (most recent call last):
  File "uci_census.py", line 616, in <module>
    run()
  File "uci_census.py", line 609, in run
    main(argv)
  File "uci_census.py", line 586, in main
    train(estimator)
  File "uci_census.py", line 550, in train
    batch_size=FLAGS.batch_size, num_epochs=epochs, shuffle=True))
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 314, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 812, in _train_model
    log_step_count_steps=self._config.log_step_count_steps) as mon_sess:
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 380, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 787, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 511, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 972, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 977, in _create_session
    return self._sess_creator.create_session()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 668, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 440, in create_session
    init_fn=self._scaffold.init_fn)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 273, in prepare_session
    config=config)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 205, in _restore_checkpoint
    saver.restore(sess, ckpt.model_checkpoint_path)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1686, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1128, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1344, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1363, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1,1594323] rhs shape= [1,8192]
	 [[Node: save/Assign_3 = Assign[T=DT_FLOAT, _class=["loc:@calibrated_tf_lattice_model/lattice/hypercube_lattice_parameters"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](calibrated_tf_lattice_model/lattice/calibrated_tf_lattice_model/lattice/hypercube_lattice_parameters/Adam_1, save/RestoreV2_3)]]

Caused by op u'save/Assign_3', defined at:
  File "uci_census.py", line 616, in <module>
    run()
  File "uci_census.py", line 609, in run
    main(argv)
  File "uci_census.py", line 586, in main
    train(estimator)
  File "uci_census.py", line 550, in train
    batch_size=FLAGS.batch_size, num_epochs=epochs, shuffle=True))
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 314, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 812, in _train_model
    log_step_count_steps=self._config.log_step_count_steps) as mon_sess:
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 380, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 787, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 511, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 972, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 977, in _create_session
    return self._sess_creator.create_session()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 668, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 431, in create_session
    self._scaffold.finalize()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 212, in finalize
    self._saver.build()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1248, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1284, in _build
    build_save=build_save, build_restore=build_restore)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 759, in _build_internal
    restore_sequentially, reshape)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 471, in _AddShardedRestoreOps
    name="restore_shard"))
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 440, in _AddRestoreOps
    assign_ops.append(saveable.restore(tensors, shapes))
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 160, in restore
    self.op.get_shape().is_fully_defined())
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/state_ops.py", line 276, in assign
    validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 59, in assign
    use_locking=use_locking, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1625, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [1,1594323] rhs shape= [1,8192]
	 [[Node: save/Assign_3 = Assign[T=DT_FLOAT, _class=["loc:@calibrated_tf_lattice_model/lattice/hypercube_lattice_parameters"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](calibrated_tf_lattice_model/lattice/calibrated_tf_lattice_model/lattice/hypercube_lattice_parameters/Adam_1, save/RestoreV2_3)]]

I think the key line is this one: Assign requires shapes of both tensors to match. lhs shape= [1,1594323] rhs shape= [1,8192]. Note that 1594323 = 3^13 and 8192 = 2^13.
Here 13 is the number of features used in this example, 3 is the lattice_size I set, and 2 is the lattice_size that ran successfully.
Could anyone help me with this?
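
The arithmetic above can be checked with a short sketch. It assumes, as the shapes in the error message suggest, that the lattice holds lattice_size ** num_features parameters when every feature uses the same lattice size. The function name lattice_param_count is hypothetical and is not part of TensorFlow Lattice.

```python
def lattice_param_count(lattice_size, num_features):
    """Number of lattice vertices, assuming the same size along every feature."""
    return lattice_size ** num_features


# 13 features, as in the uci_census.py example discussed in this thread.
num_features = 13
for lattice_size in (2, 3):
    count = lattice_param_count(lattice_size, num_features)
    print("lattice_size=%d -> %d parameters" % (lattice_size, count))
```

The two counts match the rhs shape (the value being restored) and the lhs shape (the variable in the current graph) in the error, respectively.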

Thanks for reporting this. It seems like an issue with the default initializers using 2 for lattice_size. I will investigate and get back to you.

Just to clarify, did you change the lattice_size when creating hparams in:
hparams = tfl.CalibratedLatticeHParams(...)
or did you change it via the flags? Note that hparams is being overridden by the flags:
hparams.parse(FLAGS.hparams)

Thanks for your timely reply!
I just double-checked the code: I changed the parameters passed to tfl.CalibratedLatticeHParams directly, and FLAGS.hparams is None.
Changing num_keypoints from 200 to 300 also gives rise to a similar error:

Caused by op u'save/Assign_24', defined at:
  File "/usr/local/Cellar/python@2/2.7.15/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/local/Cellar/python@2/2.7.15/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/lib/python2.7/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/usr/local/lib/python2.7/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/usr/local/lib/python2.7/site-packages/ipykernel/kernelapp.py", line 486, in start
    self.io_loop.start()
  File "/usr/local/lib/python2.7/site-packages/tornado/ioloop.py", line 1064, in start
    handler_func(fd_obj, events)
  File "/usr/local/lib/python2.7/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 450, in _handle_events
    self._handle_recv()
  File "/usr/local/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 480, in _handle_recv
    self._run_callback(callback, msg)
  File "/usr/local/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 432, in _run_callback
    callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/usr/local/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 233, in dispatch_shell
    handler(stream, idents, msg)
  File "/usr/local/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/usr/local/lib/python2.7/site-packages/ipykernel/ipkernel.py", line 208, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/usr/local/lib/python2.7/site-packages/ipykernel/zmqshell.py", line 537, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2714, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2818, in run_ast_nodes
    if self.run_code(code, result):
  File "/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2878, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-33-1fb2aac1f78f>", line 51, in <module>
    train_evaluation, test_evaluation = main(estimator)
  File "<ipython-input-29-390452f3449e>", line 28, in main
    train_evaluation, test_evaluation = train(estimator)
  File "<ipython-input-28-550555971d85>", line 42, in train
    batch_size=FLAGS.batch_size, num_epochs=epochs, shuffle=True
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 314, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 812, in _train_model
    log_step_count_steps=self._config.log_step_count_steps) as mon_sess:
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 380, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 787, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 511, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 972, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 977, in _create_session
    return self._sess_creator.create_session()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 668, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 431, in create_session
    self._scaffold.finalize()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 212, in finalize
    self._saver.build()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1248, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1284, in _build
    build_save=build_save, build_restore=build_restore)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 759, in _build_internal
    restore_sequentially, reshape)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 471, in _AddShardedRestoreOps
    name="restore_shard"))
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 440, in _AddRestoreOps
    assign_ops.append(saveable.restore(tensors, shapes))
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 160, in restore
    self.op.get_shape().is_fully_defined())
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/state_ops.py", line 276, in assign
    validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 59, in assign
    use_locking=use_locking, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1625, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [300] rhs shape= [200]
	 [[Node: save/Assign_24 = Assign[T=DT_FLOAT, _class=["loc:@calibrated_tf_lattice_model/lattice/pwl_calibration/fnlwgt_keypoints_inputs"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](calibrated_tf_lattice_model/lattice/calibrated_tf_lattice_model/lattice/pwl_calibration/fnlwgt_keypoints_inputs/Adam, save/RestoreV2_24)]]

Are you reusing the same model_dir/output_dir? If so, it might be reading from an older snapshot saved with the previous config and crashing because the model structure has changed. Can you please change output_dir or remove the old snapshots and try again?

I was reusing the same model_dir/output_dir. After I changed it to a different directory, the program ran successfully.
Thank you very much for the timely reply and patient explanation! ❤️
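
One way to avoid restoring a checkpoint written under a different configuration is to derive the output directory from the hyperparameters, so each configuration gets its own directory. The following is only a sketch under that assumption: model_dir_for is a hypothetical helper, the base path is illustrative, and the dictionary stands in for whatever hyperparameters the script actually uses.

```python
import hashlib
import json
import os


def model_dir_for(base_dir, hparams):
    """Return a directory path that changes whenever the hyperparameters change."""
    # Serialize with sorted keys so the same settings always hash the same way.
    encoded = json.dumps(hparams, sort_keys=True).encode("utf-8")
    digest = hashlib.sha1(encoded).hexdigest()[:8]
    return os.path.join(base_dir, "run_" + digest)


# Illustrative values taken from the settings mentioned in this thread.
dir_a = model_dir_for("/tmp/uci_census", {"lattice_size": 2, "num_keypoints": 200})
dir_b = model_dir_for("/tmp/uci_census", {"lattice_size": 3, "num_keypoints": 200})
print(dir_a)
print(dir_b)
```

The resulting path would then be passed as the model_dir/output_dir when building the estimator, so a run with lattice_size 3 never tries to restore variables saved with lattice_size 2.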