qhduan / Seq2Seq_Chatbot_QA

A Sequence to Sequence chatbot model implemented in TensorFlow

Wrong input shape reported during a test training run

BandageWorm opened this issue · comments

TensorFlow 1.0 has now been released and its API has changed. I updated every place in the code affected by the API changes, but a test training run still reports that the matmul shapes inside the softmax loss function are wrong. I'm not sure where the problem is, so I'm asking here first; when I have time I'll read through the source myself and track it down.

log:

dim:  6865
Preparing data
bucket 0 contains 164276 samples
bucket 1 contains 127570 samples
bucket 2 contains 32081 samples
bucket 3 contains 10660 samples
334587 samples in total
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Output projection enabled: 512
Traceback (most recent call last):
  File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 670, in _call_cpp_shape_fn_impl
    status)
  File "/usr/lib64/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 2 but is rank 1 for 'model_with_buckets/sequence_loss/sequence_loss_by_example/sampled_softmax_loss/MatMul_1' (op: 'MatMul') with input shapes: [?], [?,1024].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "s2s.py", line 324, in <module>
    tf.app.run()
  File "/usr/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "s2s.py", line 319, in main
    train()
  File "s2s.py", line 129, in train
    model = create_model(sess, False)
  File "s2s.py", line 110, in create_model
    dtype
  File "/home/kurt/Seq2Seq_Chatbot_QA/s2s_model.py", line 143, in __init__
    softmax_loss_function=softmax_loss_function
  File "/usr/lib/python3.5/site-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 1195, in model_with_buckets
    softmax_loss_function=softmax_loss_function))
  File "/usr/lib/python3.5/site-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 1110, in sequence_loss
    softmax_loss_function=softmax_loss_function))
  File "/usr/lib/python3.5/site-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 1067, in sequence_loss_by_example
    crossent = softmax_loss_function(target, logit)
  File "/home/kurt/Seq2Seq_Chatbot_QA/s2s_model.py", line 67, in sampled_loss
    num_classes=self.target_vocab_size
  File "/usr/lib/python3.5/site-packages/tensorflow/python/ops/nn_impl.py", line 1191, in sampled_softmax_loss
    name=name)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/ops/nn_impl.py", line 995, in _compute_sampled_logits
    inputs, sampled_w, transpose_b=True) + sampled_b
  File "/usr/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 1855, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1454, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2397, in create_op
    set_shapes_for_outputs(ret)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1757, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1707, in call_with_requiring
    return call_cpp_shape_fn(op, require_shape_fn=True)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 610, in call_cpp_shape_fn
    debug_python_shape_fn, require_shape_fn)
  File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 675, in _call_cpp_shape_fn_impl
    raise ValueError(err.message)
ValueError: Shape must be rank 2 but is rank 1 for 'model_with_buckets/sequence_loss/sequence_loss_by_example/sampled_softmax_loss/MatMul_1' (op: 'MatMul') with input shapes: [?], [?,1024].

Also, does anyone know how to get the CPU build of TensorFlow to support these SIMD instruction sets? Seeing the warnings every single time is rather annoying...

BTW, the dgk_shooter_min.conv dataset described in the readme is no longer available, so I switched to the word-segmented Xiaohuangji corpus, xiaohuangji50w_fenciA.conv.zip. Not sure whether that is the cause (probably not).

TensorFlow's changes are really drastic; the code needs quite a few modifications to work.

If the subtitle corpus from the original repo is gone, you can look at this fork: https://github.com/qhduan/dgk_lost_conv

I just fixed this bug. It is again an API mismatch; let me list all of the changes:

s2s_model.py:
Line 54: in def sampled_loss(inputs, labels):, swap inputs and labels (this is what caused the error in this issue); see the sketch after this list.

Lines 62-63: change the arguments to:

                weights=local_w_t,
                biases=local_b,
                labels=labels,
                inputs=local_inputs,
                num_sampled=num_samples,
                num_classes=self.target_vocab_size

Line 71: change tf.nn.seq2seq.embedding_attention_seq2seq( to tf.contrib.legacy_seq2seq.embedding_attention_seq2seq(

Lines 31-33: change tf.nn.rnn_cell.* to tf.contrib.rnn.*

Lines 113 and 132: change tf.nn.seq2seq.model_with_buckets( to tf.contrib.legacy_seq2seq.model_with_buckets(
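
For reference, here is roughly what the corrected closure looks like after these changes. This is only a sketch modeled on the standard TensorFlow translate/seq2seq tutorial that this model follows; w_t, b, num_samples, dtype and self.target_vocab_size are assumed to be the variables already defined in __init__ of s2s_model.py:

    import tensorflow as tf

    def sampled_loss(labels, inputs):
        # tf.contrib.legacy_seq2seq.sequence_loss_by_example now calls
        # softmax_loss_function(target, logit), so labels must come first.
        labels = tf.reshape(labels, [-1, 1])
        # Do the sampled softmax in float32 and cast back to the model dtype.
        local_w_t = tf.cast(w_t, tf.float32)
        local_b = tf.cast(b, tf.float32)
        local_inputs = tf.cast(inputs, tf.float32)
        return tf.cast(
            tf.nn.sampled_softmax_loss(
                weights=local_w_t,
                biases=local_b,
                labels=labels,
                inputs=local_inputs,
                num_sampled=num_samples,
                num_classes=self.target_vocab_size),
            dtype)

And a minimal, self-contained illustration of the module renames (toy sizes and an LSTM cell chosen just for the example, not the repo's actual hyper-parameters):

    import tensorflow as tf

    size, num_layers, vocab = 64, 2, 1000
    bucket = (5, 10)  # (encoder length, decoder length)

    # TF 1.0: tf.nn.rnn_cell.* moved to tf.contrib.rnn.*
    cell = tf.contrib.rnn.MultiRNNCell(
        [tf.contrib.rnn.LSTMCell(size) for _ in range(num_layers)])

    encoder_inputs = [tf.placeholder(tf.int32, [None]) for _ in range(bucket[0])]
    decoder_inputs = [tf.placeholder(tf.int32, [None]) for _ in range(bucket[1])]

    # TF 1.0: tf.nn.seq2seq.* moved to tf.contrib.legacy_seq2seq.*
    outputs, state = tf.contrib.legacy_seq2seq.embedding_attention_seq2seq(
        encoder_inputs, decoder_inputs, cell,
        num_encoder_symbols=vocab, num_decoder_symbols=vocab,
        embedding_size=size, feed_previous=False)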

Training is running now; if anything else errors out I'll come back and update.

Great. If it works, please consider submitting a merge request; I can open a tensorflow 1.x branch or something along those lines.

One more addition:
Change every tf.all_variables() to tf.global_variables()
Change every tf.initialize_all_variables() to tf.global_variables_initializer()
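
For example, the saver/initialization boilerplate ends up looking roughly like this (a minimal sketch, not the exact lines from s2s.py):

    import tensorflow as tf

    # TF 1.0 renames:
    #   tf.all_variables()            -> tf.global_variables()
    #   tf.initialize_all_variables() -> tf.global_variables_initializer()
    step = tf.Variable(0, name='global_step', trainable=False)
    saver = tf.train.Saver(tf.global_variables())
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(step))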