Input shape error when test-running training
BandageWorm opened this issue · comments
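TensorFlow 1.0 has been released and its API has changed. I updated every place in the code that was affected, but a test training run still fails: the MatMul inside the softmax loss function reports a wrong matrix shape. I'm not sure where the problem is, so I'm asking here first; when I have time I'll read through the source myself to track it down.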
log:
dim: 6865
准备数据
bucket 0 中有数据 164276 条
bucket 1 中有数据 127570 条
bucket 2 中有数据 32081 条
bucket 3 中有数据 10660 条
共有数据 334587 条
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
开启投影:512
Traceback (most recent call last):
File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 670, in _call_cpp_shape_fn_impl
status)
File "/usr/lib64/python3.5/contextlib.py", line 66, in __exit__
next(self.gen)
File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 2 but is rank 1 for 'model_with_buckets/sequence_loss/sequence_loss_by_example/sampled_softmax_loss/MatMul_1' (op: 'MatMul') with input shapes: [?], [?,1024].
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "s2s.py", line 324, in <module>
tf.app.run()
File "/usr/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "s2s.py", line 319, in main
train()
File "s2s.py", line 129, in train
model = create_model(sess, False)
File "s2s.py", line 110, in create_model
dtype
File "/home/kurt/Seq2Seq_Chatbot_QA/s2s_model.py", line 143, in __init__
softmax_loss_function=softmax_loss_function
File "/usr/lib/python3.5/site-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 1195, in model_with_buckets
softmax_loss_function=softmax_loss_function))
File "/usr/lib/python3.5/site-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 1110, in sequence_loss
softmax_loss_function=softmax_loss_function))
File "/usr/lib/python3.5/site-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 1067, in sequence_loss_by_example
crossent = softmax_loss_function(target, logit)
File "/home/kurt/Seq2Seq_Chatbot_QA/s2s_model.py", line 67, in sampled_loss
num_classes=self.target_vocab_size
File "/usr/lib/python3.5/site-packages/tensorflow/python/ops/nn_impl.py", line 1191, in sampled_softmax_loss
name=name)
File "/usr/lib/python3.5/site-packages/tensorflow/python/ops/nn_impl.py", line 995, in _compute_sampled_logits
inputs, sampled_w, transpose_b=True) + sampled_b
File "/usr/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 1855, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "/usr/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1454, in _mat_mul
transpose_b=transpose_b, name=name)
File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2397, in create_op
set_shapes_for_outputs(ret)
File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1757, in set_shapes_for_outputs
shapes = shape_func(op)
File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1707, in call_with_requiring
return call_cpp_shape_fn(op, require_shape_fn=True)
File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 610, in call_cpp_shape_fn
debug_python_shape_fn, require_shape_fn)
File "/usr/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 675, in _call_cpp_shape_fn_impl
raise ValueError(err.message)
ValueError: Shape must be rank 2 but is rank 1 for 'model_with_buckets/sequence_loss/sequence_loss_by_example/sampled_softmax_loss/MatMul_1' (op: 'MatMul') with input shapes: [?], [?,1024].
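Also, does anyone know how to make the CPU build of TensorFlow support these SIMD instruction sets? Getting the warnings on every run is really annoying...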
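BTW, the dataset dgk_shooter_min.conv described in the README is no longer available, so I switched to the word-segmented Xiaohuangji corpus, xiaohuangji50w_fenciA.conv.zip. I don't know whether that is the cause (probably not).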
TensorFlow's changes are quite drastic; the code needs a lot of modification before it will work.
If the subtitle corpus from the original repo is gone, you can use this fork: https://github.com/qhduan/dgk_lost_conv
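I just fixed this bug. It was another API mismatch, so let me list everything I changed.
s2s_model.py:
Line 54: in def sampled_loss(inputs, labels): swap inputs and labels (this is what caused the error in this issue)
Lines 62-63: change the arguments to: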
weights=local_w_t,
biases=local_b,
labels=labels,
inputs=local_inputs,
num_sampled=num_samples,
num_classes=self.target_vocab_size
Line 71: change tf.nn.seq2seq.embedding_attention_seq2seq( to tf.contrib.legacy_seq2seq.embedding_attention_seq2seq(
Lines 31-33: change tf.nn.rnn_cell.* to tf.contrib.rnn.*
Lines 113 and 132: change tf.nn.seq2seq.model_with_buckets( to tf.contrib.legacy_seq2seq.model_with_buckets(
Training is running now; if more errors show up I'll come back and update.
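For anyone applying the sampled_loss change by hand, here is a minimal sketch of what the fixed function might look like. It assumes TensorFlow 1.0, where (as the traceback shows) sequence_loss_by_example calls softmax_loss_function(target, logit), so labels arrive first. The factory name make_sampled_loss and its parameters are hypothetical, the tensorflow module is passed in as an argument only to keep the sketch import-free, and the dtype casts from the original s2s_model.py are left out.

```python
def make_sampled_loss(tf, w_t, b, num_samples, target_vocab_size):
    """Build a softmax_loss_function for tf.contrib.legacy_seq2seq (TF 1.0).

    tf                -- the imported tensorflow module (passed in explicitly)
    w_t, b            -- transposed projection weights and biases
    num_samples       -- number of classes to sample per batch
    target_vocab_size -- total number of output classes
    All of these names are illustrative, not taken from the repository.
    """
    def sampled_loss(labels, inputs):
        # TF 1.0 calls softmax_loss_function(target, logit): labels first.
        # sampled_softmax_loss expects labels of shape [batch_size, num_true].
        labels = tf.reshape(labels, [-1, 1])
        # Keyword arguments guard against the positional reordering of
        # labels and inputs that happened in TensorFlow 1.0.
        return tf.nn.sampled_softmax_loss(
            weights=w_t,
            biases=b,
            labels=labels,
            inputs=inputs,
            num_sampled=num_samples,
            num_classes=target_vocab_size,
        )
    return sampled_loss
```

With a real TensorFlow 1.0 install this would presumably be wired up as softmax_loss_function = make_sampled_loss(tf, w_t, b, num_samples, target_vocab_size) and passed to tf.contrib.legacy_seq2seq.model_with_buckets. Passing everything by keyword is the main point: the rank error in this issue came from the 2-D inputs and the 1-D labels landing in each other's slots.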
OK. If it works, consider opening a merge request; I could create a TensorFlow 1.x branch or something like that.
One more thing:
Change every tf.all_variables() to tf.global_variables()
Change every tf.initialize_all_variables() to tf.global_variables_initializer()
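The pure renames collected in this thread could also be applied mechanically. Below is a rough sketch of a helper that does plain string replacement on source text. The names RENAMES and migrate_source are made up for illustration, and the table covers only the renames mentioned here, not the full set of TensorFlow 1.0 changes. It does not fix the swapped labels/inputs arguments in sampled_loss, which still has to be done by hand.

```python
# Old (pre-1.0) spelling -> TensorFlow 1.0 spelling, as listed in this thread.
RENAMES = [
    ("tf.nn.seq2seq.embedding_attention_seq2seq(",
     "tf.contrib.legacy_seq2seq.embedding_attention_seq2seq("),
    ("tf.nn.seq2seq.model_with_buckets(",
     "tf.contrib.legacy_seq2seq.model_with_buckets("),
    ("tf.nn.rnn_cell.", "tf.contrib.rnn."),
    ("tf.initialize_all_variables()", "tf.global_variables_initializer()"),
    ("tf.all_variables()", "tf.global_variables()"),
]


def migrate_source(source):
    """Return `source` with every old API spelling replaced by the new one."""
    for old, new in RENAMES:
        source = source.replace(old, new)
    return source


if __name__ == "__main__":
    import sys

    # Usage: python migrate.py s2s.py s2s_model.py
    # Rewrites each file in place, so keep a backup or use version control.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            original = f.read()
        with open(path, "w", encoding="utf-8") as f:
            f.write(migrate_source(original))
```

Plain substring replacement is crude (it would also touch comments and string literals), but for a two-file project like this one it is probably enough, and the diff is easy to review afterwards.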