Variable update does not match the TensorFlow implementation
zrbzrb1106 opened this issue
I have run the TensorFlow-addons test cases for RAdam against both the official implementation and the implementation here. The results do not match in the sparse case.
The test case provided by TF:

```python
var_0 = tf.Variable([1.0, 2.0])
var_1 = tf.Variable([3.0, 4.0])
grad_0 = tf.IndexedSlices(tf.constant([0.1]), tf.constant([0]), tf.constant([2]))
grad_1 = tf.IndexedSlices(tf.constant([0.04]), tf.constant([1]), tf.constant([2]))
```
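For context, a `tf.IndexedSlices(values, indices, dense_shape)` gradient only carries values for the listed rows. A minimal sketch of its dense equivalent (the helper name here is mine, for illustration only):

```python
def indexed_slices_to_dense(values, indices, dense_shape):
    """Expand an IndexedSlices-style (values, indices) pair to a dense vector."""
    out = [0.0] * dense_shape[0]
    for v, i in zip(values, indices):
        out[i] += v
    return out

# grad_0 touches only index 0; grad_1 touches only index 1.
dense_grad_0 = indexed_slices_to_dense([0.1], [0], [2])   # [0.1, 0.0]
dense_grad_1 = indexed_slices_to_dense([0.04], [1], [2])  # [0.0, 0.04]
```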
For example, the first ten iterations of the TF code produce:

```
tf.Tensor([0.99989 2. ], shape=(2,), dtype=float32) tf.Tensor([3. 3.99992], shape=(2,), dtype=float32)
tf.Tensor([0.99978006 2. ], shape=(2,), dtype=float32) tf.Tensor([3. 3.9998398], shape=(2,), dtype=float32)
tf.Tensor([0.9996701 2. ], shape=(2,), dtype=float32) tf.Tensor([3. 3.9997597], shape=(2,), dtype=float32)
tf.Tensor([0.9995601 2. ], shape=(2,), dtype=float32) tf.Tensor([3. 3.9996796], shape=(2,), dtype=float32)
tf.Tensor([0.99945015 2. ], shape=(2,), dtype=float32) tf.Tensor([3. 3.9995995], shape=(2,), dtype=float32)
tf.Tensor([0.99941427 2. ], shape=(2,), dtype=float32) tf.Tensor([3. 3.9995337], shape=(2,), dtype=float32)
tf.Tensor([0.9993716 2. ], shape=(2,), dtype=float32) tf.Tensor([3. 3.999461], shape=(2,), dtype=float32)
tf.Tensor([0.9993229 2. ], shape=(2,), dtype=float32) tf.Tensor([3. 3.9993823], shape=(2,), dtype=float32)
tf.Tensor([0.99926883 2. ], shape=(2,), dtype=float32) tf.Tensor([3. 3.999298], shape=(2,), dtype=float32)
tf.Tensor([0.9992098 2. ], shape=(2,), dtype=float32) tf.Tensor([3. 3.9992092], shape=(2,), dtype=float32)
```
The first ten iterations of this code produce:

```
tf.Tensor([0.99989 1.99998], shape=(2,), dtype=float32) tf.Tensor([2.99997 3.99992], shape=(2,), dtype=float32)
tf.Tensor([0.99978006 1.99996 ], shape=(2,), dtype=float32) tf.Tensor([2.99994 3.9998398], shape=(2,), dtype=float32)
tf.Tensor([0.9996701 1.9999399], shape=(2,), dtype=float32) tf.Tensor([2.9999099 3.9997597], shape=(2,), dtype=float32)
tf.Tensor([0.9995601 1.9999199], shape=(2,), dtype=float32) tf.Tensor([2.9998798 3.9996796], shape=(2,), dtype=float32)
tf.Tensor([0.99945015 1.9998999 ], shape=(2,), dtype=float32) tf.Tensor([2.9998498 3.9995995], shape=(2,), dtype=float32)
tf.Tensor([0.99941444 1.9998798 ], shape=(2,), dtype=float32) tf.Tensor([2.9998198 3.9995337], shape=(2,), dtype=float32)
tf.Tensor([0.99937177 1.9998598 ], shape=(2,), dtype=float32) tf.Tensor([2.9997897 3.999461 ], shape=(2,), dtype=float32)
tf.Tensor([0.99932307 1.9998398 ], shape=(2,), dtype=float32) tf.Tensor([2.9997597 3.9993823], shape=(2,), dtype=float32)
tf.Tensor([0.999269 1.9998198], shape=(2,), dtype=float32) tf.Tensor([2.9997296 3.999298 ], shape=(2,), dtype=float32)
tf.Tensor([0.99921 1.9997997], shape=(2,), dtype=float32) tf.Tensor([2.9996996 3.9992092], shape=(2,), dtype=float32)
```
It seems that in this case:

```python
var_0 = tf.Variable([1.0, 2.0])
grad_0 = tf.IndexedSlices(tf.constant([0.1]), tf.constant([0]), tf.constant([2]))
```

the value in the variable tensor at index 1 (2.0) is also updated, which is not expected. After reading the code, the main difference is:
TF code:

```python
with tf.control_dependencies([var_t]):
    var_update = self._resource_scatter_add(
        var, indices, tf.gather(-lr_t * var_t, indices))
```
This code:

```python
var_update = state_ops.assign_sub(
    var, lr_t * var_t, use_locking=self._use_locking)
```
So the mismatch may be caused by the `assign_sub` operation, which updates `var` at both indices. For the first example, this code produces [-2, 1.96] after 2000 iterations, while the official implementation produces [-2, 2], so the test case fails. Changing `assign_sub` to `_resource_scatter_add` solves this problem.
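The mechanical difference between the two update paths can be sketched in plain Python (function names and the step values are illustrative, not the actual optimizer internals):

```python
def dense_assign_sub(var, lr, step):
    # like state_ops.assign_sub: the step is applied to EVERY index
    return [v - lr * s for v, s in zip(var, step)]

def sparse_scatter_add(var, indices, deltas):
    # like _resource_scatter_add: only the listed indices are touched
    out = list(var)
    for i, d in zip(indices, deltas):
        out[i] += d
    return out

var = [1.0, 2.0]
step = [0.1, 0.02]  # hypothetical per-element step computed by the optimizer
lr = 0.001

dense = dense_assign_sub(var, lr, step)                # index 1 also moves
sparse = sparse_scatter_add(var, [0], [-lr * step[0]]) # index 1 stays at 2.0
```

With the dense path, any nonzero per-element step leaks into rows that received no gradient; the scatter path only touches the rows listed in the sparse gradient.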
UPDATE:

This implementation is also correct. After checking the paper, I believe the TF implementation is actually a lazy variant of RAdam, so I will close this issue.
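To illustrate what "lazy" means here, a toy momentum update can be used (a deliberate simplification, not the real RAdam math): the non-lazy variant decays and applies the momentum to every row on every step, while the lazy variant only updates rows that appear in the current sparse gradient.

```python
def step_dense(var, m, grad_rows, lr=0.1, beta=0.9):
    # non-lazy: momentum is decayed and applied to EVERY row each step
    for i in range(len(var)):
        g = grad_rows.get(i, 0.0)
        m[i] = beta * m[i] + (1 - beta) * g
        var[i] -= lr * m[i]

def step_lazy(var, m, grad_rows, lr=0.1, beta=0.9):
    # lazy: only rows present in the sparse gradient are touched
    for i, g in grad_rows.items():
        m[i] = beta * m[i] + (1 - beta) * g
        var[i] -= lr * m[i]

var_d, m_d = [1.0, 2.0], [0.0, 0.0]
var_l, m_l = [1.0, 2.0], [0.0, 0.0]

# row 1 receives a gradient once, then never again
step_dense(var_d, m_d, {1: 1.0})
step_lazy(var_l, m_l, {1: 1.0})
for _ in range(5):
    step_dense(var_d, m_d, {0: 0.1})  # dense: row 1 keeps drifting via momentum
    step_lazy(var_l, m_l, {0: 0.1})   # lazy: row 1 stays where it was
```

After these steps, `var_d[1]` keeps decreasing because the old momentum is still applied to row 1, while `var_l[1]` is frozen at its value after the single update, which matches the behaviour difference observed above.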