wrong calculations with TPU distribution strategy
mohammad0081 opened this issue · comments
Mohammad Hassan Heydari commented
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
No
Source
source
TensorFlow version
2.16.1
Custom code
Yes
OS platform and distribution
colab os platform
Mobile device
No response
Python version
No response
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
TPU
Current behavior?
We expected training to be faster on multiple TPU cores than on a single T4 GPU, with the numbers returned by the fit() method staying the same. Instead, training was faster but the returned numbers, such as accuracy and loss, were completely different, with accuracy far lower after 5 epochs. On the GPU we get 90+% accuracy on the test set after 5 epochs, but with the TPU mirrored strategy we get 24% accuracy on the test set, and training plateaus in that range.
Standalone code to reproduce the issue
# The whole project is private. We have a dataset of medical images, which we split into two
# directories; we then create a Keras data generator and build train_datagen and test_datagen.
# Next, we load a model from keras.applications and add three Dense layers to it for the
# classification task, then fine-tune the model with the Adam optimizer with lr = 0.0001.
# The model is created and compiled inside strategy.scope(), where the strategy is created
# exactly as in the TensorFlow.org "Use TPUs" and mirrored strategy docs.
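Since the project itself is private, the following is only a rough sketch of the setup described in the comments above, not the actual code. The backbone (`MobileNetV2`), `IMAGE_SHAPE`, `NUM_CLASSES`, the Dense layer widths, and the loss are placeholders chosen for illustration. `weights=None` is used so the sketch runs without a download, whereas fine-tuning would presumably start from pretrained weights. The data generators are omitted, and the strategy falls back to the default one when no TPU is found.

```python
import tensorflow as tf

# Hypothetical values; the real dataset and model are not shown in this issue.
IMAGE_SHAPE = (96, 96, 3)
NUM_CLASSES = 4


def make_strategy():
    """Return a TPUStrategy when a TPU is reachable, else the default strategy."""
    try:
        resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    except ValueError:
        # No TPU detected (e.g. a GPU or CPU runtime).
        return tf.distribute.get_strategy()
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    return tf.distribute.TPUStrategy(resolver)


strategy = make_strategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# Model creation and compilation both happen inside strategy.scope().
with strategy.scope():
    # Placeholder backbone; weights=None avoids downloading pretrained weights.
    base = tf.keras.applications.MobileNetV2(
        include_top=False,
        weights=None,
        input_shape=IMAGE_SHAPE,
        pooling="avg",
    )
    # Three Dense layers on top of the backbone for classification.
    x = tf.keras.layers.Dense(256, activation="relu")(base.output)
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = tf.keras.Model(inputs=base.input, outputs=outputs)

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="categorical_crossentropy",  # assumed; the issue does not name the loss
        metrics=["accuracy"],
    )
```

In a sketch like this, the batch size passed to the input pipeline is the global batch size, which is split across `strategy.num_replicas_in_sync` replicas. That is one of the settings worth comparing between the GPU run and the TPU run.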
Relevant log output
No response