wrong calculations with TPU distribution strategy

mohammad0081 opened this issue 13 days ago

Mohammad Hassan Heydari commented 13 days ago

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

2.16.1

Custom code

Yes

OS platform and distribution

Google Colab

Mobile device

No response

Python version

No response

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

TPU

Current behavior?

We expected training to be faster on multiple TPUs than on a single T4 GPU, while the numbers returned by the fit() method stayed essentially the same. Instead, training was faster, but the returned numbers were completely different: metrics such as accuracy and loss were significantly worse after 5 epochs. On the GPU we reach over 90% accuracy on the test set after 5 epochs, but with the mirrored distribution strategy on TPU we get about 24% test accuracy, and training converges around that value.

Standalone code to reproduce the issue

# The whole project is private. We have a dataset of medical images, which we split into two
# directories (train and test); we then create Keras data generators, train_datagen and test_datagen.
# We load a model from keras.applications and add three Dense layers to it for the classification task,
# then fine-tune the model with the Adam optimizer (lr = 0.0001).
# The model is created and compiled inside strategy.scope(), where the strategy is created exactly
# as described in the tensorflow.org "Use TPUs" and MirroredStrategy docs.
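Since the project code is private, the following is a minimal, hypothetical sketch of the setup described above, not our actual code. It assumes a MobileNetV2 backbone, placeholder values for `NUM_CLASSES`, `IMG_SHAPE` and the Dense layer widths, and it omits the ImageDataGenerator pipeline. The TPU initialization follows the pattern from the tensorflow.org "Use TPUs" guide; falling back to `MirroredStrategy` when no TPU can be resolved is an addition so the sketch also runs on a GPU or CPU runtime.

```python
import tensorflow as tf

# Hypothetical placeholders: the real class count and image size are not stated.
NUM_CLASSES = 4
IMG_SHAPE = (96, 96, 3)


def make_strategy():
    """Create a TPUStrategy following the tensorflow.org "Use TPUs" guide.

    Falls back to MirroredStrategy (e.g. on a GPU or CPU runtime) when no
    TPU can be resolved; this fallback is an assumption for portability.
    """
    try:
        resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
        tf.config.experimental_connect_to_cluster(resolver)
        tf.tpu.experimental.initialize_tpu_system(resolver)
        return tf.distribute.TPUStrategy(resolver)
    except Exception:  # no TPU reachable from this runtime
        return tf.distribute.MirroredStrategy()


def build_model(strategy):
    """keras.applications backbone plus three Dense layers, built and compiled in scope."""
    with strategy.scope():
        # MobileNetV2 is a hypothetical backbone choice; weights=None avoids a
        # download here, whereas fine-tuning would normally use weights="imagenet".
        base = tf.keras.applications.MobileNetV2(
            input_shape=IMG_SHAPE, include_top=False, weights=None, pooling="avg"
        )
        x = tf.keras.layers.Dense(256, activation="relu")(base.output)
        x = tf.keras.layers.Dense(128, activation="relu")(x)
        outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
        model = tf.keras.Model(base.input, outputs)
        # Adam with lr = 0.0001 as in the issue; the categorical loss matches the
        # default class_mode="categorical" of ImageDataGenerator generators.
        model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
            loss="categorical_crossentropy",
            metrics=["accuracy"],
        )
    return model


strategy = make_strategy()
model = build_model(strategy)
print("replicas in sync:", strategy.num_replicas_in_sync)
```

With real data, the same `model.fit(...)` call (for example on the train and test generators with `epochs=5`) can then be run on both the GPU and TPU runtimes, so that only the distribution strategy differs between the two runs.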

Relevant log output

No response