tensorflow / tpu

Reference models and tools for Cloud TPUs.

Home Page: https://cloud.google.com/tpu/

TPU returns error when running tf.keras.layers.experimental.preprocessing.RandomRotation

silvaurus opened this issue

Hello!

Could you help me look into an error I get when I include tf.keras.layers.experimental.preprocessing.RandomRotation in a model running on a Cloud TPU?

The same code works on GPU. I have also tried other preprocessing layers, such as tf.keras.layers.experimental.preprocessing.RandomFlip and tf.keras.layers.experimental.preprocessing.RandomContrast(0.2), and they work correctly on TPU.

Here is the error output:
2021-07-31 23:14:08.520206: I tensorflow/core/tpu/kernels/tpu_compile_op_common.cc:600] Compilation of 5532396960057933239 with session name took 238.274266ms and failed
2021-07-31 23:14:08.520294: F tensorflow/core/tpu/kernels/tpu_program_group.cc:86] Check failed: xla_tpu_programs.size() > 0 (0 vs. 0)
https://symbolize.stripped_domain/r/?trace=7f05960a118b,7f05960a120f,7f0366ffd569,7f036c302535,7f036c351459,7f036c351f49,7f036c349ca3,7f036c34b08c,7f037319032f,7f037319145b,7f03734a7e64,7f03734a5b36,7f03734891ee,7f0596041608&map=b7c22d7954df6b6961e4435041132cf899ee4a5e:7f0363fba000-7f0377cb9270
*** SIGABRT received by PID 942533 (TID 943317) on cpu 94 from PID 942533; stack trace: ***
PC: @ 0x7f05960a118b (unknown) raise
@ 0x7f03634b01e0 976 (unknown)
@ 0x7f05960a1210 3920 (unknown)
@ 0x7f0366ffd56a 896 tensorflow::tpu::TpuProgramGroup::Initialize()
@ 0x7f036c302536 1696 tensorflow::tpu::TpuCompilationCacheExternal::InitializeEntry()
@ 0x7f036c35145a 1072 tensorflow::tpu::TpuCompilationCacheInterface::CompileIfKeyAbsentHelper()
@ 0x7f036c351f4a 128 tensorflow::tpu::TpuCompilationCacheInterface::CompileIfKeyAbsent()
@ 0x7f036c349ca4 1248 tensorflow::tpu::TpuCompileOpKernelCommon::ComputeInternal()
@ 0x7f036c34b08d 608 tensorflow::tpu::TpuCompileOpKernelCommon::Compute()
@ 0x7f0373190330 2544 tensorflow::(anonymous namespace)::ExecutorState<>::Process()
@ 0x7f037319145c 48 std::_Function_handler<>::_M_invoke()
@ 0x7f03734a7e65 160 Eigen::ThreadPoolTempl<>::WorkerLoop()
@ 0x7f03734a5b37 64 std::_Function_handler<>::_M_invoke()
@ 0x7f03734891ef 96 tensorflow::(anonymous namespace)::PThread::ThreadFn()
@ 0x7f0596041609 (unknown) start_thread
https://symbolize.stripped_domain/r/?trace=7f05960a118b,7f03634b01df,7f05960a120f,7f0366ffd569,7f036c302535,7f036c351459,7f036c351f49,7f036c349ca3,7f036c34b08c,7f037319032f,7f037319145b,7f03734a7e64,7f03734a5b36,7f03734891ee,7f0596041608&map=b7c22d7954df6b6961e4435041132cf899ee4a5e:7f0363fba000-7f0377cb9270,ca1b7ab241ee28147b3d590cadb5dc1b:7f03567b1000-7f03637e3b20
E0731 23:14:08.725014 943317 coredump_hook.cc:292] RAW: Remote crash data gathering hook invoked.
E0731 23:14:08.725041 943317 coredump_hook.cc:384] RAW: Skipping coredump since rlimit was 0 at process start.
E0731 23:14:08.725066 943317 client.cc:222] RAW: Coroner client retries enabled (b/136286901), will retry for up to 30 sec.
E0731 23:14:08.725073 943317 coredump_hook.cc:447] RAW: Sending fingerprint to remote end.
E0731 23:14:08.725080 943317 coredump_socket.cc:124] RAW: Stat failed errno=2 on socket /var/google/services/logmanagerd/remote_coredump.socket
E0731 23:14:08.725089 943317 coredump_hook.cc:451] RAW: Cannot send fingerprint to Coroner: [NOT_FOUND] Missing crash reporting socket. Is the listener running?
E0731 23:14:08.725095 943317 coredump_hook.cc:525] RAW: Discarding core.
E0731 23:14:09.251309 943317 process_state.cc:771] RAW: Raising signal 6 with default behavior

Best Regards,

Did you use the TPU strategy when compiling the augmentation layer?
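
For reference, one way to read that question is "was the augmentation layer created inside `strategy.scope()`?". The following is a minimal sketch of that pattern, not the original poster's code: the `build_model` helper, the input shape, the rotation factor, and the small convolutional head are all hypothetical. It is not known whether this resolves the crash reported above; it only shows what building the layer under a strategy looks like.

```python
import tensorflow as tf

# The issue uses tf.keras.layers.experimental.preprocessing; newer TensorFlow
# releases expose the same layers directly under tf.keras.layers.
try:
    preprocessing = tf.keras.layers.experimental.preprocessing
except AttributeError:
    preprocessing = tf.keras.layers


def build_model(strategy, image_shape=(32, 32, 3), num_classes=10):
    """Hypothetical helper: create the augmentation layers and the model
    inside the scope of the given distribution strategy."""
    with strategy.scope():
        inputs = tf.keras.Input(shape=image_shape)
        x = preprocessing.RandomFlip("horizontal")(inputs)
        x = preprocessing.RandomRotation(0.1)(x)  # factor 0.1 is an assumption
        x = tf.keras.layers.Conv2D(8, 3, activation="relu")(x)
        x = tf.keras.layers.GlobalAveragePooling2D()(x)
        outputs = tf.keras.layers.Dense(num_classes)(x)
        model = tf.keras.Model(inputs, outputs)
        model.compile(
            optimizer="adam",
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        )
    return model


# Stand-in strategy so the sketch runs without a TPU. On a Cloud TPU you would
# pass a tf.distribute.TPUStrategy built from a TPUClusterResolver instead.
strategy = tf.distribute.get_strategy()
model = build_model(strategy)
logits = model(tf.zeros([2, 32, 32, 3]), training=True)
```

If the layer is already built this way and the compilation still fails, it would help to post the model definition and the TensorFlow version alongside the log.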