tensorflow / cloud

The TensorFlow Cloud repository provides APIs that make it easy to go from debugging and training your Keras and TensorFlow code in a local environment to distributed training in the cloud.

Home Page: https://github.com/tensorflow/cloud

tfc.remote() == False if python file entrypoint and distribution_strategy is None

tc-wolf opened this issue

The 'get_preprocessed_entry_point' step isn't run if the distribution strategy is None and the entry point is a Python file (ends in ".py"):

https://github.com/tensorflow/cloud/blob/master/src/python/tensorflow_cloud/core/run.py#L266-L282
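The condition reported above can be written down as a small predicate. This is a minimal sketch that only encodes the case described in this issue; `skips_preprocessing` is a hypothetical name, not part of tensorflow_cloud, and it says nothing about how run.py handles other combinations of entry point and strategy.

```python
from typing import Optional


def skips_preprocessing(entry_point: Optional[str],
                        distribution_strategy: Optional[str]) -> bool:
    """Hypothetical predicate for the reported case: preprocessing is skipped
    when the entry point is a .py file AND no distribution strategy is set."""
    is_python_file = entry_point is not None and entry_point.endswith(".py")
    return is_python_file and distribution_strategy is None


# The reported case: a plain Python script with no distribution strategy.
print(skips_preprocessing("train.py", None))
```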

This is a problem because this is where TF_KERAS_RUNNING_REMOTELY is injected into the entry point. Because the variable is never set, calls to tfc.remote() in the user's script don't behave as expected (i.e., they always return False).
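To see why the result is always False, here is a minimal sketch assuming, as the issue implies, that tfc.remote() simply checks whether TF_KERAS_RUNNING_REMOTELY is set to a non-empty value. `is_running_remotely` is a hypothetical stand-in, not the library's function, and the value "1" is an assumption; it takes the environment as a mapping so the two cases can be shown side by side.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "TF_KERAS_RUNNING_REMOTELY"


def is_running_remotely(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical stand-in for tfc.remote(): True only when the marker
    variable is present and non-empty in the given environment."""
    if env is None:
        env = os.environ
    return bool(env.get(ENV_VAR))


# Preprocessing skipped: the marker was never injected, so this is False.
print(is_running_remotely({}))
# Marker present (assumed value "1"): this is True.
print(is_running_remotely({ENV_VAR: "1"}))
```

Note that any non-empty value counts as "remote" under this assumption, so the fix only needs to guarantee the variable is set in the container.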

Proposed Solution:
Instead, inject TF_KERAS_RUNNING_REMOTELY directly into the Dockerfile as an ENV variable, set when the image is built (in ContainerBuilder._create_docker_file).
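A minimal sketch of what the proposed fix could look like, assuming the Dockerfile is assembled as a list of text lines. `build_dockerfile`, the `WORKDIR`/`COPY` layout, the base image, and the value "1" are all hypothetical; the real ContainerBuilder._create_docker_file is not reproduced here. The point is that the `ENV` instruction is baked into the image, so the variable is set whether or not the entry point was preprocessed.

```python
from typing import List


def build_dockerfile(base_image: str, entry_point: str) -> str:
    """Hypothetical Dockerfile generator: sets the remote marker with an ENV
    instruction so it exists in every container built from the image."""
    lines: List[str] = [
        f"FROM {base_image}",
        "WORKDIR /app/",
        # Set unconditionally, independent of entry-point preprocessing.
        "ENV TF_KERAS_RUNNING_REMOTELY=1",
        "COPY . /app/",
        f'ENTRYPOINT ["python", "{entry_point}"]',
    ]
    return "\n".join(lines) + "\n"


print(build_dockerfile("tensorflow/tensorflow:latest", "train.py"))
```

Setting the variable at image-build time also means it applies to .py entry points and notebooks alike, which is why the issue prefers it over relying on the preprocessing step.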