tensorflow / cloud

The TensorFlow Cloud repository provides APIs that make it easy to go from debugging and training your Keras and TensorFlow code in a local environment to distributed training in the cloud.

Home Page: https://github.com/tensorflow/cloud

Enable running tfc.run() from within an AI Platform hosted notebook.

SinaChavoshi opened this issue

Using AI Platform hosted notebooks, we created a Jupyter notebook containing the model we were planning to train and saved it. We then created a separate notebook holding our runner script, similar to:

import tensorflow_cloud as tfc

tfc.run(
    docker_config=tfc.DockerConfig(
        image_build_bucket="somebucket",
        parent_image="gcr.io/xyz"), 
    entry_point="model.ipynb",
    distribution_strategy="auto",
    worker_count=5,
    requirements_txt='requirements.txt',
    chief_config=tfc.COMMON_MACHINE_CONFIGS["CPU"],
    worker_config=tfc.COMMON_MACHINE_CONFIGS["CPU"],
    job_labels={
        "job": "kaggle_competition",
        "team": "base_line",
    },
    stream_logs=False
)

The run fails with the following error:

/opt/conda/lib/python3.7/site-packages/tensorflow_cloud/core/preprocess.py in _get_colab_notebook_content()
    207 def _get_colab_notebook_content():
    208     """Returns the colab notebook python code contents."""
--> 209     response = _message.blocking_request("get_ipynb",
    210                                          request="",
    211                                          timeout_sec=200)

AttributeError: 'NoneType' object has no attribute 'blocking_request'

It would be nice to add support for this case, where all requirements and a proper base image are provided directly for the remote run.

Any news on this?

Any updates?

Try the following. I ran not from an AI Platform notebook but from a private GitLab instance, yet the error seems to come from the same bug (or imprecise code):

The tensorflow-cloud code tries to detect whether it is running from a Google Colab notebook or from a Kaggle notebook. As the error shows, it took the 'colab' branch, which failed because the code was not running in Colab.
Within 'preprocess.py', the proper branch is reached only if 'called_from_notebook' has the value 'False'. The detection lives in the 'run.py' module and checks whether 'IPython.get_ipython().__class__.__name__' contains the word "Shell".
For me (GitLab) it does contain the word "Shell", so the branching goes in the wrong direction.
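To make the problem concrete, here is a small sketch of what that substring test amounts to. It is a paraphrase of the check as described in this comment, not a copy of the tensorflow-cloud source, and the helper name `looks_like_notebook` is made up for illustration. The shell class names are the ones commonly reported by each front end and should be treated as assumptions.

```python
def looks_like_notebook(shell_class_name):
    """Paraphrase of the described check: any class name containing
    'Shell' is treated as a notebook (hypothetical helper)."""
    return "Shell" in shell_class_name


# Class names as commonly reported by get_ipython().__class__.__name__
# (assumed values). Outside IPython, get_ipython() returns None, so the
# class name is 'NoneType'.
candidates = {
    "Jupyter kernel": "ZMQInteractiveShell",
    "IPython terminal": "TerminalInteractiveShell",
    "Colab": "Shell",
    "plain python": "NoneType",
}

results = {env: looks_like_notebook(name) for env, name in candidates.items()}
for env, detected in results.items():
    print(f"{env}: {detected}")
```

Every IPython-based front end passes the test, so a hosted Jupyter notebook cannot be told apart from Colab by this check alone.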

Long story short. Quick and dirty fix:
```python
from unittest.mock import patch


def _called_from_notebook_FIX():
    # Pretend we are not in a notebook so the Colab branch is skipped.
    return False


with patch('tensorflow_cloud.core.run._called_from_notebook', new=_called_from_notebook_FIX):
    # tfc.run code here...
    pass
```

With a 'False' monkey-patched in, it runs for me. Someone (maybe me) should open a pull request with better environment detection for tensorflow-cloud.
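One possible direction for such detection, as a rough sketch only: look for positive evidence of each host instead of inferring Colab from "some IPython shell is running". Nothing here is taken from the tensorflow-cloud code base. The function name `detect_notebook_host` is hypothetical, and both markers (the `google.colab` module being loaded, and a `KAGGLE_CONTAINER_NAME` environment variable) are assumptions about how those hosts identify themselves.

```python
import os
import sys


def detect_notebook_host(modules=None, environ=None):
    """Return 'colab', 'kaggle', or 'other' (hypothetical helper).

    `modules` and `environ` default to sys.modules and os.environ and are
    parameters only so the logic can be exercised without a real host.
    """
    modules = sys.modules if modules is None else modules
    environ = os.environ if environ is None else environ

    # Assumption: Colab runtimes have the google.colab package loaded.
    if "google.colab" in modules:
        return "colab"
    # Assumption: Kaggle kernels set KAGGLE_CONTAINER_NAME.
    if environ.get("KAGGLE_CONTAINER_NAME"):
        return "kaggle"
    # Anything else (AI Platform notebooks, GitLab, plain scripts).
    return "other"


print(detect_notebook_host(modules={}, environ={}))
```

With a check like this, the Colab-specific notebook export would only be attempted when the result is 'colab', and everything else would fall through to the regular file-based entry point.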