google-deepmind / xmanager

A platform for managing machine learning experiments

Tensorboard instance is not found when running examples/cifar10_tensorflow

AbubakrHassan opened this issue

When running examples/cifar10_tensorflow, the job launches fine and trains to completion. However, the TensorBoard link that is created shows a page that says:

Not found: TensorboardExperiment projects/****/locations/us-central1/tensorboards/2824407877244944384/experiments/7194241469736026112 is not found.

Logs from building and launching the job:

I0331 10:24:09.242159 139812284593984 docker_lib.py:67] Local docker: {'Platform': {'Name': 'Docker Engine - Community'}, 'Components': [{'Name': 'Engine', 'Version': '20.10.2', 'Details': {'ApiVersion': '1.41', 'Arch': 'amd64', 'BuildTime': '2020-12-28T16:15:28.000000000+00:00', 'Experimental': 'false', 'GitCommit': '8891c58', 'GoVersion': 'go1.13.15', 'KernelVersion': '5.15.15-1rodete2-amd64', 'MinAPIVersion': '1.12', 'Os': 'linux'}}, {'Name': 'containerd', 'Version': '1.4.3', 'Details': {'GitCommit': '269548fa27e0089a8b8278fc4fc781d7f65a939b'}}, {'Name': 'runc', 'Version': '1.0.0-rc92', 'Details': {'GitCommit': 'ff819c7e9184c13b7c2607fe6c30ae19403a7aff'}}, {'Name': 'docker-init', 'Version': '0.19.0', 'Details': {'GitCommit': 'de40ad0'}}], 'Version': '20.10.2', 'ApiVersion': '1.41', 'MinAPIVersion': '1.12', 'GitCommit': '8891c58', 'GoVersion': 'go1.13.15', 'Os': 'linux', 'Arch': 'amd64', 'KernelVersion': '5.15.15-1rodete2-amd64', 'BuildTime': '2020-12-28T16:15:28.000000000+00:00'}
Dockerfile:

FROM gcr.io/deeplearning-platform-release/tf2-gpu.2-6

RUN if ! id 1000; then useradd -m -u 1000 clouduser; fi

ENV LANG=C.UTF-8
RUN apt-get update && apt-get install -y git netcat
RUN python -m pip install --upgrade pip
COPY cifar10_tensorflow/requirements.txt /cifar10_tensorflow/requirements.txt
RUN python -m pip install -r cifar10_tensorflow/requirements.txt
COPY cifar10_tensorflow/ /cifar10_tensorflow
RUN chown -R 1000:root /cifar10_tensorflow && chmod -R 775 /cifar10_tensorflow
WORKDIR cifar10_tensorflow

COPY entrypoint.sh ./entrypoint.sh
RUN chown -R 1000:root ./entrypoint.sh && chmod -R 775 ./entrypoint.sh

ENTRYPOINT ["./entrypoint.sh"]

Size of Docker input: 7.1 kB
Building Docker image, please wait...
I0331 10:24:10.163763 139812284593984 docker_lib.py:67] Local docker: {'Platform': {'Name': 'Docker Engine - Community'}, 'Components': [{'Name': 'Engine', 'Version': '20.10.2', 'Details': {'ApiVersion': '1.41', 'Arch': 'amd64', 'BuildTime': '2020-12-28T16:15:28.000000000+00:00', 'Experimental': 'false', 'GitCommit': '8891c58', 'GoVersion': 'go1.13.15', 'KernelVersion': '5.15.15-1rodete2-amd64', 'MinAPIVersion': '1.12', 'Os': 'linux'}}, {'Name': 'containerd', 'Version': '1.4.3', 'Details': {'GitCommit': '269548fa27e0089a8b8278fc4fc781d7f65a939b'}}, {'Name': 'runc', 'Version': '1.0.0-rc92', 'Details': {'GitCommit': 'ff819c7e9184c13b7c2607fe6c30ae19403a7aff'}}, {'Name': 'docker-init', 'Version': '0.19.0', 'Details': {'GitCommit': 'de40ad0'}}], 'Version': '20.10.2', 'ApiVersion': '1.41', 'MinAPIVersion': '1.12', 'GitCommit': '8891c58', 'GoVersion': 'go1.13.15', 'Os': 'linux', 'Arch': 'amd64', 'KernelVersion': '5.15.15-1rodete2-amd64', 'BuildTime': '2020-12-28T16:15:28.000000000+00:00'}
I0331 10:24:10.164260 139812284593984 docker_lib.py:89] Building Docker image
[+] Building 55.8s (16/16) FINISHED                                                                                                                                          
 => [internal] load build definition from Dockerfile                                                                                                                    0.2s
 => => transferring dockerfile: 694B                                                                                                                                    0.0s
 => [internal] load .dockerignore                                                                                                                                       0.2s
 => => transferring context: 2B                                                                                                                                         0.0s
 => [internal] load metadata for gcr.io/deeplearning-platform-release/tf2-gpu.2-6:latest                                                                                0.7s
 => [ 1/11] FROM gcr.io/deeplearning-platform-release/tf2-gpu.2-6@sha256:d9bf7c2069ff4bec9d9fc6d30fb286f1646124d04012d9932ee59d58eaca9ac4                               0.0s
 => [internal] load build context                                                                                                                                       0.1s
 => => transferring context: 8.03kB                                                                                                                                     0.0s
 => CACHED [ 2/11] RUN if ! id 1000; then useradd -m -u 1000 clouduser; fi                                                                                              0.0s
 => [ 3/11] RUN apt-get update && apt-get install -y git netcat                                                                                                        15.9s
 => [ 4/11] RUN python -m pip install --upgrade pip                                                                                                                    16.5s
 => [ 5/11] COPY cifar10_tensorflow/requirements.txt /cifar10_tensorflow/requirements.txt                                                                               0.5s
 => [ 6/11] RUN python -m pip install -r cifar10_tensorflow/requirements.txt                                                                                           17.7s
 => [ 7/11] COPY cifar10_tensorflow/ /cifar10_tensorflow                                                                                                                0.5s
 => [ 8/11] RUN chown -R 1000:root /cifar10_tensorflow && chmod -R 775 /cifar10_tensorflow                                                                              1.0s
 => [ 9/11] WORKDIR cifar10_tensorflow                                                                                                                                  0.3s
 => [10/11] COPY entrypoint.sh ./entrypoint.sh                                                                                                                          0.2s
 => [11/11] RUN chown -R 1000:root ./entrypoint.sh && chmod -R 775 ./entrypoint.sh                                                                                      0.7s
 => exporting to image                                                                                                                                                  1.4s
 => => exporting layers                                                                                                                                                 1.0s
 => => writing image sha256:1fb33a18a65d7efd4fcec00ef688ec2ac5502851be5d36bcc9a7b5cf342da775                                                                            0.0s
 => => naming to gcr.io/***/cifar10_tensorflow:20220331-102410-116512                                                                                  0.0s
 => => naming to gcr.io/***/cifar10_tensorflow:latest                                                                                                  0.0s
I0331 10:25:06.734303 139812284593984 docker_lib.py:98] Building docker image: Done
I0331 10:25:06.775659 139812284593984 docker_lib.py:67] Local docker: {'Platform': {'Name': 'Docker Engine - Community'}, 'Components': [{'Name': 'Engine', 'Version': '20.10.2', 'Details': {'ApiVersion': '1.41', 'Arch': 'amd64', 'BuildTime': '2020-12-28T16:15:28.000000000+00:00', 'Experimental': 'false', 'GitCommit': '8891c58', 'GoVersion': 'go1.13.15', 'KernelVersion': '5.15.15-1rodete2-amd64', 'MinAPIVersion': '1.12', 'Os': 'linux'}}, {'Name': 'containerd', 'Version': '1.4.3', 'Details': {'GitCommit': '269548fa27e0089a8b8278fc4fc781d7f65a939b'}}, {'Name': 'runc', 'Version': '1.0.0-rc92', 'Details': {'GitCommit': 'ff819c7e9184c13b7c2607fe6c30ae19403a7aff'}}, {'Name': 'docker-init', 'Version': '0.19.0', 'Details': {'GitCommit': 'de40ad0'}}], 'Version': '20.10.2', 'ApiVersion': '1.41', 'MinAPIVersion': '1.12', 'GitCommit': '8891c58', 'GoVersion': 'go1.13.15', 'Os': 'linux', 'Arch': 'amd64', 'KernelVersion': '5.15.15-1rodete2-amd64', 'BuildTime': '2020-12-28T16:15:28.000000000+00:00'}
I0331 10:25:20.892401 139812284593984 docker_lib.py:107] {"status":"The push refers to repository [gcr.io/***/cifar10_tensorflow]"}
{"status":"Preparing","progressDetail":{},"id":"ecbb601dd983"}
{"status":"Preparing","progressDetail":{},"id":"0490c7aeabf0"}
{"status":"Preparing","progressDetail":{},"id":"5f70bf18a086"}
{"status":"Preparing","progressDetail":{},"id":"57c7da5da29e"}
{"status":"Preparing","progressDetail":{},"id":"ddfab15718d9"}
{"status":"Preparing","progressDetail":{},"id":"a43c37333595"}
{"status":"Preparing","progressDetail":{},"id":"479d29ce9800"}
{"status":"Preparing","progressDetail":{},"id":"f4bfb05d8c99"}
{"status":"Preparing","progressDetail":{},"id":"f634932f0fdf"}
{"status":"Preparing","progressDetail":{},"id":"5f70bf18a086"}
{"status":"Preparing","progressDetail":{},"id":"e5a69fe43a97"}
{"status":"Preparing","progressDetail":{},"id":"ed55b6190435"}
{"status":"Preparing","progressDetail":{},"id":"87ec19f85372"}
{"status":"Preparing","progressDetail":{},"id":"8c3b041fd87c"}
{"status":"Preparing","progressDetail":{},"id":"0ac428b7127a"}
{"status":"Preparing","progressDetail":{},"id":"370688903f01"}
{"status":"Waiting","progressDetail":{},"id":"a43c37333595"}
{"status":"Preparing","progressDetail":{},"id":"76d62c4c37cc"}
{"status":"Preparing","progressDetail":{},"id":"b3ab95a574c8"}
{"status":"Preparing","progressDetail":{},"id":"d1b010151b48"}
{"status":"Preparing","progressDetail":{},"id":"b80bc089358e"}
{"status":"Preparing","progressDetail":{},"id":"11bc9b36546a"}
{"status":"Preparing","progressDetail":{},"id":"fffe44800c74"}
{"status":"Preparing","progressDetail":{},"id":"1175e7a0a8e0"}
{"status":"Preparing","progressDetail":{},"id":"992f2c95dad2"}
{"status":"Waiting","progressDetail":{},"id":"f4bfb05d8c99"}
{"status":"Preparing","progressDetail":{},"id":"91b2ad1e9845"}
{"status":"Waiting","progressDetail":{},"id":"f634932f0fdf"}
{"status":"Waiting","progressDetail":{},"id":"ed55b6190435"}
{"status":"Waiting","progressDetail":{},"id":"e5a69fe43a97"}
{"status":"Preparing","progressDetail":{},"id":"178f9673d3c0"}
{"status":"Preparing","progressDetail":{},"id":"3298591378da"}
{"status":"Waiting","progressDetail":{},"id":"1175e7a0a8e0"}
{"status":"Preparing","progressDetail":{},"id":"b79b505a5328"}
{"status":"Preparing","progressDetail":{},"id":"963f45082214"}
{"status":"Waiting","progressDetail":{},"id":"8c3b041fd87c"}
{"status":"Preparing","progressDetail":{},"id":"59edb8a95299"}
{"status":"Preparing","progressDetail":{},"id":"6083edd74f0c"}
{"status":"Waiting","progressDetail":{},"id":"370688903f01"}
{"status":"Preparing","progressDetail":{},"id":"4236d5cafaa0"}
{"status":"Preparing","progressDetail":{},"id":"924dcf5e7282"}
{"status":"Waiting","progressDetail":{},"id":"3298591378da"}
{"status":"Waiting","progressDetail":{},"id":"963f45082214"}
{"status":"Preparing","progressDetail":{},"id":"da29c29e84ca"}
{"status":"Preparing","progressDetail":{},"id":"1526a09df7d6"}
{"status":"Preparing","progressDetail":{},"id":"f35a9ab279de"}
{"status":"Preparing","progressDetail":{},"id":"6cd83fbc36a4"}
{"status":"Preparing","progressDetail":{},"id":"a7a59823f7fd"}
{"status":"Preparing","progressDetail":{},"id":"a86b3e862105"}
{"status":"Waiting","progressDetail":{},"id":"b80bc089358e"}
{"status":"Waiting","progressDetail":{},"id":"d1b010151b48"}
{"status":"Preparing","progressDetail":{},"id":"9ad794ce6bea"}
{"status":"Preparing","progressDetail":{},"id":"d533033842c0"}
{"status":"Preparing","progressDetail":{},"id":"9f54eef41275"}
{"status":"Waiting","progressDetail":{},"id":"6083edd74f0c"}
{"status":"Waiting","progressDetail":{},"id":"4236d5cafaa0"}
{"status":"Waiting","progressDetail":{},"id":"d533033842c0"}
{"status":"Waiting","progressDetail":{},"id":"9f54eef41275"}
{"status":"Waiting","progressDetail":{},"id":"a86b3e862105"}
{"status":"Waiting","progressDetail":{},"id":"924dcf5e7282"}
{"status":"Waiting","progressDetail":{},"id":"da29c29e84ca"}
{"status":"Waiting","progressDetail":{},"id":"b79b505a5328"}
{"status":"Waiting","progressDetail":{},"id":"91b2ad1e9845"}
{"status":"Waiting","progressDetail":{},"id":"1526a09df7d6"}
{"status":"Waiting","progressDetail":{},"id":"a7a59823f7fd"}
{"status":"Waiting","progressDetail":{},"id":"178f9673d3c0"}
{"status":"Waiting","progressDetail":{},"id":"9ad794ce6bea"}
{"status":"Waiting","progressDetail":{},"id":"0ac428b7127a"}
{"status":"Waiting","progressDetail":{},"id":"11bc9b36546a"}
{"status":"Waiting","progressDetail":{},"id":"59edb8a95299"}
{"status":"Waiting","progressDetail":{},"id":"992f2c95dad2"}
{"status":"Waiting","progressDetail":{},"id":"f35a9ab279de"}
{"status":"Waiting","progressDetail":{},"id":"6cd83fbc36a4"}
{"status":"Waiting","progressDetail":{},"id":"b3ab95a574c8"}
{"status":"Pushing","progressDetail":{"current":512,"total":528},"progress":"[================================================\u003e  ]     512B/528B","id":"ecbb601dd983"}
{"status":"Pushing","progressDetail":{"current":512,"total":528},"progress":"[================================================\u003e  ]     512B/528B","id":"0490c7aeabf0"}
{"status":"Pushing","progressDetail":{"current":512,"total":7094},"progress":"[===\u003e                                               ]     512B/7.094kB","id":"ddfab15718d9"}
{"status":"Pushing","progressDetail":{"current":512,"total":7094},"progress":"[===\u003e                                               ]     512B/7.094kB","id":"57c7da5da29e"}
{"status":"Pushing","progressDetail":{"current":11776,"total":7094},"progress":"[==================================================\u003e]  11.78kB","id":"ddfab15718d9"}
{"status":"Pushing","progressDetail":{"current":3072,"total":528},"progress":"[==================================================\u003e]  3.072kB","id":"0490c7aeabf0"}
{"status":"Pushing","progressDetail":{"current":11776,"total":7094},"progress":"[==================================================\u003e]  11.78kB","id":"57c7da5da29e"}
{"status":"Layer already exists","progressDetail":{},"id":"5f70bf18a086"}
{"status":"Pushing","progressDetail":{"current":31984,"total":2986836},"progress":"[\u003e                                                  ]  31.98kB/2.987MB","id":"a43c37333595"}
{"status":"Pushing","progressDetail":{"current":3072,"total":528},"progress":"[==================================================\u003e]  3.072kB","id":"ecbb601dd983"}
{"status":"Pushing","progressDetail":{"current":1428239,"total":2986836},"progress":"[=======================\u003e                           ]  1.428MB/2.987MB","id":"a43c37333595"}
{"status":"Pushing","progressDetail":{"current":2741798,"total":2986836},"progress":"[=============================================\u003e     ]  2.742MB/2.987MB","id":"a43c37333595"}
{"status":"Pushing","progressDetail":{"current":3273728,"total":2986836},"progress":"[==================================================\u003e]  3.274MB","id":"a43c37333595"}
{"status":"Pushed","progressDetail":{},"id":"ddfab15718d9"}
{"status":"Pushed","progressDetail":{},"id":"0490c7aeabf0"}
{"status":"Pushed","progressDetail":{},"id":"57c7da5da29e"}
{"status":"Pushing","progressDetail":{"current":512,"total":39},"progress":"[==================================================\u003e]     512B","id":"479d29ce9800"}
{"status":"Pushing","progressDetail":{"current":2560,"total":39},"progress":"[==================================================\u003e]   2.56kB","id":"479d29ce9800"}
{"status":"Pushing","progressDetail":{"current":512,"total":19820},"progress":"[=\u003e                                                 ]     512B/19.82kB","id":"f4bfb05d8c99"}
{"status":"Pushing","progressDetail":{"current":28160,"total":19820},"progress":"[==================================================\u003e]  28.16kB","id":"f4bfb05d8c99"}
{"status":"Pushing","progressDetail":{"current":413696,"total":38721310},"progress":"[\u003e                                                  ]  413.7kB/38.72MB","id":"f634932f0fdf"}
{"status":"Pushed","progressDetail":{},"id":"ecbb601dd983"}
{"status":"Pushed","progressDetail":{},"id":"a43c37333595"}
{"status":"Pushing","progressDetail":{"current":2009600,"total":38721310},"progress":"[==\u003e                                                ]   2.01MB/38.72MB","id":"f634932f0fdf"}
{"status":"Pushing","progressDetail":{"current":3582464,"total":38721310},"progress":"[====\u003e                                              ]  3.582MB/38.72MB","id":"f634932f0fdf"}
{"status":"Pushing","progressDetail":{"current":5180416,"total":38721310},"progress":"[======\u003e                                            ]   5.18MB/38.72MB","id":"f634932f0fdf"}
{"status":"Pushing","progressDetail":{"current":6753280,"total":38721310},"progress":"[========\u003e                                          ]  6.753MB/38.72MB","id":"f634932f0fdf"}
{"status":"Pushing","progressDetail":{"current":8331264,"total":38721310},"progress":"[==========\u003e                                        ]  8.331MB/38.72MB","id":"f634932f0fdf"}
{"status":"Pushing","progressDetail":{"current":10291712,"total":38721310},"progress":"[=============\u003e                                     ]  10.29MB/38.72MB","id":"f634932f0fdf"}
{"status":"Pushing","progressDetail":{"current":12274176,"total":38721310},"progress":"[===============\u003e                                   ]  12.27MB/38.72MB","id":"f634932f0fdf"}
{"status":"Pushing","progressDetail":{"current":14240256,"total":38721310},"progress":"[==================\u003e                                ]  14.24MB/38.72MB","id":"f634932f0fdf"}
{"status":"Mounted from deeplearning-platform-release/tf2-gpu.2-6","progressDetail":{},"id":"e5a69fe43a97"}
{"status":"Pushing","progressDetail":{"current":16206336,"total":38721310},"progress":"[====================\u003e                              ]  16.21MB/38.72MB","id":"f634932f0fdf"}
{"status":"Pushing","progressDetail":{"current":17779200,"total":38721310},"progress":"[======================\u003e                            ]  17.78MB/38.72MB","id":"f634932f0fdf"}
{"status":"Pushing","progressDetail":{"current":19745280,"total":38721310},"progress":"[=========================\u003e                         ]  19.75MB/38.72MB","id":"f634932f0fdf"}
{"status":"Mounted from deeplearning-platform-release/tf2-gpu.2-6","progressDetail":{},"id":"ed55b6190435"}
{"status":"Pushing","progressDetail":{"current":21318144,"total":38721310},"progress":"[===========================\u003e                       ]  21.32MB/38.72MB","id":"f634932f0fdf"}
{"status":"Pushing","progressDetail":{"current":23284224,"total":38721310},"progress":"[==============================\u003e                    ]  23.28MB/38.72MB","id":"f634932f0fdf"}
{"status":"Pushing","progressDetail":{"current":25250304,"total":38721310},"progress":"[================================\u003e                  ]  25.25MB/38.72MB","id":"f634932f0fdf"}
{"status":"Pushing","progressDetail":{"current":27216384,"total":38721310},"progress":"[===================================\u003e               ]  27.22MB/38.72MB","id":"f634932f0fdf"}
{"status":"Pushing","progressDetail":{"current":29182464,"total":38721310},"progress":"[=====================================\u003e             ]  29.18MB/38.72MB","id":"f634932f0fdf"}
{"status":"Pushing","progressDetail":{"current":30768640,"total":38721310},"progress":"[=======================================\u003e           ]  30.77MB/38.72MB","id":"f634932f0fdf"}
{"status":"Pushing","progressDetail":{"current":32359424,"total":38721310},"progress":"[=========================================\u003e         ]  32.36MB/38.72MB","id":"f634932f0fdf"}
{"status":"Mounted from deeplearning-platform-release/tf2-gpu.2-6","progressDetail":{},"id":"87ec19f85372"}
{"status":"Pushing","progressDetail":{"current":33952256,"total":38721310},"progress":"[===========================================\u003e       ]  33.95MB/38.72MB","id":"f634932f0fdf"}
{"status":"Pushing","progressDetail":{"current":35525120,"total":38721310},"progress":"[=============================================\u003e     ]  35.53MB/38.72MB","id":"f634932f0fdf"}
{"status":"Pushing","progressDetail":{"current":37110272,"total":38721310},"progress":"[===============================================\u003e   ]  37.11MB/38.72MB","id":"f634932f0fdf"}
{"status":"Pushing","progressDetail":{"current":38809600,"total":38721310},"progress":"[==================================================\u003e]  38.81MB","id":"f634932f0fdf"}
{"status":"Mounted from deeplearning-platform-release/tf2-gpu.2-6","progressDetail":{},"id":"8c3b041fd87c"}
{"status":"Pushed","progressDetail":{},"id":"479d29ce9800"}
{"status":"Pushed","progressDetail":{},"id":"f4bfb05d8c99"}
{"status":"Mounted from deeplearning-platform-release/tf2-gpu.2-6","progressDetail":{},"id":"0ac428b7127a"}
{"status":"Layer already exists","progressDetail":{},"id":"b3ab95a574c8"}
{"status":"Layer already exists","progressDetail":{},"id":"d1b010151b48"}
{"status":"Mounted from deeplearning-platform-release/tf2-gpu.2-6","progressDetail":{},"id":"370688903f01"}
{"status":"Layer already exists","progressDetail":{},"id":"b80bc089358e"}
{"status":"Layer already exists","progressDetail":{},"id":"11bc9b36546a"}
{"status":"Layer already exists","progressDetail":{},"id":"fffe44800c74"}
{"status":"Mounted from deeplearning-platform-release/tf2-gpu.2-6","progressDetail":{},"id":"76d62c4c37cc"}
{"status":"Layer already exists","progressDetail":{},"id":"1175e7a0a8e0"}
{"status":"Layer already exists","progressDetail":{},"id":"992f2c95dad2"}
{"status":"Layer already exists","progressDetail":{},"id":"91b2ad1e9845"}
{"status":"Layer already exists","progressDetail":{},"id":"178f9673d3c0"}
{"status":"Layer already exists","progressDetail":{},"id":"3298591378da"}
{"status":"Layer already exists","progressDetail":{},"id":"963f45082214"}
{"status":"Layer already exists","progressDetail":{},"id":"59edb8a95299"}
{"status":"Layer already exists","progressDetail":{},"id":"b79b505a5328"}
{"status":"Layer already exists","progressDetail":{},"id":"6083edd74f0c"}
{"status":"Layer already exists","progressDetail":{},"id":"4236d5cafaa0"}
{"status":"Layer already exists","progressDetail":{},"id":"da29c29e84ca"}
{"status":"Layer already exists","progressDetail":{},"id":"924dcf5e7282"}
{"status":"Layer already exists","progressDetail":{},"id":"1526a09df7d6"}
{"status":"Layer already exists","progressDetail":{},"id":"f35a9ab279de"}
{"status":"Layer already exists","progressDetail":{},"id":"6cd83fbc36a4"}
{"status":"Layer already exists","progressDetail":{},"id":"a7a59823f7fd"}
{"status":"Layer already exists","progressDetail":{},"id":"a86b3e862105"}
{"status":"Layer already exists","progressDetail":{},"id":"9ad794ce6bea"}
{"status":"Layer already exists","progressDetail":{},"id":"9f54eef41275"}
{"status":"Layer already exists","progressDetail":{},"id":"d533033842c0"}
{"status":"Pushed","progressDetail":{},"id":"f634932f0fdf"}
{"status":"20220331-102410-116512: digest: sha256:c059dbb502a1b915aef8b10e0e1dd4e9a241d23adb19431ffad552a4edfeb3b9 size: 9127"}
{"progressDetail":{},"aux":{"Tag":"20220331-102410-116512","Digest":"sha256:c059dbb502a1b915aef8b10e0e1dd4e9a241d23adb19431ffad552a4edfeb3b9","Size":9127}}

Your image URI is: gcr.io/***/cifar10_tensorflow:20220331-102410-116512
E0331 10:25:27.511640750 3964344 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
E0331 10:25:29.811606183 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
E0331 10:25:30.695179886 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
E0331 10:25:31.677614770 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
E0331 10:25:32.550932107 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
E0331 10:25:34.109284446 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
E0331 10:25:35.012729765 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
W0331 10:25:36.525629 139812189455936 http.py:139] Encountered 403 Forbidden with reason "PERMISSION_DENIED"
I0331 10:25:36.528207 139812155360832 base.py:80] Creating CustomJob
I0331 10:25:37.428091 139812155360832 base.py:127] CustomJob created. Resource name: projects/****/locations/us-central1/customJobs/1290022358253305856
I0331 10:25:37.428335 139812155360832 base.py:128] To use this CustomJob in another session:
I0331 10:25:37.428413 139812155360832 base.py:129] custom_job = aiplatform.CustomJob.get('projects/***/locations/us-central1/customJobs/1290022358253305856')
I0331 10:25:37.429027 139812155360832 jobs.py:1412] View Custom Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/1290022358253305856?project=***
I0331 10:25:37.429559 139812155360832 jobs.py:1415] View Tensorboard:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+***+locations+us-central1+tensorboards+2824407877244944384+experiments+1290022358253305856
Job launched at: https://console.cloud.google.com/ai/platform/locations/us-central1/training/1290022358253305856?project=***
E0331 10:25:37.646365894 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
E0331 10:25:38.566072246 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
E0331 10:25:39.474203776 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
E0331 10:25:40.424329133 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
E0331 10:25:41.956427682 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
E0331 10:25:42.839551026 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
I0331 10:25:43.589789 139812155360832 jobs.py:178] CustomJob projects/***/locations/us-central1/customJobs/1290022358253305856 current state:
JobState.JOB_STATE_PENDING
W0331 10:25:44.369634 139812189455936 http.py:139] Encountered 403 Forbidden with reason "PERMISSION_DENIED"
I0331 10:25:44.371166 139812138391104 base.py:80] Creating CustomJob
I0331 10:25:45.369434 139812138391104 base.py:127] CustomJob created. Resource name: projects/***/locations/us-central1/customJobs/7194241469736026112
I0331 10:25:45.369657 139812138391104 base.py:128] To use this CustomJob in another session:
I0331 10:25:45.369731 139812138391104 base.py:129] custom_job = aiplatform.CustomJob.get('projects/***/locations/us-central1/customJobs/7194241469736026112')
I0331 10:25:45.369875 139812138391104 jobs.py:1412] View Custom Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/7194241469736026112?project=***
I0331 10:25:45.370020 139812138391104 jobs.py:1415] View Tensorboard:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+***+locations+us-central1+tensorboards+2824407877244944384+experiments+7194241469736026112
Job launched at: https://console.cloud.google.com/ai/platform/locations/us-central1/training/7194241469736026112?project=***
Waiting for local jobs to complete. Press Ctrl+C to terminate them and exit
I0331 10:25:51.482136 139812138391104 jobs.py:178] CustomJob projects/***/locations/us-central1/customJobs/7194241469736026112 current state:
JobState.JOB_STATE_PENDING
I0331 10:25:54.744579 139812155360832 jobs.py:178] CustomJob projects/***/locations/us-central1/customJobs/1290022358253305856 current state:
JobState.JOB_STATE_PENDING
I0331 10:26:02.614101 139812138391104 jobs.py:178] CustomJob projects/***/locations/us-central1/customJobs/7194241469736026112 current state:
JobState.JOB_STATE_PENDING
I0331 10:26:17.016368 139812155360832 jobs.py:178] CustomJob projects/***/locations/us-central1/customJobs/1290022358253305856 current state:
JobState.JOB_STATE_PENDING
I0331 10:26:24.918474 139812138391104 jobs.py:178] CustomJob projects/***/locations/us-central1/customJobs/7194241469736026112 current state:
JobState.JOB_STATE_PENDING
I0331 10:27:01.445312 139812155360832 jobs.py:178] CustomJob projects/***/locations/us-central1/customJobs/1290022358253305856 current state:
JobState.JOB_STATE_PENDING
I0331 10:27:08.947669 139812138391104 jobs.py:178] CustomJob projects/***/locations/us-central1/customJobs/7194241469736026112 current state:
JobState.JOB_STATE_PENDING
I0331 10:28:24.716295 139812155360832 jobs.py:178] CustomJob projects/***/locations/us-central1/customJobs/1290022358253305856 current state:
JobState.JOB_STATE_PENDING
I0331 10:28:31.778189 139812138391104 jobs.py:178] CustomJob projects/***/locations/us-central1/customJobs/7194241469736026112 current state:
JobState.JOB_STATE_PENDING
I0331 10:30:33.545330 139812138391104 jobs.py:1127] CustomJob projects/***/locations/us-central1/customJobs/7194241469736026112 access the interactive shell terminals for the custom job:
workerpool0-0:
cb68e24fff4cd7b5-dot-us-central1.aiplatform-training.googleusercontent.com
I0331 10:30:42.651297 139812155360832 jobs.py:1127] CustomJob projects/***/locations/us-central1/customJobs/1290022358253305856 access the interactive shell terminals for the custom job:
workerpool0-0:
8dc25320200e3bcd-dot-us-central1.aiplatform-training.googleusercontent.com
I0331 10:31:09.588982 139812155360832 jobs.py:178] CustomJob projects/***/locations/us-central1/customJobs/1290022358253305856 current state:
JobState.JOB_STATE_RUNNING
I0331 10:31:12.343992 139812138391104 jobs.py:178] CustomJob projects/***/locations/us-central1/customJobs/7194241469736026112 current state:
JobState.JOB_STATE_RUNNING

Did you wait for the experiment to start running? TensorBoard may not be available until the job starts, which may take some time because its initial state will be Pending.

The log you are seeing is pulled from the python-aiplatform code:
https://github.com/googleapis/python-aiplatform/blob/v1.10.0/google/cloud/aiplatform/jobs.py#L1415
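
To compare the "View Tensorboard" link in the log with the resource named in the Not found message, the URL can be converted back into a resource name. This is only a minimal sketch: the helper names are hypothetical, and the assumption that the URL encodes the resource name with `+` in place of `/` is inferred from the log and the error message in this issue, not from any documented API.

```python
from urllib.parse import urlparse


def tensorboard_url_to_resource_name(url: str) -> str:
    """Turn a 'View Tensorboard' URL into a slash-separated resource name.

    Assumption (inferred from the logs in this issue): the path after
    '/experiment/' is the resource name with '+' in place of '/'.
    """
    path = urlparse(url).path
    prefix = "/experiment/"
    if not path.startswith(prefix):
        raise ValueError(f"Unexpected TensorBoard URL path: {path!r}")
    return path[len(prefix):].replace("+", "/")


def experiment_id(resource_name: str) -> str:
    """Return the trailing ID of a '.../experiments/<id>' resource name."""
    parts = resource_name.split("/")
    if len(parts) < 2 or parts[-2] != "experiments":
        raise ValueError(f"Not an experiment resource name: {resource_name!r}")
    return parts[-1]


# URL copied from the launch log above (project ID redacted as '***').
url = (
    "https://us-central1.tensorboard.googleusercontent.com/experiment/"
    "projects+***+locations+us-central1+tensorboards+2824407877244944384"
    "+experiments+7194241469736026112"
)
name = tensorboard_url_to_resource_name(url)
print(name)
print(experiment_id(name))
```

In the log above, the experiment ID recovered this way is the same number as the CustomJob ID (7194241469736026112), so the link points at an experiment named after the job.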

Thanks Andrew.

I wasn't sure whether I had done that, so I started a fresh run and kept refreshing the page before, during, and for up to 3 minutes after the job execution. In all cases I got the Not found: TensorboardExperiment page.

Is there a special service that I need to enable in order to start using TensorBoard on Vertex AI?
If this discussion is irrelevant to this repo, please point me to the correct channel to ask about it.

Could you refer to this page to see whether a TensorBoard exists under your Experiments tab?

https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-overview#view_a_experiment

Yes, the OPEN TENSORBOARD button shows on the custom job details page under Training -> Custom jobs, and a TensorBoard instance shows under Experiments -> TensorBoard instances with the same name as the one in the script.
However, that button leads to the Not found: TensorboardExperiment page.

I don't think there's a problem with XManager's library or the aiplatform library. I would refer to the official Vertex support.

https://cloud.google.com/vertex-ai/docs/support/getting-support

You can also run TensorBoard from your local machine and point it at the proper gs:// directory.
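
A minimal sketch of that local workaround, assuming TensorBoard is installed locally and your environment already has credentials that can read the bucket. The bucket path below is a placeholder, not the path used by the example, and the helper names are hypothetical.

```python
import shlex
import subprocess
from typing import List


def tensorboard_command(logdir: str, port: int = 6006) -> List[str]:
    """Build the argument list for a local TensorBoard pointed at GCS logs."""
    if not logdir.startswith("gs://"):
        raise ValueError(f"Expected a gs:// log directory, got {logdir!r}")
    return ["tensorboard", "--logdir", logdir, "--port", str(port)]


def launch_tensorboard(logdir: str, port: int = 6006) -> None:
    """Run TensorBoard in the foreground until it is interrupted."""
    subprocess.run(tensorboard_command(logdir, port), check=True)


# Placeholder path: substitute the log directory your job actually writes to.
cmd = tensorboard_command("gs://your-bucket/your-log-dir")
print(shlex.join(cmd))
```

Calling `launch_tensorboard(...)` with the real log directory starts the server; the printed command can also be pasted into a shell directly.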

Thanks,

Using tensorboard --logdir gs://... would be a temporary solution for my case. I'll refer to Vertex support for further assistance.