ConnorJL / GPT2

An implementation of training for GPT2, supports TPUs

when reading metadata of gs://openwebtext/stuff/encoder/encoder.json

makamkkumar opened this issue

I get an error when executing the following command:

$ python3 main.py --model 345M.json --predict_text "Hello World. Hello there! My name"
The output is below:
{'n_head': 16, 'encoder_path': 'gs://openwebtext/stuff/encoder', 'n_vocab': 50257, 'embed_dropout': 0.1, 'lr': 0.00025, 'warmup_steps': 2000, 'weight_decay': 0.01, 'beta1': 0.9, 'beta2': 0.98, 'epsilon': 1e-09, 'opt_name': 'adam', 'train_batch_size': 8, 'attn_dropout': 0.1, 'train_steps': 10000, 'eval_steps': 10, 'max_steps': 500000, 'data_path': 'gs://connors-datasets/openwebtext/', 'res_dropout': 0.1, 'predict_batch_size': 8, 'eval_batch_size': 8, 'iterations': 500, 'n_embd': 1024, 'input': 'openwebtext', 'model': 'GPT2', 'model_path': 'gs://connors-models/GPT2-345M', 'n_ctx': 1024, 'predict_path': 'logs/predictions.txt', 'n_layer': 24, 'scale_by_depth': True, 'scale_by_in': True, 'use_tpu': False, 'precision': 'float32'}
2019-10-21 12:38:38.103626: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 0.159809 seconds (attempt 1 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:38.272828: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 0.053047 seconds (attempt 2 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:38.370688: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 0.050504 seconds (attempt 3 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:38.433094: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 0.564422 seconds (attempt 4 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:39.022315: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 0.256678 seconds (attempt 5 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:39.300586: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 1.24113 seconds (attempt 6 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:40.675821: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 1.13431 seconds (attempt 7 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:41.867547: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 1.20263 seconds (attempt 8 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:43.087045: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 1.05564 seconds (attempt 9 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:44.151391: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 1.43831 seconds (attempt 10 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:45.596157: W tensorflow/core/platform/cloud/google_auth_provider.cc:157] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "Not found: Could not locate the credentials file.". Retrieving token from GCE failed with "Aborted: All 10 retry attempts failed. The last failure: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'".
Traceback (most recent call last):
  File "main.py", line 118, in <module>
    enc = encoder.get_encoder(params["encoder_path"])
  File "/home/kiran1/KiranResearch/TextSummerization/GPT2/models/gpt2/encoder.py", line 111, in get_encoder
    encoder = json.load(f)
  File "/home/kiran1/anaconda3/envs/tf_gpu/lib/python3.6/json/__init__.py", line 296, in load
    return loads(fp.read(),
  File "/home/kiran1/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 128, in read
    length = self.size() - self.tell()
  File "/home/kiran1/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 104, in size
    return stat(self.__name).length
  File "/home/kiran1/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 735, in stat
    return stat_v2(filename)
  File "/home/kiran1/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 754, in stat_v2
    return file_statistics
  File "/home/kiran1/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.PermissionDeniedError: Error executing an HTTP request: HTTP response code 401 with body '{
  "error": {
    "code": 401,
    "message": "Anonymous caller does not have storage.objects.get access to openwebtext/stuff/encoder/encoder.json.",
    "errors": [
      {
        "message": "Anonymous caller does not have storage.objects.get access to openwebtext/stuff/encoder/encoder.json.",
        "domain": "global",
        "reason": "required",
        "locationType": "header",
        "location": "Authorization"
      }
    ]
  }
}
'
when reading metadata of gs://openwebtext/stuff/encoder/encoder.json

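The paths in the model config still point to my (private) Google Cloud Storage bucket, which is why the anonymous request gets a 401. You need to download the encoder and the model, put them somewhere you have access to, and change encoder_path and model_path in 345M.json to those locations.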
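
As a rough illustration of that fix, here is a minimal Python sketch that swaps the bucket paths in a config for locations you control and reports any gs:// paths that remain. The key names (encoder_path, model_path, data_path) and the bucket URLs are taken from the output above; the helper names localize_paths and remaining_gcs_paths and the local directories ./encoder and ./GPT2-345M are hypothetical and not part of the repository.

```python
def localize_paths(config, encoder_dir, model_dir):
    """Return a copy of config with encoder_path and model_path replaced.

    encoder_dir and model_dir are hypothetical locations you can read,
    e.g. local directories or a bucket you own.
    """
    updated = dict(config)  # shallow copy, so the original stays untouched
    updated["encoder_path"] = encoder_dir
    updated["model_path"] = model_dir
    return updated


def remaining_gcs_paths(config):
    """List the config keys whose string values still start with gs://."""
    return sorted(
        key for key, value in config.items()
        if isinstance(value, str) and value.startswith("gs://")
    )


# A subset of the parameters printed in the issue above.
config = {
    "encoder_path": "gs://openwebtext/stuff/encoder",
    "model_path": "gs://connors-models/GPT2-345M",
    "data_path": "gs://connors-datasets/openwebtext/",
    "n_ctx": 1024,
}

# Assumption: the encoder and model files were already copied here.
local = localize_paths(config, "./encoder", "./GPT2-345M")
print(remaining_gcs_paths(local))  # ['data_path']
```

In this sketch data_path is left alone and still shows up as a bucket path; whether it matters for --predict_text is not stated in this thread, so treat it as something to check. The traceback shows encoder.json being read from inside encoder_path, so that value should be the directory containing encoder.json, not the file itself.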