asset-inventory: job stays running for many hours
sdenef-adeo opened this issue · comments
Hello,
We faced an issue yesterday with a Dataflow job that uses the asset-inventory feature.
The job kept running for more than 10 hours. It usually takes 2 hours.
We canceled the job and ran it again. It finished successfully in 2 hours.
We got many log entries like the following. We don't usually see them, but I can't be sure they point to the root cause.
Operation ongoing for over 300.19 seconds in state process-msecs in step s8 . Current Traceback:
File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.7/site-packages/dataflow_worker/start.py", line 144, in <module>
main()
File "/usr/local/lib/python3.7/site-packages/dataflow_worker/start.py", line 140, in main
batchworker.BatchWorker(properties, sdk_pipeline_options).run()
File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 844, in run
deferred_exception_details=deferred_exception_details)
File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
work_executor.execute()
File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 179, in execute
op.start()
File "./asset_inventory/import_pipeline.py", line 150, in add_input
resource_schema = self.element_to_schema(element)
File "./asset_inventory/import_pipeline.py", line 147, in element_to_schema
'iam_policy' in element)
File "/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py", line 451, in bigquery_schema_for_resource
discovery_doc_url)
File "/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py", line 84, in _get_discovery_document_versions
dd = cls._get_discovery_document(dd_url)
File "/usr/local/lib/python3.7/site-packages/asset_inventory/api_schema.py", line 44, in _get_discovery_document
response = requests.get(dd_url)
File "/usr/local/lib/python3.7/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
chunked=chunked,
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 421, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.7/http/client.py", line 1369, in getresponse
response.begin()
File "/usr/local/lib/python3.7/http/client.py", line 310, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.7/http/client.py", line 271, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/local/lib/python3.7/socket.py", line 589, in readinto
return self._sock.recv_into(b)
File "/usr/local/lib/python3.7/ssl.py", line 1071, in recv_into
return self.read(nbytes, buffer)
File "/usr/local/lib/python3.7/ssl.py", line 929, in read
return self._sslobj.read(len, buffer)
Can you confirm whether this is the root cause?
If it is, can you explain why it kept our job running for so many hours yesterday?
Could you update the asset-inventory feature to avoid this kind of problem? For example, an option to make the job fail after X minutes would prevent this situation.
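To illustrate the kind of change I have in mind, here is a minimal sketch. It assumes the hang comes from the `requests.get(dd_url)` call in `api_schema.py` shown in the traceback, which is made without a `timeout` argument, so it may wait indefinitely for the server to respond. The helper name, the `http_get` parameter and the timeout values are my own assumptions, not the project's actual code.

```python
# Assumed values, not taken from the project: (connect, read) timeouts in seconds.
DEFAULT_TIMEOUT = (10, 60)


def get_discovery_document(dd_url, http_get=None, timeout=DEFAULT_TIMEOUT):
    """Fetch a discovery document, giving up instead of blocking forever.

    Hypothetical helper: `http_get` is any callable with the same calling
    convention as `requests.get`; it defaults to `requests.get` itself.
    """
    if http_get is None:
        import requests  # imported lazily so a stub can be injected in tests
        http_get = requests.get
    # With a timeout set, a stalled connection raises an exception
    # (requests.exceptions.Timeout) instead of hanging the worker.
    response = http_get(dd_url, timeout=timeout)
    response.raise_for_status()
    return response.json()
```

Note that the read timeout in requests limits the wait between bytes received rather than the total download time, so this would only address a server that stops responding, which is what the traceback seems to show.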
Best regards,
Sébastien