Project Staffing Deployment fails when Databricks runtime environment is temporarily unavailable
dserdyuk opened this issue · comments
Denis Serdyuk commented
Describe the bug
The deployment script sometimes fails with the following error:
Provisioning ADB cluster ...
Creating a new cluster default-gdc-cluster
Databricks cluster initialization has failed
400 Client Error: Bad Request for url: https://adb-2879709616442315.15.azuredatabricks.net/api/2.0/clusters/create
Response from server:
{ 'error_code': 'BAD_REQUEST',
'message': 'Current organization 2879709616442315 does not have any '
'associated worker environments'}
Traceback (most recent call last):
File "/home/vsts/.gdc-env/lib/python3.8/site-packages/databricks_cli/sdk/api_client.py", line 121, in perform_query
resp.raise_for_status()
File "/home/vsts/.gdc-env/lib/python3.8/site-packages/requests/models.py", line 943, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://adb-2879709616442315.15.azuredatabricks.net/api/2.0/clusters/create
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "post-deployment.py", line 97, in <module>
raise adb_err
File "post-deployment.py", line 87, in <module>
However, a subsequent rerun of the script works for the same organization. This is probably related to the eventually consistent nature of the ADB metadata database.
Expected behavior
ADB cluster provisioning should be more resilient to transient issues like this and should retry cluster creation.
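One possible shape for such a retry is sketched below. It is not the actual code in post-deployment.py. The names `retry_transient`, `is_transient`, and `TRANSIENT_MARKER` are hypothetical, the retry count and delays are arbitrary, and the sketch assumes the raised exception carries the server response in a `response` attribute with a `text` body, as `requests.exceptions.HTTPError` does.

```python
import time

# Marker text taken from the error message reported in this issue.
TRANSIENT_MARKER = "does not have any associated worker environments"


def is_transient(err):
    """Return True if the error text or its HTTP response body contains the marker."""
    response = getattr(err, "response", None)
    body = getattr(response, "text", "") or ""
    return TRANSIENT_MARKER in str(err) or TRANSIENT_MARKER in body


def retry_transient(action, attempts=5, initial_delay=10.0, backoff=2.0, sleep=time.sleep):
    """Call action(), retrying with exponential backoff while the error looks transient.

    Non-transient errors, and the error from the final attempt, are re-raised unchanged.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as err:
            if attempt == attempts or not is_transient(err):
                raise
            sleep(delay)
            delay *= backoff
```

The cluster creation call in the script could then be wrapped as `retry_transient(lambda: create_cluster(spec))`, where `create_cluster` and `spec` stand in for whatever the script already uses. Matching on the specific message keeps genuine 400 errors (for example, an invalid cluster spec) failing fast instead of being retried.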