livebook-dev / req_bigquery

Conveniences for querying Google BigQuery with Req

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Bigquery timeouts and returns only job_id

matreyes opened this issue · comments

When your query takes too long to run, there is a timeout on Bigquery, and the query returns only the job_id.

In my own project (based in req_bigquery) I've replaced the first call, and always create a job (instead direct query), and then I call the job recursively to get the query results:

    def query(query, project_id, goth_name, finch_name, retries \\ 0)
    def query(_query, _project_id, _goth_name, _finch_name, 3), do: {:error, :too_many_retries}

    def query(query, project_id, goth_name, finch_name, retries) do
      token = Goth.fetch!(goth_name).token
      headers = [{"Authorization", "Bearer #{token}"}, {"Content-Type", "application/json"}]

      body =
        %{
          "configuration" => %{
            "query" => %{
              "query" => query,
              "useLegacySql" => false
            },
            "labels" => %{"service" => @service_name}
          }
        }
        |> Jason.encode!()

      # jobs insert API
      case Finch.build(:post, "#{@bigquery_base_url}/projects/#{project_id}/jobs", headers, body)
          |> Finch.request(finch_name) do
        {:ok, %{status: 200, body: body}} ->
          job = Jason.decode!(body)
          job_id = job["jobReference"]["jobId"]
          get_job(project_id, job_id, headers, finch_name)

        {:ok, %{status: error_code, body: body} = resp} ->
          Logger.error("Couldn't create a job. Error code: #{error_code}.")
          Logger.error(inspect(resp))
          {:error, body}

        {:error, e} ->
          Logger.error("Error creating bigquery job: #{inspect(e)}")
          :timer.sleep(1000)
          query(query, project_id, goth_name, finch_name, retries + 1)
      end
    end

    defp get_job(project_id, job_id, headers, finch_name) do
      Logger.debug("getting job")
      job_url = "#{@bigquery_base_url}/projects/#{project_id}/jobs/#{job_id}"
      req = Finch.build(:get, job_url, headers)

      with {:ok, %Response{status: 200, body: body}} <- Finch.request(req, finch_name),
          %{"status" => %{"state" => "DONE"}} = job_info <- Jason.decode!(body) do
        build_result(project_id, job_id, headers, job_info, finch_name)
      else
        _ ->
          :timer.sleep(@check_interval)
          get_job(project_id, job_id, headers, finch_name)
      end
    end

I really don't know if something like that would work for req_bigquery / livebook

Best,

@matreyes I hit this issue as well. I added a timeoutMs parameter (PR #28) but this could still be a problem.

Q for others here, what is the best way to handle async queries?

Yeah, I tried with timeout, but it still returned false, as documentation says

However, the call is not guaranteed to wait for the specified timeout; it typically returns after around 200 seconds (200,000 milliseconds), even if the query is not complete