cognitect-labs / aws-api

AWS, data driven

Textract's ProvisionedThroughputExceededException gets marked as :cognitect.anomalies/incorrect

lvh opened this issue

Dependencies

        com.cognitect.aws/api       {:mvn/version "0.8.539"}
        com.cognitect.aws/endpoints {:mvn/version "1.1.12.110"}
        com.cognitect.aws/textract {:mvn/version "814.2.1023.0"}

Description with failing test case

Calling AWS Textract's :StartExpenseAnalysis too frequently produces this error:

{:__type "ProvisionedThroughputExceededException",
 :Message "Provisioned rate exceeded",
 :cognitect.anomalies/category :cognitect.anomalies/incorrect}

It gets marked as :cognitect.anomalies/incorrect, presumably because the HTTP response code is 400. I tested this as follows:

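  ;; awsr is assumed to alias cognitect.aws.retry
  (require '[cognitect.aws.retry :as awsr])

  ;; collect every response the retry predicate is asked about
  (def seen-responses (atom []))
  (defn my-retriable?
    [http-response]
    (swap! seen-responses conj http-response)
    (awsr/default-retriable? http-response))

  ;; call api a bunch of times until I see exceptions

  ;; print the first throttling response that was recorded
  (->> @seen-responses (filter (comp #{"Provisioned rate exceeded"} :Message)) first prn)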

The metadata of that http-response is more instructive:

{:http-request
 {:request-method :post,
  :scheme :https,
  :server-port 443,
  :uri "/",
  :headers
  {"x-amz-date" "20220101T223307Z",
   "x-amz-target" "Textract.StartExpenseAnalysis",
   "content-type" "application/x-amz-json-1.1",
   "host" "textract.us-east-2.amazonaws.com",
   "x-amz-security-token"   "REDACTED",
   "authorization"   "REDACTED"},
  :body
  #object[java.nio.HeapByteBuffer 0x51e95d3c "java.nio.HeapByteBuffer[pos=0 lim=235 cap=235]"],
  :server-name "textract.us-east-2.amazonaws.com"},
 :http-response
 {:status 400,  ;; ⚠️
  :headers
  {"x-amzn-requestid" "REDACTED",
   "connection" "close",
   "content-length" "89",
   "date" "Sat, 01 Jan 2022 22:33:07 GMT",
   "content-type" "application/x-amz-json-1.1"},
  :body
  #object[java.io.BufferedInputStream 0x102b4e73 "java.io.BufferedInputStream@102b4e73"]}}

Note that the status code is 400 even though the caller didn't do anything wrong here.

I understand if you don't really want to maintain a bunch of exceptions for faulty signaling. That is de facto what I'm doing:

(defn better-retriable?
  [http-response]
  (cond-> http-response
    ;; AWS Textract
    (-> http-response :Message (= "Provisioned rate exceeded"))
    (assoc :cognitect.anomalies/category :cognitect.anomalies/busy)

    true awsr/default-retriable?))

I appreciate that matching on the message string is not very precise, but at least "Provisioned rate exceeded" looks pretty unambiguous :) The retriable? predicate is presumably not even the right place to do this, but it is a convenient location for a workaround.
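A slightly less string-dependent variant of the workaround could key on the :__type field from the error above instead of :Message. The following is a minimal sketch, not the library's behavior: `throttling-types`, `throttled?` and `retriable?` are hypothetical names, and `default-retriable?` is a local stand-in that assumes cognitect.aws.retry/default-retriable? retries on busy and unavailable anomalies, inlined so the sketch runs without aws-api on the classpath.

```clojure
;; Stand-in for cognitect.aws.retry/default-retriable? (assumed behavior:
;; retry when the anomaly category is busy or unavailable).
(defn default-retriable?
  [response]
  (contains? #{:cognitect.anomalies/busy :cognitect.anomalies/unavailable}
             (:cognitect.anomalies/category response)))

;; Error types treated as throttling. Only this entry comes from the issue;
;; extend the set if other misreported errors turn up.
(def throttling-types
  #{"ProvisionedThroughputExceededException"})

(defn throttled?
  "True when the response's :__type names a known throttling error."
  [response]
  (contains? throttling-types (:__type response)))

(defn retriable?
  "Retries known throttling errors even when they arrive with a 400 status,
  otherwise defers to the default check."
  [response]
  (or (throttled? response)
      (default-retriable? response)))
```

A predicate like this would be passed as the :retriable? option when creating the client. It assumes the service reports a bare exception name in :__type, as Textract does in the response shown above.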

Hi @lvh!

If there is a standard idiom for misreporting throttling across a bunch of AWS services it would make sense to support it, but if this is a one-off defect in Textract we should ask them to fix it there. WDYT?

commented

I have no preference; I figured even a closed ticket would at least be searchable for the next person to step in this :) I also have no data on whether this is really just Textract (I'll happily believe it) or affects more services, or on whether AWS would be inclined to fix it (I've had great results getting them to update docs to match reality, but not so much with changing even buggy service behavior).