cognitect-labs / aws-api

AWS, data driven

Silent failure in XML parsing returns empty result from S3

bsless opened this issue

Dependencies

 :deps {org.clojure/clojure {:mvn/version "1.10.3"}
        techascent/tech.ml.dataset {:mvn/version "6.053"}
        ;; techascent/tech.ml.dataset {:mvn/version "5.00"}
        com.cognitect.aws/api       {:mvn/version "0.8.539"}
        com.cognitect.aws/endpoints {:mvn/version "1.1.12.110"}
        com.cognitect.aws/s3        {:mvn/version "814.2.991.0"}}

Description with failing test case

When I include techascent/tech.ml.dataset, an unreported conflict between XML dependencies makes the client's XML parsing fail silently, and S3 operations return empty results instead of an error.
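To see which StAX implementation actually wins on a given classpath, one quick check is to ask XMLInputFactory/newInstance which class it returns. This is a sketch using only the JDK's javax.xml.stream API; input-factory-class is a hypothetical helper, and it assumes data.xml obtains its parser through that standard lookup. With only the JDK, expect a JDK-internal class; with aalto-xml on the classpath, expect a class under com.fasterxml.aalto, since aalto registers itself as a StAX provider.

```clojure
(ns stax-check
  (:import (javax.xml.stream XMLInputFactory)))

(defn input-factory-class
  "Returns the class name of the StAX XMLInputFactory implementation that
  XMLInputFactory/newInstance resolves on the current classpath."
  []
  (.getName (class (XMLInputFactory/newInstance))))

;; Run once with and once without tech.ml.dataset (or aalto-xml) on the
;; classpath and compare the printed class names.
(println "StAX input factory:" (input-factory-class))
```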

Also see

Stack traces

None; the failure is silent.

I boiled things down in that issue to these dependencies:

{:paths ["src" "resources"]
 :deps {org.clojure/clojure {:mvn/version "1.10.3"}
        com.fasterxml/aalto-xml {:mvn/version "1.3.1"}
        com.cognitect.aws/api       {:mvn/version "0.8.539"}
        com.cognitect.aws/endpoints {:mvn/version "1.1.12.110"}
        com.cognitect.aws/s3        {:mvn/version "814.2.991.0"}}
 :aliases
 {:test
  {:extra-paths ["test"]
   :extra-deps {org.clojure/test.check {:mvn/version "1.1.0"}
                io.github.cognitect-labs/test-runner
                {:git/tag "v0.5.0" :git/sha "48c3c67"}}}
  :build {:deps {io.github.seancorfield/build-clj
                 {:git/tag "v0.6.2" :git/sha "97c275a"}}
          :ns-default build}}}

I'm seeing the same behavior with STS as well. Excluding com.fasterxml/aalto-xml fixed the problem for me too.
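For reference, excluding it in deps.edn can look like the sketch below. It assumes aalto-xml reaches the classpath transitively through techascent/tech.ml.dataset, as the first dependency list suggests; any tech.ml.dataset feature that itself relies on aalto may stop working, which is not tested here.

```clojure
;; deps.edn (sketch): keep tech.ml.dataset but drop the transitive aalto-xml,
;; so the JDK's default StAX implementation is used again.
{:deps {org.clojure/clojure         {:mvn/version "1.10.3"}
        techascent/tech.ml.dataset  {:mvn/version "6.053"
                                     :exclusions  [com.fasterxml/aalto-xml]}
        com.cognitect.aws/api       {:mvn/version "0.8.539"}
        com.cognitect.aws/endpoints {:mvn/version "1.1.12.110"}
        com.cognitect.aws/s3        {:mvn/version "814.2.991.0"}}}
```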

I did some more digging and, somewhat expectedly, the problem is in the data.xml library:

;; deps.edn
{:paths ["src"]

 :deps
 {org.clojure/clojure {:mvn/version "1.11.1"}
  org.clojure/data.xml {:mvn/version "0.2.0-alpha6"}}

 :aliases
 {:with-aalto
  {:extra-deps {com.fasterxml/aalto-xml {:mvn/version "1.3.2"}}}

  :repo
  {:exec-fn repo/pprint}}}
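The repo/pprint function itself isn't shown here. A hypothetical reconstruction consistent with the output below might look like this; the sample payload is invented to match that output, and calling data.xml with :namespace-aware false is an assumption about how aws-api parses responses.

```clojure
;; src/repo.clj (hypothetical reconstruction, not the slim repo's actual code)
(ns repo
  (:require [clojure.data.xml :as xml]
            [clojure.pprint :as pp]))

;; STS-like payload with a default namespace declared on the root element.
(def sample-xml
  "<bar xmlns=\"https://sts.amazonaws.com/doc/2011-06-15/\"><foo>hello, world</foo></bar>")

(defn parse-sample
  "Parses sample-xml with namespace awareness turned off."
  []
  (xml/parse-str sample-xml :namespace-aware false))

(defn pprint
  "Entry point for `clj -X:repo`; -X passes a map of args, ignored here."
  [_opts]
  (pp/pprint (parse-sample)))
```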

❯ clj -X:repo
{:tag :bar,
 :attrs {:xmlns "https://sts.amazonaws.com/doc/2011-06-15/"},
 :content ({:tag :foo, :attrs {}, :content ("hello, world")})}

❯ clj -A:with-aalto -X:repo
{:tag :xmlns.https%3A%2F%2Fsts.amazonaws.com%2Fdoc%2F2011-06-15%2F/bar,
 :attrs {},
 :content
 ({:tag
   :xmlns.https%3A%2F%2Fsts.amazonaws.com%2Fdoc%2F2011-06-15%2F/foo,
   :attrs {},
   :content ("hello, world")})}
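One caller-side workaround, sketched purely for illustration (it is not what data.xml or aws-api ended up doing), is to drop the namespace part of every tag so code expecting :bar and :foo keeps working. strip-tag-namespaces is a hypothetical helper; it assumes parsed elements behave as maps with :tag, :attrs and :content, as the pprint output above suggests. It discards namespace information, so two elements with the same local name in different namespaces would collide.

```clojure
(ns strip-ns)

(defn strip-tag-namespaces
  "Hypothetical workaround: rebuilds a parsed XML tree with every :tag reduced
  to its unqualified local name. Non-map nodes such as text content are
  returned unchanged."
  [node]
  (if (map? node)
    {:tag     (keyword (name (:tag node)))
     :attrs   (:attrs node)
     :content (mapv strip-tag-namespaces (:content node))}
    node))
```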

A slim repo can be found here.

Some context

  • The XMLInputFactory docs explicitly state that implementations are only required to support the true setting of IS_NAMESPACE_AWARE
  • aalto, following the Javadoc, does not allow setting it to false
    • this means that even if we set IS_NAMESPACE_AWARE to false, aalto's XMLInputFactory implementation still produces namespace-qualified tags whenever the data declares a namespace
  • AWS sends XML payloads with namespaces in them, even though we don't need them
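These points can be checked directly against whatever StAX implementation is on the classpath. The sketch below uses only the JDK's javax.xml.stream API: it requests IS_NAMESPACE_AWARE false, then reports what the factory says the setting is and how the root element of a small STS-like payload was read. The payload and the root-element-info helper are hypothetical. On the JDK's default implementation the root is expected to come back with local name "bar" and no namespace URI; the aalto behavior described in this issue corresponds to the namespace URI being filled in regardless.

```clojure
(ns ns-aware-check
  (:import (java.io StringReader)
           (javax.xml.stream XMLInputFactory)))

;; Hypothetical STS-like payload with a default namespace on the root element.
(def sample-xml
  "<bar xmlns=\"https://sts.amazonaws.com/doc/2011-06-15/\"><foo>hello, world</foo></bar>")

(defn root-element-info
  "Requests non-namespace-aware parsing, then reports the factory class, the
  IS_NAMESPACE_AWARE value the factory reports back, and how the root element
  was read."
  [xml]
  (let [factory (XMLInputFactory/newInstance)]
    ;; The StAX spec only requires `true` to be supported, so an implementation
    ;; may reject this request; ignore the rejection and report what happened.
    (try
      (.setProperty factory XMLInputFactory/IS_NAMESPACE_AWARE false)
      (catch IllegalArgumentException _ nil))
    (let [reader (.createXMLStreamReader factory (StringReader. ^String xml))]
      (try
        ;; Advance from START_DOCUMENT to the root START_ELEMENT.
        (.nextTag reader)
        {:factory         (.getName (class factory))
         :namespace-aware (.getProperty factory XMLInputFactory/IS_NAMESPACE_AWARE)
         :local-name      (.getLocalName reader)
         :namespace-uri   (.getNamespaceURI reader)}
        (finally
          (.close reader))))))

(prn (root-element-info sample-xml))
```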

This doesn't leave great options for clojure.data.xml or aws-api, but we're looking into them.

Fixed in org.clojure/data.xml-0.2.0-alpha8 and aws-api-0.8.596
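Picking up the fix means moving to at least those versions. A minimal deps.edn sketch, with the other coordinates carried over from the dependency list at the top of this issue; pinning org.clojure/data.xml explicitly is a precaution in case another dependency resolves an older version, not something this thread says is required.

```clojure
;; deps.edn (sketch): versions taken from the fix announcement above.
{:deps {org.clojure/clojure         {:mvn/version "1.10.3"}
        org.clojure/data.xml        {:mvn/version "0.2.0-alpha8"}
        com.cognitect.aws/api       {:mvn/version "0.8.596"}
        com.cognitect.aws/endpoints {:mvn/version "1.1.12.110"}
        com.cognitect.aws/s3        {:mvn/version "814.2.991.0"}}}
```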

Fixed by upstream change in data.xml.