Silent failure in XML parsing returns empty result from S3
bsless opened this issue · comments
Dependencies
:deps {org.clojure/clojure {:mvn/version "1.10.3"}
       techascent/tech.ml.dataset {:mvn/version "6.053"}
       ;; techascent/tech.ml.dataset {:mvn/version "5.00"}
       com.cognitect.aws/api {:mvn/version "0.8.539"}
       com.cognitect.aws/endpoints {:mvn/version "1.1.12.110"}
       com.cognitect.aws/s3 {:mvn/version "814.2.991.0"}}
Description with failing test case
When I include techascent/tech.ml.dataset, an unreported conflict in XML dependencies makes the client fail silently: S3 operations return an empty result instead of raising an error.
Also see
- reproduction repository: https://github.com/bsless/tmd-s3-repro
- techascent/tech.ml.dataset#283
Stack traces
Silent failure
I boiled things down in that issue to these dependencies:
{:paths ["src" "resources"]
 :deps {org.clojure/clojure {:mvn/version "1.10.3"}
        com.fasterxml/aalto-xml {:mvn/version "1.3.1"}
        com.cognitect.aws/api {:mvn/version "0.8.539"}
        com.cognitect.aws/endpoints {:mvn/version "1.1.12.110"}
        com.cognitect.aws/s3 {:mvn/version "814.2.991.0"}}
 :aliases
 {:test
  {:extra-paths ["test"]
   :extra-deps {org.clojure/test.check {:mvn/version "1.1.0"}
                io.github.cognitect-labs/test-runner
                {:git/tag "v0.5.0" :git/sha "48c3c67"}}}
  :build {:deps {io.github.seancorfield/build-clj
                 {:git/tag "v0.6.2" :git/sha "97c275a"}}
          :ns-default build}}}
I'm seeing this same behavior with sts as well. Excluding com.fasterxml/aalto-xml fixed the problem for me too.
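For anyone landing here with the same symptom, an exclusion of this kind might look like the following in deps.edn. This is only a sketch: it assumes aalto-xml reaches the classpath transitively through tech.ml.dataset (the dependency from the original report). If a different library pulls it in, the :exclusions entry belongs on that library instead; `clj -Stree` prints the dependency tree and shows where it comes from.

```clojure
;; deps.edn (sketch; versions copied from the original report)
{:deps {org.clojure/clojure {:mvn/version "1.10.3"}
        ;; Assumption: aalto-xml is a transitive dependency of this library.
        techascent/tech.ml.dataset {:mvn/version "6.053"
                                    :exclusions [com.fasterxml/aalto-xml]}
        com.cognitect.aws/api {:mvn/version "0.8.539"}
        com.cognitect.aws/endpoints {:mvn/version "1.1.12.110"}
        com.cognitect.aws/s3 {:mvn/version "814.2.991.0"}}}
```

Excluding aalto-xml may break whatever feature of the other library relies on it, so this is a workaround rather than a fix.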
I did some more digging and it's (kinda expectedly) a problem in the data.xml library:
;; deps.edn
{:paths ["src"]
 :deps
 {org.clojure/clojure {:mvn/version "1.11.1"}
  org.clojure/data.xml {:mvn/version "0.2.0-alpha6"}}
 :aliases
 {:with-aalto
  {:extra-deps {com.fasterxml/aalto-xml {:mvn/version "1.3.2"}}}
  :repo
  {:exec-fn repo/pprint}}}
❯ clj -X:repo
{:tag :bar,
:attrs {:xmlns "https://sts.amazonaws.com/doc/2011-06-15/"},
:content ({:tag :foo, :attrs {}, :content ("hello, world")})}
❯ clj -A:with-aalto -X:repo
{:tag :xmlns.https%3A%2F%2Fsts.amazonaws.com%2Fdoc%2F2011-06-15%2F/bar,
:attrs {},
:content
({:tag
:xmlns.https%3A%2F%2Fsts.amazonaws.com%2Fdoc%2F2011-06-15%2F/foo,
:attrs {},
:content ("hello, world")})}
a slim repo can be found here
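The source of that repo is not reproduced in this thread. As a rough sketch, a `repo/pprint` function along these lines would fit the `:exec-fn` in the deps.edn above. The `sample` string and the `parse-sample` helper are assumptions, chosen to match the printed output; the `:namespace-aware false` option is a guess at what the reproduction passes, based on the `:xmlns` attribute appearing in the first result.

```clojure
;; src/repo.clj (hypothetical reconstruction, not the actual repo source)
(ns repo
  (:require [clojure.data.xml :as xml]
            [clojure.pprint :as pp]))

;; Assumed sample payload: a namespaced document shaped like an STS response.
(def sample
  (str "<bar xmlns=\"https://sts.amazonaws.com/doc/2011-06-15/\">"
       "<foo>hello, world</foo>"
       "</bar>"))

(defn parse-sample
  "Parses the sample while asking data.xml for non-namespace-aware parsing."
  []
  (xml/parse-str sample :namespace-aware false))

(defn pprint
  "Entry point for `clj -X:repo`; the exec argument map is ignored."
  [_]
  (pp/pprint (parse-sample)))
```

Running it with and without the `:with-aalto` alias should make the difference in `:tag` values visible on affected data.xml versions.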
Some context
- The XMLInputFactory docs explicitly state that only the true setting must be supported for IS_NAMESPACE_AWARE
- aalto, following the Javadoc, does not allow setting it to false
- this means that even if we set IS_NAMESPACE_AWARE to false, the aalto XMLInputFactory implementation still produces tags prefixed with namespaces whenever the data contains a namespace
- AWS sends XML payloads with namespaces in them, even though we don't need them
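To check which StAX implementation a given classpath selects, and how it reacts to the property described above, something like the following can be evaluated at a REPL. It is a diagnostic sketch using only the JDK's `javax.xml.stream` API; the namespace `xml-factory-check` and the function `factory-report` are made-up names, and the result depends entirely on the classpath, so no expected output is shown.

```clojure
(ns xml-factory-check
  (:import (javax.xml.stream XMLInputFactory)))

(defn factory-report
  "Describes the StAX input factory the JVM picks and how it handles
  a request to turn namespace awareness off."
  []
  (let [factory (XMLInputFactory/newInstance)
        ;; Implementations may reject an unsupported value by throwing.
        outcome (try
                  (.setProperty factory XMLInputFactory/IS_NAMESPACE_AWARE false)
                  :accepted
                  (catch IllegalArgumentException _ :rejected))]
    {:factory-class   (.getName (class factory))
     :set-false       outcome
     ;; Read the property back: a call that did not throw may still be ignored.
     :namespace-aware (.getProperty factory XMLInputFactory/IS_NAMESPACE_AWARE)}))

(println (factory-report))
```

If `:factory-class` names an aalto class, the parser in use is the one discussed in the points above.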
This doesn't leave great options for clojure.data.xml or aws-api, but we're looking into them.
Fixed in org.clojure/data.xml-0.2.0-alpha8 and aws-api-0.8.596
Fixed by upstream change in data.xml.