When curl is told to follow redirects, response body includes headers
dmbaturin opened this issue
If you use ~args:["-L"] to make curl follow redirects, the response body mistakenly includes the headers of the final reply.
# let http_get url =
match Curly.(run ~args:["-L"] (Request.make ~url:url ~meth:`GET ())) with
| Ok x -> x.Curly.Response.body
| Error e -> Format.printf "%a" Curly.Error.pp e; failwith "Failed to fetch the feed"
;;
val http_get : string -> string = <fun>
(* HTTP → HTTPS 301 redirect *)
# http_get "http://baturin.org/blog/atom-ocaml.xml" ;;
- : string =
"HTTP/2 200 \r\ndate: Sat, 29 Aug 2020 21:46:41 GMT\r\nserver: Apache/2.4.43 (Fedora) OpenSSL/1.1.1g\r\nlast-modified: Fri, 28 Aug 2020 19:23:29 GMT\r\netag: \"22aa-5adf4fcd7ab93\"\r\naccept-ranges: bytes\r\ncontent-length: 8874\r\ncontent-type: text/xml\r\n\r\n<?xml version='1.0' encoding='UTF-8'?>\n<feed xmlns=\"http:/"... (* string length 9115; truncated *)
(* No redirect *)
# http_get "https://baturin.org/blog/atom-ocaml.xml" ;;
- : string =
"<?xml version='1.0' encoding='UTF-8'?>\n<feed xmlns=\"http://www.w3.org/2005/Atom\" xml:lang=\"en\">\n <id>https://baturin.org/blog/atom.xml</id>\n <title>Daniil Baturin's blog</title>\n <updated>2020-08-28T19:23:29.169634+00:00</updated>\n <author>\n <name>Daniil Baturin</name>\n <email>daniil+webs"... (* string length 8874; truncated *)
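Until this is fixed in the library, a caller could strip the stray header block from the body by hand. The following is only a rough sketch of such a stopgap, not anything Curly provides: has_prefix, find_sub and strip_leading_headers are hypothetical helper names, and the heuristic assumes a real body never begins with "HTTP/".

```ocaml
(* Hypothetical helper: true if [s] begins with [prefix]. *)
let has_prefix ~prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

(* Hypothetical helper: index of the first occurrence of [sub] in [s]. *)
let find_sub s sub =
  let n = String.length s and m = String.length sub in
  let rec go i =
    if i + m > n then None
    else if String.sub s i m = sub then Some i
    else go (i + 1)
  in
  go 0

(* Drop every leading "HTTP/..." header block, each terminated by a blank
   line (CRLF CRLF). Assumption: a genuine body never starts with "HTTP/". *)
let rec strip_leading_headers body =
  if not (has_prefix ~prefix:"HTTP/" body) then body
  else
    match find_sub body "\r\n\r\n" with
    | None -> body (* no blank line found: leave the input untouched *)
    | Some i ->
      let start = i + 4 in
      strip_leading_headers
        (String.sub body start (String.length body - start))
```

In the session above this would be applied to x.Curly.Response.body before returning it. It treats the symptom only; the header parsing in the library still needs the real fix.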
Thanks for the report. Would you like to try fixing this?
Sure. I mostly wanted your confirmation that you consider this a bug to be fixed, and maybe some suggestions where to start looking. I'm happy to help.
This is indeed an issue. I've googled around and I don't see a decent way to have curl only output the final response.
It seems like you'll need to do some parsing hackery to get rid of the headers from the redirect response. I suggest that we:
- Use -D <tmp_file> to save the response headers to a separate file, so they never mix with the body. This will make our code less fragile.
- Add a ?follow_redirects argument that will be translated to -L.
- Change the header parser to only keep the last set of headers.
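To make the proposal concrete, here is a minimal sketch of the last two steps. It is not the code that landed in the library: curl_args, header_blocks, parse_header and last_headers are hypothetical names, and the sketch assumes the file written by -D contains one header block per response, with blocks separated by blank lines, when -L follows a redirect.

```ocaml
(* Hypothetical helper: turn a [?follow_redirects] flag into curl arguments.
   [header_file] is the temporary file passed to -D. *)
let curl_args ?(follow_redirects = false) ~header_file () =
  let base = ["-D"; header_file] in
  if follow_redirects then "-L" :: base else base

(* Group the lines of a header dump into blocks separated by blank lines.
   String.trim also removes the trailing '\r' of each CRLF-terminated line. *)
let header_blocks dump =
  let lines = String.split_on_char '\n' dump |> List.map String.trim in
  let flush cur acc = if cur = [] then acc else List.rev cur :: acc in
  let rec go cur acc = function
    | [] -> List.rev (flush cur acc)
    | "" :: rest -> go [] (flush cur acc) rest
    | l :: rest -> go (l :: cur) acc rest
  in
  go [] [] lines

(* Parse one "name: value" line; names are lowercased for easy lookup. *)
let parse_header line =
  match String.index_opt line ':' with
  | None -> None
  | Some i ->
    let name = String.lowercase_ascii (String.trim (String.sub line 0 i)) in
    let value =
      String.trim (String.sub line (i + 1) (String.length line - i - 1))
    in
    Some (name, value)

(* Keep only the headers of the final response: take the last block and
   skip its first line, which is the status line. *)
let last_headers dump =
  match List.rev (header_blocks dump) with
  | [] | [] :: _ -> []
  | (_status :: headers) :: _ -> List.filter_map parse_header headers
```

With a dump holding a 301 block followed by a 200 block, last_headers returns only the headers of the 200 response, so the redirect's location header never reaches the caller.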
What do you think?
I'm also wondering if following redirects is the better default. @c-cube wdyt?
Done in #9