rgrinberg / curly

Command line curl wrapper for OCaml

When curl is told to follow redirects, response body includes headers

dmbaturin opened this issue.

If you pass ~args:["-L"] to make curl follow redirects, the response body mistakenly includes the headers of the final reply.

# let http_get url =
  match Curly.(run ~args:["-L"] (Request.make ~url:url ~meth:`GET ())) with
  | Ok x -> x.Curly.Response.body
  | Error e -> Format.printf "%a" Curly.Error.pp e; failwith "Failed to fetch the feed"
;;
val http_get : string -> string = <fun>

(* HTTP → HTTPS 301 redirect *)
# http_get "http://baturin.org/blog/atom-ocaml.xml" ;;
- : string =
"HTTP/2 200 \r\ndate: Sat, 29 Aug 2020 21:46:41 GMT\r\nserver: Apache/2.4.43 (Fedora) OpenSSL/1.1.1g\r\nlast-modified: Fri, 28 Aug 2020 19:23:29 GMT\r\netag: \"22aa-5adf4fcd7ab93\"\r\naccept-ranges: bytes\r\ncontent-length: 8874\r\ncontent-type: text/xml\r\n\r\n<?xml version='1.0' encoding='UTF-8'?>\n<feed xmlns=\"http:/"... (* string length 9115; truncated *)

(* No redirect *)
# http_get "https://baturin.org/blog/atom-ocaml.xml" ;;
- : string =
"<?xml version='1.0' encoding='UTF-8'?>\n<feed xmlns=\"http://www.w3.org/2005/Atom\" xml:lang=\"en\">\n  <id>https://baturin.org/blog/atom.xml</id>\n  <title>Daniil Baturin's blog</title>\n  <updated>2020-08-28T19:23:29.169634+00:00</updated>\n  <author>\n    <name>Daniil Baturin</name>\n    <email>daniil+webs"... (* string length 8874; truncated *)
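One plausible explanation for the symptom: with -L, curl prints one header block per response in the redirect chain. A parser that treats everything up to the first blank line as headers would then consume only the 301 headers and hand the 200 headers to the caller as part of the body. The sketch below is a hypothetical, simplified model of such a parser (split_at_first_blank_line is an illustrative name, not curly's actual code) that reproduces the shape of the output above.

```ocaml
(* Hypothetical, simplified model of a parser for [curl -i] output:
   everything before the first blank line ("\r\n\r\n") is treated as the
   header block, and everything after it as the body. *)
let split_at_first_blank_line (raw : string) : string * string =
  let sep = "\r\n\r\n" in
  let n = String.length raw and k = String.length sep in
  (* Linear scan for the first occurrence of [sep]. *)
  let rec find i =
    if i + k > n then None
    else if String.sub raw i k = sep then Some i
    else find (i + 1)
  in
  match find 0 with
  | None -> (raw, "")
  | Some i -> (String.sub raw 0 i, String.sub raw (i + k) (n - i - k))

(* With -L, curl emits a header block for the redirect AND for the final
   response, followed by the final body. *)
let raw_with_redirect =
  "HTTP/1.1 301 Moved Permanently\r\nlocation: https://example.org/feed.xml\r\n\r\n"
  ^ "HTTP/2 200 \r\ncontent-type: text/xml\r\n\r\n"
  ^ "<?xml version='1.0'?><feed/>"

let () =
  let headers, body = split_at_first_blank_line raw_with_redirect in
  (* [headers] holds only the 301 block; [body] starts with "HTTP/2 200". *)
  print_endline ("headers: " ^ String.escaped headers);
  print_endline ("body:    " ^ String.escaped body)
```

The "body" here begins with the final response's status line and headers, which matches the first transcript; without a redirect there is only one header block and the split comes out right, which matches the second.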

Thanks for the report. Would you like to try fixing this?

Sure. I mostly wanted your confirmation that you consider this a bug worth fixing, and maybe some suggestions on where to start looking. I'm happy to help.

This is indeed an issue. I've googled around and I don't see a decent way to make curl output only the final response.

It seems like you'll need to do some parsing hackery to get rid of the headers from the redirect response. I suggest that we:

  • Use -D <tmp_file> to save the response headers to a separate file, so that stdout carries only the body. This will make our code less fragile.
  • Add a ?follow_redirects argument that will be translated to -L.
  • Change the header parser to only keep the last set of headers.
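A minimal sketch of the three proposals, under stated assumptions: curl_args, header_blocks, parse_header_line and final_response are hypothetical helpers, not curly's API, and the header dump is assumed to contain one blank-line-terminated block per response in the redirect chain. Running curl and creating the temporary file are left out so the example stays self-contained.

```ocaml
(* Hypothetical: build curl arguments so headers go to [header_file] via -D
   and only the body reaches stdout. [follow_redirects] mirrors the proposed
   ?follow_redirects option and simply adds -L. *)
let curl_args ?(follow_redirects = false) ~header_file url =
  let redirect = if follow_redirects then [ "-L" ] else [] in
  [ "-D"; header_file ] @ redirect @ [ url ]

(* Split a header dump into blocks (one per response), separated by
   empty lines; trailing '\r' is stripped from each line. *)
let header_blocks (dump : string) : string list list =
  let strip_cr l =
    let n = String.length l in
    if n > 0 && l.[n - 1] = '\r' then String.sub l 0 (n - 1) else l
  in
  let lines = List.map strip_cr (String.split_on_char '\n' dump) in
  let close cur acc = if cur = [] then acc else List.rev cur :: acc in
  let rec go cur acc = function
    | [] -> List.rev (close cur acc)
    | "" :: rest -> go [] (close cur acc) rest
    | l :: rest -> go (l :: cur) acc rest
  in
  go [] [] lines

(* "Name: value" -> Some ("name", "value"); lines without ':' are skipped. *)
let parse_header_line line =
  match String.index_opt line ':' with
  | None -> None
  | Some i ->
    let name = String.lowercase_ascii (String.trim (String.sub line 0 i)) in
    let value = String.trim (String.sub line (i + 1) (String.length line - i - 1)) in
    Some (name, value)

(* Keep only the LAST header block, i.e. the final response after redirects. *)
let final_response dump =
  match List.rev (header_blocks dump) with
  | (status :: lines) :: _ ->
    Some (String.trim status, List.filter_map parse_header_line lines)
  | _ -> None

(* Example dump for a 301 -> 200 chain. *)
let dump =
  "HTTP/1.1 301 Moved Permanently\r\nLocation: https://example.org/feed.xml\r\n\r\n"
  ^ "HTTP/2 200 \r\ncontent-type: text/xml\r\n\r\n"

let () =
  match final_response dump with
  | Some (status, headers) ->
    (* prints "HTTP/2 200" then "content-type = text/xml" *)
    print_endline status;
    List.iter (fun (k, v) -> Printf.printf "%s = %s\n" k v) headers
  | None -> print_endline "no headers"
```

Keeping the last block, rather than trying to recognise redirect status codes, means the parser does not need to know which intermediate responses curl followed.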

What do you think?

I'm also wondering if following redirects is the better default. @c-cube wdyt?

Done in #9