mirage / ocaml-uri

RFC3986 URI parsing library for OCaml

generic URI syntax from RFC 3986 section 3

hannesm opened this issue · comments

First of all thanks for developing and publishing this library! :)

I'm about to implement RFC 5545, which requires that a URI MUST follow the generic URI syntax defined in RFC 3986. Is this library a good fit?

From RFC 3986 section 3, the generic URI syntax seems to be: `URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]` -- i.e. not including URI references and relative references. Together with section 2 (allowed characters and percent-encoding), this means that not all possible strings are valid URIs.

Now, reading through this library, I could find of_string and make to construct a Uri.t -- the former accepts all strings and does a best-effort parse of the pieces -- the latter constructs a URI from the provided pieces. What I'm mainly looking for is a strict parser -- i.e. when I receive a URI from the network as a string, I'd like to have the string accepted or rejected (i.e. `` parse : string -> (Uri.t, [> `Msg of string ]) result ``) -- which should fail if the string contains e.g. whitespace or other non-allowed characters, does not specify a scheme, or is a relative path. Is there a way to achieve this with this library that I missed? Is such functionality desired by others? If so, are there any plans to extend this library with it?

I do think the uri library overall is in need of an overhaul using a modern parser combinator like Angstrom. Having said that, you should be able to check your parsing requirements by using the existing functions to convert to a Uri.t and then checking via the accessor functions that the various components are present (e.g. that there is a scheme, or whether it's a relative path).
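
For illustration, here is a minimal sketch of that approach. It assumes the uri library is installed and uses only `Uri.of_string` and `Uri.scheme`. The names `parse_strict`, `is_uri_char`, `is_hex` and `chars_ok` are hypothetical helpers, not part of the library. The character pre-check follows RFC 3986 section 2 but is deliberately coarse: it does not enforce per-component rules (for example, where `[` or `#` may appear), so it is not a full validator.

```ocaml
(* Sketch only: [parse_strict] and its helpers are hypothetical names,
   not part of the uri library. Requires the uri package. *)

(* Characters that RFC 3986 section 2 allows somewhere in a URI. *)
let is_uri_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' -> true
  | '-' | '.' | '_' | '~' -> true                       (* unreserved *)
  | ':' | '/' | '?' | '#' | '[' | ']' | '@' -> true     (* gen-delims *)
  | '!' | '$' | '&' | '\'' | '(' | ')'
  | '*' | '+' | ',' | ';' | '=' -> true                 (* sub-delims *)
  | '%' -> true                                         (* percent-encoding *)
  | _ -> false

let is_hex = function
  | '0' .. '9' | 'a' .. 'f' | 'A' .. 'F' -> true
  | _ -> false

(* True if every character is allowed and every '%' is followed by
   exactly two hex digits. *)
let chars_ok s =
  let n = String.length s in
  let rec go i =
    if i >= n then true
    else if s.[i] = '%' then
      i + 2 < n && is_hex s.[i + 1] && is_hex s.[i + 2] && go (i + 3)
    else is_uri_char s.[i] && go (i + 1)
  in
  go 0

(* Reject bad characters first, then let the library split the string
   and require a scheme, which rules out relative references. *)
let parse_strict (s : string) : (Uri.t, [> `Msg of string ]) result =
  if s = "" then Error (`Msg "empty string")
  else if not (chars_ok s) then
    Error (`Msg "disallowed character or malformed percent-encoding")
  else
    let u = Uri.of_string s in
    match Uri.scheme u with
    | None -> Error (`Msg "missing scheme (relative reference)")
    | Some _ -> Ok u
```

With this sketch, `parse_strict "http://example.com/a?b=c#d"` should return `Ok _`, while a string containing a space, a malformed escape such as `%zz`, or a scheme-less path such as `/relative/path` should return `Error (`Msg _)`. The pre-check runs on the raw string because `Uri.of_string` itself is best-effort and does not reject input.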