This parser was built with two main goals:
- Binary parsing
- Namespace-specific parsing

It uses the `parser_pt` parse transform (aka don't write these) for writing the actual parser.
- `src/parser_pt.erl`: Generates simple parsers from a specification embedded in the Erlang code.
- `src/exemell.erl`: The main XML parser. The parser is strictly whitespace preserving.
- `src/exemellp.erl`: XML output (no pretty printing) and the behaviour for XML-serializable things. Defines `Module:xml(exemellp:state(),Module()) -> iolist()`, and provides `xml(tag(),[{tag(),iolist()}],[child()],exemellp:state()) -> iolist()` as the preferred way of implementing the callback.
- `src/exemell_parser.erl`: Behaviour for parser callback modules (and default implementation).
- `src/exemell_namespace.erl`: Behaviour for namespace callbacks and implementation of the `none` namespace.
- `src/exemell_namespace_xml.erl`: Implementation of the `xml:` namespace.
- `src/exemell_block.erl` and `src/exemell_blob.erl`: Callbacks (and default implementations) for block creation.
- `include/xml.hrl`: Defines the record generated by the parser when reverting to pseudo-DOM mode.
- `include/exemell.hrl`: Exposes the internal state of the full parser, useful if you wish to set your own entity handling, meta handling or processing instruction (p.i.) handling (or even to rewire the namespace handling).
- `src/parser.hrl`: The actual XML parser used by `exemell.erl`. The parser is not for the faint of heart but should make it obvious how to use `parser_pt.erl`.

It's not the fastest parser out there, but it's reasonably fast. A very rough test places the parser at about:
- ~10% slower than erlsom
- twice as fast as xmerl

As with any performance rating, take these numbers with a lot of salt, and do your own measurements.
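
For doing your own measurements, a minimal timing sketch might look like the following. It relies only on `timer:tc/1` and `xmerl_scan:string/1` from OTP. The module name `parse_timing` and its functions are hypothetical and not part of this repository; any one-argument parse function can be passed in place of the xmerl baseline.

```erlang
-module(parse_timing).
-export([mean_micros/3, xmerl_micros/2]).

%% Call ParseFun(Input) Runs times and return the mean wall-clock time
%% of a single call in microseconds.
-spec mean_micros(fun((term()) -> term()), term(), pos_integer()) -> float().
mean_micros(ParseFun, Input, Runs)
  when is_function(ParseFun, 1), is_integer(Runs), Runs > 0 ->
    {Total, ok} = timer:tc(fun() -> repeat(ParseFun, Input, Runs) end),
    Total / Runs.

%% Apply ParseFun to Input N times, discarding the results.
repeat(_ParseFun, _Input, 0) ->
    ok;
repeat(ParseFun, Input, N) ->
    _ = ParseFun(Input),
    repeat(ParseFun, Input, N - 1).

%% Baseline using xmerl from OTP. xmerl_scan:string/1 expects a string
%% (a list of characters), not a binary.
-spec xmerl_micros(string(), pos_integer()) -> float().
xmerl_micros(XmlString, Runs) ->
    mean_micros(fun xmerl_scan:string/1, XmlString, Runs).
```

For example, `parse_timing:xmerl_micros("<a><b>text</b></a>", 1000)` returns the mean time per parse. Averaging over many runs and using documents representative of your own workload gives more meaningful numbers than a single call.
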
- No validation other than basic well-formedness is done (the parser will silently ignore missing close tags).
- The parser is hard-coded to UTF-8 encoding.
- Other than sanity checking, no testing has been done.
- Not all features are implemented yet (in particular, skip and blob are not implemented).
- There is an unavoidable warning about an undefined behaviour `exemell` (erlc is not favourable to self-referentiality).
- Proper test cases; the modules are type annotated and dialyzer seems happy enough.
- Figure out how to make dialyzer like a polymorphic version of the module.
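
Because the parser assumes UTF-8 input, documents in another encoding need to be transcoded before they are handed to it. The sketch below shows one way to do that with the OTP `unicode` module, assuming the input is Latin-1 (ISO-8859-1). The module and function names (`to_utf8`, `from_latin1`) are hypothetical and not part of exemell.

```erlang
-module(to_utf8).
-export([from_latin1/1]).

%% Convert a Latin-1 (ISO-8859-1) encoded binary to a UTF-8 binary.
%% Every byte is a valid Latin-1 character, so the conversion
%% always yields a binary for binary input.
-spec from_latin1(binary()) -> binary().
from_latin1(Latin1) when is_binary(Latin1) ->
    unicode:characters_to_binary(Latin1, latin1, utf8).
```

The resulting binary can then be passed to the parser. Other source encodings supported by `unicode:characters_to_binary/3` (for example `utf16`) can be handled the same way, although invalid input in those encodings yields an error tuple rather than a binary.
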

A simple parsing benchmark is included, though its results are likely meaningless for practical purposes. Enter the path(s) to your favourite example XML into `test/exemell_bennchmar.erl` and then run `make benchmark`.

To include `erlsom` in the benchmark, uncomment the dependency in `rebar.config` and run `make dependencies; make all`.