ultramiraculous / HTMLReader

A WHATWG-compliant HTML parser in Objective-C.

HTMLReader

A WHATWG-compliant HTML parser in Objective-C.

Why

I needed to scrape HTML like a browser. I couldn't find a good choice for iOS.

The Alternatives

libxml2 ships with iOS. It parses a variant of HTML 4 and does not recover from broken markup the way a browser does.

Other Objective-C libraries I came across (e.g. hpple) use libxml2 and inherit its shortcomings.

WebKit ships with iOS, but its HTML parsing abilities are considered private API. I consider a round-trip through UIWebView inappropriate for parsing HTML. And I didn't make it very far into building my own copy of WebCore.

Testing

HTMLReader uses html5lib's tests for tokenization and tree construction. It adds some of its own tests too.
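For readers who have not seen the html5lib tree-construction tests: each case is plain text with a `#data` section (the markup), an `#errors` section (one expected parse error per line), and a `#document` section (the expected tree, one `| `-prefixed line per node). The sketch below splits one such case into its parts. It is written in plain C, which also compiles as Objective-C, and it is not HTMLReader's actual test harness: `TreeTest`, `append_line`, and `parse_tree_test` are hypothetical names made up for illustration, and the fixed buffer sizes are an assumption that keeps the example short.

```c
#include <stdio.h>
#include <string.h>

/* One html5lib tree-construction test case, split into its sections.
   Hypothetical struct for illustration; not part of HTMLReader. */
typedef struct {
    char data[256];      /* markup to feed the parser */
    char document[512];  /* expected tree, one "| "-prefixed line per node */
    int error_count;     /* number of non-empty lines under #errors */
} TreeTest;

/* Append one line plus '\n' to buf. Lines that do not fit are dropped. */
static void append_line(char *buf, size_t cap, const char *line, size_t len) {
    size_t used = strlen(buf);
    if (used + len + 2 <= cap) {
        memcpy(buf + used, line, len);
        buf[used + len] = '\n';
        buf[used + len + 1] = '\0';
    }
}

/* Parse a single test case. Returns 0 on success, or -1 if the #data or
   #document section is missing. Simplification: blank lines inside
   #document are skipped, so multi-line text nodes are not preserved. */
int parse_tree_test(const char *text, TreeTest *out) {
    enum { NONE, DATA, ERRORS, DOCUMENT, OTHER } section = NONE;
    int saw_data = 0, saw_document = 0;
    size_t n;

    memset(out, 0, sizeof *out);
    while (*text) {
        const char *end = strchr(text, '\n');
        size_t len = end ? (size_t)(end - text) : strlen(text);

        if (len == 5 && strncmp(text, "#data", 5) == 0) {
            section = DATA;
            saw_data = 1;
        } else if (len == 7 && strncmp(text, "#errors", 7) == 0) {
            section = ERRORS;
        } else if (len == 9 && strncmp(text, "#document", 9) == 0) {
            section = DOCUMENT;
            saw_document = 1;
        } else if (len > 0 && text[0] == '#' && section != DATA) {
            section = OTHER;  /* e.g. #document-fragment; ignored here */
        } else if (section == DATA) {
            append_line(out->data, sizeof out->data, text, len);
        } else if (section == ERRORS && len > 0) {
            out->error_count++;
        } else if (section == DOCUMENT && len > 0) {
            append_line(out->document, sizeof out->document, text, len);
        }
        text = end ? end + 1 : text + len;
    }

    /* The newline that ends the #data section is not part of the markup. */
    n = strlen(out->data);
    if (n > 0 && out->data[n - 1] == '\n') out->data[n - 1] = '\0';

    return (saw_data && saw_document) ? 0 : -1;
}
```

A case such as `<p>One<p>Two` lists one error (the missing doctype) and a tree with two sibling `<p>` elements under `<body>`. A harness would feed `data` to the parser, serialize the resulting tree in the same `| ` format, and compare it against `document`.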

License

HTMLReader is in the public domain.

Languages

Objective-C 99.9%, Ruby 0.1%