A WHATWG-compliant HTML parser in Objective-C.
I needed to scrape HTML like a browser. I couldn't find a good choice for iOS.
libxml2 ships with iOS. It parses a variant of HTML 4 and does not recover from broken markup the way a browser does.
Other Objective-C libraries I came across (e.g. hpple) use libxml2 and inherit its shortcomings.
WebKit ships with iOS, but its HTML parsing abilities are considered private API. I consider a round-trip through UIWebView inappropriate for parsing HTML. And I didn't make it very far into building my own copy of WebCore.
HTMLReader is tested against html5lib's tokenization and tree construction tests, plus some tests of its own.
HTMLReader is in the public domain.