rdlopes / WebHere

HTML scraping for Objective-C.

WebHere

WebHere is an Objective-C framework for web scraping, packaged for iOS 8+ and OS X 10.10+.

Briefly put, web scraping means parsing a website and extracting data from the HTML pages it contains.

This work was inspired by RestKit, but it targets HTML data and works in a simpler form: there is no upfront mapping, and model classes declare their own building strategy. It relies mostly on two other projects.

Both projects deserve attention in their own right. Make sure to visit their pages and understand their APIs, as WebHere mostly provides a unified facade over them.

Features

  • Downloads HTML pages and extracts data into user-defined classes.
  • Allows the user to use XPath to query the HTML document.
  • Pre-defined methods to extract links and forms.
  • Tested.

Limitations

  • At the moment, only the GET and POST HTTP methods have been tested.
  • Please pay attention to the legal issues when performing web scraping.

Usage

To run the example project, clone the repo, and run pod install from the Example directory first.

The test cases provided should give you an overview of the API.

The Example folder contains an iOS app that queries Google and maps the returned HTML.

Requirements

Dependencies are managed automatically by CocoaPods. If you add WebHere to your source tree and use it outside CocoaPods, you must also add its dependencies to your project.

Installation

WebHere is available through CocoaPods. To install it, simply add the following line to your Podfile:

pod "WebHere"
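For context, a complete Podfile might look like the sketch below. The target name `MyApp` is a placeholder and not part of WebHere; the iOS 8.0 deployment target follows the minimum platform stated above. Recent CocoaPods versions expect pods to be declared inside a `target` block, so adjust this to match your own project.

```ruby
# Podfile (sketch): 'MyApp' is a hypothetical target name, replace it with yours.
platform :ios, '8.0'   # minimum iOS version WebHere is packaged for

target 'MyApp' do
  pod 'WebHere'        # pulls in WebHere and its dependencies
end
```

After saving the file, run pod install from the directory that contains it.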

Alternatively, you can add these sources as a git submodule.

Author

Rui Lopes, rui.d.lopes@me.com

License

WebHere is available under the MIT license. See the LICENSE file for more info.

Testing

The project is covered by unit tests.

Note that all tests run locally: no actual network access is needed, since all requests are stubbed by Nocilla.

Contributing

  1. Fork it
  2. Create your feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create new Pull Request

Languages

  • Objective-C 58.6%
  • HTML 40.8%
  • Ruby 0.5%
  • C 0.1%