buddhamagnet / scraped

SCRAPED.

UPDATES

As the exercise was completed in under 2 hours, improvements could of course be made:

  • The scraper is now returned from a factory function.
  • The document is now a field on the scraper struct.
  • The document is created from an io.Reader value for easier testing (see the sketches after this list).
  • This means the tests can now pull raw HTML from test fixtures for parsing. There is an open PR that pulls the fixture data from strings as well (using strings.NewReader).
  • Some concurrency has been added for element processing via a WaitGroup.
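
A minimal sketch of the shape those updates describe, assuming goquery; the identifiers (Scraper, NewScraper, Titles) and the selector are illustrative, not the repo's actual names:

    package scraper

    import (
        "io"
        "sync"

        "github.com/PuerkitoBio/goquery"
    )

    // Scraper holds the parsed document as a struct field so methods can
    // traverse it without re-fetching the page.
    type Scraper struct {
        doc *goquery.Document
    }

    // NewScraper is the factory: it accepts any io.Reader, so tests can feed
    // it fixture HTML just as easily as a real HTTP response body.
    func NewScraper(r io.Reader) (*Scraper, error) {
        doc, err := goquery.NewDocumentFromReader(r)
        if err != nil {
            return nil, err
        }
        return &Scraper{doc: doc}, nil
    }

    // Titles processes the matched elements concurrently, using a
    // sync.WaitGroup to wait for every goroutine before returning.
    func (s *Scraper) Titles(selector string) []string {
        sel := s.doc.Find(selector)
        out := make([]string, sel.Length())

        var wg sync.WaitGroup
        sel.Each(func(i int, el *goquery.Selection) {
            wg.Add(1)
            go func(i int, el *goquery.Selection) {
                defer wg.Done()
                out[i] = el.Text() // each goroutine writes only its own index
            }(i, el)
        })
        wg.Wait()
        return out
    }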
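
And a sketch of the kind of fixture-driven test the io.Reader change enables, using strings.NewReader as the open PR does; the fixture HTML and selector here are made up, and it builds on the sketch above:

    package scraper

    import (
        "strings"
        "testing"
    )

    // fixture stands in for raw HTML that could equally be read from a
    // testdata file; strings.NewReader turns it into the io.Reader the
    // factory expects.
    const fixture = `<html><body>
    <h1 class="product">Apple</h1>
    <h1 class="product">Banana</h1>
    </body></html>`

    func TestTitles(t *testing.T) {
        s, err := NewScraper(strings.NewReader(fixture))
        if err != nil {
            t.Fatalf("building scraper: %v", err)
        }
        if got := s.Titles("h1.product"); len(got) != 2 {
            t.Fatalf("expected 2 titles, got %d", len(got))
        }
    }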

NOTES

  • This scraper has a single dependency, goquery, which simplifies DOM traversal. Otherwise it's all standard library (apart from mux for the web part).
  • Dependency management is handled using the Go 1.5+ vendor flag and git submodules.
  • The current tests hit real HTML endpoints (hosted in gists), which is slower than mocking, but goquery's NewDocument takes a URI to operate on, so real endpoints were used for this pass. Extensions to this code would probably swap in some form of mocking (see the sketch after this list).
  • The exercise required a console application. This solution delivers both a console and a web version, as I had time!
  • Time taken to build this solution: 1 hour 45 minutes.
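
One possible shape for that mocking: serve fixture HTML from an in-process httptest server, so goquery.NewDocument still receives a URI but nothing leaves the test process. This is a sketch of the idea, not code from the repo:

    package scraper

    import (
        "net/http"
        "net/http/httptest"
        "testing"

        "github.com/PuerkitoBio/goquery"
    )

    func TestScrapeAgainstMockServer(t *testing.T) {
        // Spin up a local server that returns canned HTML.
        srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte(`<html><body><h1>Apple</h1></body></html>`))
        }))
        defer srv.Close()

        // NewDocument takes a URI, so point it at the mock server.
        doc, err := goquery.NewDocument(srv.URL)
        if err != nil {
            t.Fatalf("fetching mock page: %v", err)
        }
        if got := doc.Find("h1").Text(); got != "Apple" {
            t.Fatalf("expected Apple, got %q", got)
        }
    }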

BUILD IT

  1. Clone this repository into your GOPATH.
  2. Run source dev_env to set the Go 1.5+ vendor flag environment variable.
  3. Run git submodule init and git submodule update.

RUN IT - CONSOLE MODE

  1. In the console folder run go build.
  2. Run ./console and bathe in the JSON goodness, or redirect it to a file with ./console > scraped.json (a sketch of the console entry point follows this list).
  3. To hit another page with fruity goodness, run ./console -URI=https://gist.githubusercontent.com/buddhamagnet/c6997464d84b8bf379a1/raw/0b2e8ca65fed073197cd7be1e91a163738488f2e/fruity.html.
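
For reference, a console entry point of this kind might look roughly like the following; the selector, the flag default, and the JSON shape are placeholders rather than the repo's actual code (the real tool presumably bakes in a default gist URL):

    package main

    import (
        "encoding/json"
        "flag"
        "log"
        "net/http"
        "os"

        "github.com/PuerkitoBio/goquery"
    )

    func main() {
        // -URI selects the page to scrape, matching the flag shown above.
        uri := flag.String("URI", "", "page to scrape")
        flag.Parse()
        if *uri == "" {
            log.Fatal("provide a page with -URI")
        }

        resp, err := http.Get(*uri)
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        doc, err := goquery.NewDocumentFromReader(resp.Body)
        if err != nil {
            log.Fatal(err)
        }

        // Collect the text of the matched elements and emit JSON on stdout,
        // which is what makes the ./console > scraped.json redirect useful.
        var items []string
        doc.Find("h1").Each(func(_ int, s *goquery.Selection) {
            items = append(items, s.Text())
        })
        if err := json.NewEncoder(os.Stdout).Encode(items); err != nil {
            log.Fatal(err)
        }
    }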

RUN IT - WEB MODE

  1. In the web folder run go build.
  2. Run ./web and hit localhost:9494/scrape.
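
The web mode handler might be shaped roughly like this, assuming gorilla/mux is the "mux" mentioned in the notes; the target URL and selector are placeholders:

    package main

    import (
        "encoding/json"
        "log"
        "net/http"

        "github.com/PuerkitoBio/goquery"
        "github.com/gorilla/mux"
    )

    func main() {
        r := mux.NewRouter()
        // GET /scrape fetches the target page and returns the results as JSON.
        r.HandleFunc("/scrape", func(w http.ResponseWriter, req *http.Request) {
            doc, err := goquery.NewDocument("https://example.com/page.html") // placeholder target
            if err != nil {
                http.Error(w, err.Error(), http.StatusBadGateway)
                return
            }
            var items []string
            doc.Find("h1").Each(func(_ int, s *goquery.Selection) {
                items = append(items, s.Text())
            })
            w.Header().Set("Content-Type", "application/json")
            json.NewEncoder(w).Encode(items)
        })
        log.Fatal(http.ListenAndServe(":9494", r))
    }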

TEST IT

  1. Run go test ./... to run the full suite.
