hlopezmx / eventScrapper

Node.js scraper of events from a sample website

Project: Wegottickets event scraper
Author: Hugo Lopez-Tovar


The scraper has been implemented in Node.js, for the following reasons:
	-Node.js is fast, as the V8 engine compiles JavaScript to native machine code at run time (just-in-time compilation)
	-It handles concurrent connections and asynchronous calls well
	-The Node Package Manager registry includes more than 250k modules
	-JavaScript is a cool dynamic language!
	-This is the first time I have tried it, and it seemed like a great opportunity


INCLUDED LIBRARIES (from NPM: node package manager)
    -Express: Minimalistic framework that takes care of the HTTP calls and the routes table
    -X-Ray: for web scraping
    -Mocha, Chai and Supertest: For testing

	
IMPLEMENTED PATTERNS
    -MVC Architectural Pattern. Although the scraper performs a simple operation, this architecture gives the project a good structure, especially if it is extended with more complex functionality, so it also helps the project scale.
    -Module Pattern, splitting the code by logical use, which keeps the code base structured.
    -Singleton Pattern (for the scraper object), so the scraper is created only once, reducing resource usage.
    -Constructor Pattern (for the Event model), using the prototype to add methods only once to the model, reducing resource usage.
    -Scraper configuration file, to easily reconfigure the scraper if the source website changes its format or URL
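
As an illustration of the Constructor, Singleton and configuration points above, here is a minimal sketch. It is not the project's actual code: the names scraperConfig, summary and Scraper.getInstance, as well as the URL and CSS selectors, are assumptions made up for this example.

```javascript
// Hypothetical configuration: the URL and CSS selectors are placeholders,
// not the values used by the actual project.
var scraperConfig = {
  url: 'http://example.com/events',
  selectors: {
    artist: '.event-artist',
    venue: '.event-venue',
    dateTime: '.event-date',
    price: '.event-price'
  }
};

// Constructor Pattern: each instance stores only its own data.
function Event(artist, venue, dateTime, price) {
  this.artist = artist;
  this.venue = venue;
  this.date_time = dateTime;
  this.price = price;
}

// The method is defined once on the prototype and shared by every instance.
// (summary is a hypothetical helper, used here only for illustration.)
Event.prototype.summary = function () {
  return this.artist + ' @ ' + this.venue;
};

// Singleton Pattern: the closure keeps one private instance.
var Scraper = (function () {
  var instance = null;

  function create(config) {
    // A real scraper would set up the scraping library here.
    return { config: config };
  }

  return {
    getInstance: function () {
      if (instance === null) {
        instance = create(scraperConfig);
      }
      return instance;
    }
  };
})();
```

With this sketch, every call to Scraper.getInstance() returns the same object, and all Event instances share a single summary function through the prototype. If the source website changes, only scraperConfig would need to be edited.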
	
	 
IMPLEMENTED TESTS
	Only a few tests have been implemented given the short amount of time. Here is the test output:

	C:\eventScraper>mocha tests

	  API
		√ GET on /api/events should return an array of Event objects
		√ The events should include an artist

	  EventController
		√ Event Controller should return an array of Event objects

	  Scraper
		√ Scraper should return an array of Event objects from the source site
		√ The events should include an artist


	  5 passing (56ms)

		
HOW COULD THIS BE IMPROVED
	-The city is not being identified at the moment.
	-The date, time and price elements are being treated as text, but they could be stored as dates, times and numbers as appropriate, so they can easily be used for sorting and filtering.
	-Only the first page of results is being scraped. This can be extended to further pages.
	-Add a delay between requests, to make sure the scraper doesn't overload the source website.
	-Add a cache, to avoid reading the same events over and over.
	-Extend the event information by using third-party APIs, for example Google Places to validate the venue and obtain further information about it, or Wikipedia for the artist info.
	-Add more tests
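
The second improvement above (storing dates and prices as typed values) could be sketched as follows. This is only a sketch under assumptions: parsePrice and parseDateTime are hypothetical helpers that do not exist in the project, and they only handle the two text formats visible in the sample output, returning null for anything else.

```javascript
// Hypothetical helper: turns "£5.00 + £0.00 Booking fee = £5.00" into numbers.
// Returns null when the text is missing or has an unexpected shape.
function parsePrice(text) {
  if (typeof text !== 'string') {
    return null;
  }
  var amounts = (text.match(/£\d+(?:\.\d+)?/g) || []).map(function (m) {
    return parseFloat(m.slice(1)); // drop the leading pound sign
  });
  if (amounts.length !== 3) {
    return null;
  }
  return { face: amounts[0], fee: amounts[1], total: amounts[2] };
}

var MONTHS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
              'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];

// Hypothetical helper: turns "Wed 25th May, 2016, 8:00pm" into a Date
// in the local time zone. Other layouts (for example the "Doors: / Starts:"
// variant) are not handled and yield null.
function parseDateTime(text) {
  if (typeof text !== 'string') {
    return null;
  }
  var re = /(\d{1,2})(?:st|nd|rd|th)\s+([A-Za-z]+),\s*(\d{4}),\s*(\d{1,2}):(\d{2})\s*(am|pm)/i;
  var m = re.exec(text);
  if (m === null) {
    return null;
  }
  var month = MONTHS.indexOf(m[2].slice(0, 3).toLowerCase());
  if (month === -1) {
    return null;
  }
  // Convert the 12-hour clock to 24-hour: 12am -> 0, 12pm -> 12, 8pm -> 20.
  var hour = (parseInt(m[4], 10) % 12) + (m[6].toLowerCase() === 'pm' ? 12 : 0);
  return new Date(parseInt(m[3], 10), month, parseInt(m[1], 10),
                  hour, parseInt(m[5], 10));
}
```

With values in this form, events could be sorted by parseDateTime(event.date_time) or filtered by parsePrice(event.price).total. Returning null for unknown formats keeps the original text as the fallback rather than guessing.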

	
HOW TO INSTALL AND RUN THE PROJECT
	1. Install Node.js and npm if they are not already installed: https://nodejs.org/en/download/
	2. Unzip the eventScraper.zip file on your local machine
	3. Open a console or command prompt and navigate to the root folder where the source has been unzipped
	4. Use npm to automatically download and install the required libraries, running this command:
			npm install
	5. Execute gulp to start a local web server, running this command:
			gulp
	6. Using your web browser, go to http://localhost:8888
	
	
SAMPLE OUTPUT

	The following is an example of the output generated by the scraper:
	[
		{
			"artist": "\"THE QUIET AMERICAN\" WITH SPECIAL GUEST APPEARANCE BY \"DANIELLE ATE THE SANDWICH\"",
			"venue": "NEWCASTLE UPON TYNE: The Cumberland Arms, Byker",
			"date_time": "Wed 25th May, 2016, 8:00pm",
			"price": "£5.00 + £0.00 Booking fee = £5.00"
		},
		{
			"artist": "\"THE QUIET AMERICAN\" WITH SPECIAL GUEST APPEARANCE BY \"DANIELLE ATE THE SANDWICH\"",
			"venue": "NEWCASTLE UPON TYNE: The Cumberland Arms, Byker",
			"date_time": "Wed 25th May, 2016, 8:00pm",
			"price": "£5.00 + £0.00 Booking fee = £5.00"
		},
		{
			"artist": "99 CLUB LEICESTER SQUARE COMEDY - WED 25TH MAY",
			"venue": "London 99 Club @ Storm Nightclub, 28A Leicester St, London, WC2H 7LE",
			"date_time": "Wed 25th May, 2016 Doors: 7:30pm  Starts: 8:30pm  Ends: 10:30pm"
		},
		{
			"artist": "ALY BAIN & PHIL CUNNINGHAM",
			"venue": "FINDHORN: Universal Hall",
			"date_time": "Wed 25th May, 2016, 7:00pm",
			"price": "£16.00 + £1.60 Booking fee = £17.60"
		},
		{
			"artist": "ALY BAIN & PHIL CUNNINGHAM",
			"venue": "FINDHORN: Universal Hall",
			"date_time": "Wed 25th May, 2016, 7:00pm",
			"price": "£14.00 + £1.40 Booking fee = £15.40"
		},
		{
			"artist": "ALY BAIN & PHIL CUNNINGHAM",
			"venue": "FINDHORN: Universal Hall",
			"date_time": "Wed 25th May, 2016, 7:00pm",
			"price": "£12.00 + £1.20 Booking fee = £13.20"
		},
		{
			"artist": "BARLUATH",
			"venue": "BIRMINGHAM: Red Lion Folk Club",
			"date_time": "Wed 25th May, 2016, 7:15pm",
			"price": "£12.00 + £1.20 Booking fee = £13.20"
		},
		{
			"artist": "BEAK>",
			"venue": "BRIGHTON : Bleach",
			"date_time": "Wed 25th May, 2016, 7:30pm",
			"price": "£12.00 + £1.20 Booking fee = £13.20"
		},
		{
			"artist": "BERNARD & EDITH + BLACKBIRD BLACKBIRD",
			"venue": "LEEDS: Nation of Shopkeepers",
			"date_time": "Wed 25th May, 2016, 8:00pm",
			"price": "£7.00 + £0.70 Booking fee = £7.70"
		},
		{
			"artist": "BERNARD & EDITH + BLACKBIRD BLACKBIRD",
			"venue": "LEEDS: Nation of Shopkeepers",
			"date_time": "Wed 25th May, 2016, 8:00pm",
			"price": "£4.00 + £0.40 Booking fee = £4.40"
		},
		{
			"artist": "BIG WEDNESDAY COMEDY CLUB - SMITHFIELD, CITY OF LONDON",
			"venue": "LONDON: Charterhouse - Smithfields",
			"date_time": "Wed 25th May, 2016, 7:00pm",
			"price": "£7.00 + £0.70 Booking fee = £7.70"
		}
	]

License:MIT License


Languages

Language:JavaScript 100.0%