This repository contains an implementation of the Departure Times coding challenge. It was hosted on a DigitalOcean droplet and was reachable at https://uber-coding-challenge.cloud.itechcon.it/ and http://uber-coding-challenge.cloud.itechcon.it/.
The virtual machine is no longer available.
Create a service that gives real-time departure time for public transportation (use freely available public API). The app should geolocalize the user.
Here are some examples of freely available data:
511 (San Francisco)
Nextbus (San Francisco)
I chose the Departure Times project because it is an interesting challenge: after playing a bit with the Nextbus API, I saw that their dataset contains more than 100,000 stops.
I chose JavaScript (Node.js) as the language even though I have no real experience with it; I work mainly with PHP for web applications and C# for Windows desktop and client/server applications.
A GeoHashTable, based on the GeoHash algorithm, has been implemented to extract nearby stops. A nice feature of this hashing method is the ability to group all the points inside a specific area by using a common prefix.
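The prefix property can be illustrated with a minimal sketch of the standard geohash encoding. The function name encodeGeohash and the sample coordinates are assumptions made for illustration; this is not the code used by the repository's GeoHashTable.

```javascript
'use strict';

// Base32 alphabet used by the geohash encoding (it skips "a", "i", "l" and "o").
const BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz';

// Hypothetical helper: encode a latitude/longitude pair into a geohash
// string of `precision` characters.
function encodeGeohash(lat, lon, precision) {
  const latRange = [-90, 90];
  const lonRange = [-180, 180];
  let hash = '';
  let bits = 0;
  let bitCount = 0;
  let useLon = true; // bits alternate between longitude and latitude

  while (hash.length < precision) {
    const range = useLon ? lonRange : latRange;
    const value = useLon ? lon : lat;
    const mid = (range[0] + range[1]) / 2;
    if (value >= mid) {
      bits = (bits << 1) | 1; // upper half of the interval
      range[0] = mid;
    } else {
      bits = bits << 1; // lower half of the interval
      range[1] = mid;
    }
    useLon = !useLon;
    bitCount += 1;
    if (bitCount === 5) { // every 5 bits produce one base32 character
      hash += BASE32[bits];
      bits = 0;
      bitCount = 0;
    }
  }
  return hash;
}

// The same point encoded at two precisions: the coarser hash is a prefix
// of the finer one, so it identifies the larger cell containing the point.
const fine = encodeGeohash(57.64911, 10.40744, 8);
const coarse = encodeGeohash(57.64911, 10.40744, 5);
console.log(fine, coarse, fine.startsWith(coarse));
```

Because each extra character only refines the previous cell, all points whose hashes share a prefix lie inside the cell that the prefix names.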
For the stated requirements a discrete precision is good enough. GeoHashTable.getByDistance returns the points contained in a bounding box whose size is derived from the distance as follows:
width = distance + (cell width / 2)
height = distance + (cell height / 2)
The cell width and cell height depend on the precision of the geohash. The current configuration uses a precision of 8, which translates to a cell width of 38.2 m and a cell height of 19 m. This means that, given a distance of 100 m, the points contained in a bounding box of the following size are returned:
width = 119.1 m
height = 109.5 m
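The arithmetic above can be checked with a short sketch. The helper name boundingBoxSize is hypothetical; the cell dimensions are the ones quoted in the text for a precision of 8.

```javascript
'use strict';

// Approximate geohash cell size at precision 8, in metres (values from the text).
const CELL_WIDTH_M = 38.2;
const CELL_HEIGHT_M = 19;

// Hypothetical helper: size of the bounding box searched for a given distance,
// using width = distance + (cell width / 2) and height = distance + (cell height / 2).
function boundingBoxSize(distanceM, cellWidthM, cellHeightM) {
  return {
    width: distanceM + cellWidthM / 2,
    height: distanceM + cellHeightM / 2
  };
}

const box = boundingBoxSize(100, CELL_WIDTH_M, CELL_HEIGHT_M);
console.log(box.width.toFixed(1), box.height.toFixed(1)); // prints "119.1 109.5"
```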
The code hasn't been developed following a test-driven approach because I don't have specific knowledge of the Mocha framework.
On 2015/12/23, at about 20:30 GMT+1, Incapsula, the CDN used by Nextbus, began to block requests by sending back HTML and/or causing redirection loops. After digging deeper, I discovered that it was sending two cookies (visid_incap_{NUMBER1} and incap_sess_{NUMBER2}_{NUMBER1}) to try to detect and block automated requests. I worked around it by making a preliminary request to read the cookies, but a few hours later they disabled the check and, right after, I removed the code. If the predictions web service fails, this may be the cause.
- Node.js >= 5.3.0
- Install node.js
npm install
- Copy config.js.skel to config.js
- Change the configuration file as needed; the most relevant section is bindings
node generate-dataset.js
- Check if dev dependencies have been installed
- if not,
npm install --dev
npm test
- Check if dev dependencies have been installed
- if not,
npm install --dev
./node_modules/apidoc/bin/apidoc -o apidoc/ -i src/Controllers/api/
npm start
- node.js (very little experience)
- express (very little experience)
- request
- sax (I know the api but I never used this module)
- jshint (no experience)
- mocha (no experience)
- apidoc (no experience)
There would be a lot to say, but in brief:
- ES6 features, like classes, have been used
- A data storage backed by a local file has been used to simplify deployment on Heroku
- A SAX parser has been used to reduce resource consumption
- I avoided comments in method bodies on purpose: the code must be self-explanatory
- The application is written with the KISS principle in mind
- The version number is contained in the resource URI, which makes development and testing easier
- The parameters are passed as part of the resource uri or in the query string
- The GeoHashTable isn't a real hashtable implementation but relies on node.js (V8) array/hashtable implementation
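The last point in the list above can be pictured with a small sketch: a table that delegates all hashing to a built-in Map and only adds a prefix lookup on top. The class name SimpleGeoHashTable, its methods and the sample geohash strings are hypothetical; this is not the repository's GeoHashTable.

```javascript
'use strict';

// Hypothetical sketch: the "table" is just a built-in Map keyed by geohash,
// so the actual hashing is left to the JavaScript engine.
class SimpleGeoHashTable {
  constructor() {
    this.cells = new Map(); // geohash string -> array of items
  }

  // Store an item under its (precomputed) geohash.
  add(geohash, item) {
    if (!this.cells.has(geohash)) {
      this.cells.set(geohash, []);
    }
    this.cells.get(geohash).push(item);
  }

  // Return every item whose geohash starts with the given prefix.
  // This is a linear scan over the keys, which keeps the sketch simple.
  getByPrefix(prefix) {
    const result = [];
    for (const [geohash, items] of this.cells) {
      if (geohash.startsWith(prefix)) {
        result.push(...items);
      }
    }
    return result;
  }
}

// Illustrative geohash strings, not taken from the real dataset.
const table = new SimpleGeoHashTable();
table.add('9q8yyk8y', { stop: 'A' });
table.add('9q8yyk8z', { stop: 'B' });
table.add('9q9p1dhf', { stop: 'C' });
console.log(table.getByPrefix('9q8yyk8').map((s) => s.stop)); // [ 'A', 'B' ]
```

The linear scan is the obvious cost of this approach; an index of coarser prefixes would avoid visiting every key.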
Because of my basic knowledge there is a lot that can be improved:
- Write more tests! Almost all the code has been written with tests and mocking in mind, but few tests have been written
- Improve code documentation
- Improve error checking in Nextbus provider
- Implement logging
- Implement a (real) exception handler
- Handle non existent routes
- The data storage is provided by a local file, add support for MongoDB or Redis
- Add support for [Cluster](https://nodejs.org/api/cluster.html); because of the previous point it is not possible to use Cluster, since every fork would load the entire dataset in memory
- Implement promises
- Improve GenerateDatasetApplication, right now it is a monolithic piece of code
- Implement DI and IoC patterns, e.g. using Electrolyte
- Implement models to avoid direct access to the data storage
- Implement a class loader, I have implemented a custom loader for the Controllers but it is better to stick to a common one for all the classes
- Implement data caching
- Implement e-tag header support
- Switch to another SAX parser; the current parser triggers the opentag event only after reading the entire tag
- Implement the GeoHashTable as node.js native extension
- The controllers are tied to express, it is better to implement an abstraction layer to be able to change the framework used without rewriting all the controllers
- Switch to a better API documentation generator; apidoc is somewhat buggy because the web service tester prints out escaped JSON
- A list of geohash groups, identified by a common prefix with a lower precision, may be implemented to speed up the search
- Switch to Babel (or similar) to use visibility modifiers, arrow functions, async generators, default export, and so on
- Use Grunt to support a pre-compilation step
- Query predictions by pages
- Dockerize the application!
Improvements beyond the requirements:
- Add support for multiple providers
- Group the stops by lat/lon