Given two Wikipedia pages, WikiRacer will attempt to find the shortest path between the two using only links to other Wikipedia pages.
WikiRacer uses live data (via the MediaWiki API) and is extremely fast thanks to a bidirectional breadth-first search algorithm.
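The meet-in-the-middle idea behind that speed can be sketched on a toy in-memory graph. This is an illustration, not WikiRacer's actual code: a real run expands forward links from the start page and inbound links toward the target (e.g. via the API's linkshere data), whereas the toy graph below simply lists edges in both directions.

```go
package main

import "fmt"

// bidiBFS alternates one-hop expansions from both ends of a toy
// adjacency map and stops when the frontiers meet. Each side only
// explores to roughly half the solution depth, which is where the
// O(b^(d/2)) bound comes from.
func bidiBFS(graph map[string][]string, from, to string) bool {
	if from == to {
		return true
	}
	seenFrom := map[string]bool{from: true}
	seenTo := map[string]bool{to: true}
	frontFrom := []string{from}
	frontTo := []string{to}
	for len(frontFrom) > 0 && len(frontTo) > 0 {
		// Expand the smaller frontier first to keep both sides balanced.
		if len(frontFrom) > len(frontTo) {
			frontFrom, frontTo = frontTo, frontFrom
			seenFrom, seenTo = seenTo, seenFrom
		}
		var next []string
		for _, page := range frontFrom {
			for _, link := range graph[page] {
				if seenTo[link] {
					return true // frontiers met in the middle
				}
				if !seenFrom[link] {
					seenFrom[link] = true
					next = append(next, link)
				}
			}
		}
		frontFrom = next
	}
	return false
}

func main() {
	// Edges listed in both directions so the backward search works
	// on this toy graph.
	graph := map[string][]string{
		"Pleiades": {"Bible"},
		"Bible":    {"Pleiades", "Bill Joy"},
		"Bill Joy": {"Bible", "OpenBSD"},
		"OpenBSD":  {"Bill Joy"},
	}
	fmt.Println(bidiBFS(graph, "Pleiades", "OpenBSD"))
}
```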
$ time ./wikiracer Pleiades OpenBSD
Pleiades
Bible
Système universitaire de documentation
Bill Joy
OpenBSD
Elapsed time: 3.120570172s
./wikiracer Pleiades OpenBSD 0.49s user 0.04s system 7% cpu 7.132 total
$ time ./wikiracer "Jim Beam" "King George"
Jim Beam
Kentucky
Letters patent
George IV of the United Kingdom
King George
Elapsed time: 2.672853825s
./wikiracer "Jim Beam" "King George" 0.32s user 0.03s system 7% cpu 4.684 total
The WikiRacer HTTP service depends on gorilla/mux. Fetch and build with:
go get github.com/gorilla/mux && go get github.com/86me/wikiracer
cd $GOPATH/src/github.com/86me/wikiracer && go build && ./wikiracer
Recursively run the tests with:
go test ./...
usage: ./wikiracer [-debug] "from_title" "to_title"
-debug
Output logs to stderr
-help
Additional help information
-serve
Run HTTP server
Examples:
$ ./wikiracer "Ada Lovelace" "Robert Frost"
Ada Lovelace
Artificial intelligence
Dartmouth College
Robert Frost
Elapsed time: 1.517820434s
$ ./wikiracer -serve 0.0.0.0:4040
[WikiRacer] service running at 0.0.0.0:4040
- wikiracer adheres to the Wikimedia API etiquette guidelines as faithfully as possible. To that end, it runs at most two simultaneous API requests to Wikipedia.
- 1 hour - Researching possible strategies for building wikiracer, including tools, libraries, algorithms, and Wikipedia's MediaWiki API. For algorithms, the bidirectional breadth-first method immediately stood out to me as the way to go: with normal breadth-first/depth-first complexity being O(b^d), executing two searches that meet in the middle reduces each search to O(b^(d/2)), bringing total complexity to O(b^(d/2) + b^(d/2)) = O(b^(d/2)), which is far less than the vanilla BFS/DFS method.
- 1 hour - Familiarizing myself with the MediaWiki API. I started with patrickmn's go-wikimedia implementation, which was far too basic for anything beyond initial test queries to see how the API responded to basic requests. I studied a few different implementations of MediaWiki API clients in other languages, but decided on Go because I really enjoy writing Go code and want to get more familiar with Go internals, mutexes, and concurrency patterns. I also really love Go's testing functionality and find it extremely powerful.
- 2 hours - Started fleshing out the base functionality. Came across Tyson Mote's implementation, which uses the same bidirectional search algorithm, and used it as a frame of reference; I especially appreciated its RWMutex concurrency pattern. Implemented a batch request model to keep overall API requests to a minimum, staying within Wikipedia's published API limits. Added regular-expression matching on "boring" links to better filter out unwanted paths.
- 1 hour - Implemented a REST API using gorilla/mux. I could have just used the native Go net/http package, but gorilla offered some helpful abstractions (mux/router). The REST API is accessible by running wikiracer with the -serve switch. It binds to port 8686 by default; a different address and port can be passed as a parameter to the -serve switch, e.g. wikiracer -serve 0.0.0.0:4040.
- 2 hours - Testing, writing tests, writing README, testing, more testing.
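For reference, the MediaWiki link queries mentioned in the log above (action=query&prop=links&titles=...) return JSON in roughly the shape decoded below. This is a sketch against the classic format=json response, with the struct layout trimmed to the fields a racer needs; the surrounding batching and continuation handling are omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// linksResponse mirrors the parts of a prop=links query response
// that matter here: query.pages is a map keyed by page ID, and each
// page carries a list of outbound link titles.
type linksResponse struct {
	Query struct {
		Pages map[string]struct {
			Title string `json:"title"`
			Links []struct {
				Title string `json:"title"`
			} `json:"links"`
		} `json:"pages"`
	} `json:"query"`
}

// parseLinks extracts the linked-page titles from a raw API response body.
func parseLinks(body []byte) ([]string, error) {
	var r linksResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	var titles []string
	for _, page := range r.Query.Pages {
		for _, l := range page.Links {
			titles = append(titles, l.Title)
		}
	}
	return titles, nil
}

func main() {
	sample := []byte(`{"query":{"pages":{"25423":{"title":"Pleiades","links":[{"ns":0,"title":"Bible"},{"ns":0,"title":"Star cluster"}]}}}}`)
	titles, err := parseLinks(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(titles)
}
```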
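The boring-link filtering mentioned in the log can be sketched like this; the exact pattern below is a hypothetical stand-in for the real filter, covering the kinds of titles such a filter typically prunes.

```go
package main

import (
	"fmt"
	"regexp"
)

// boringLink is an illustrative version of the regular-expression
// filter described above: namespace pages (Help:, Wikipedia:, etc.)
// and identifier pages like ISBN or ISSN rarely lead anywhere useful,
// so they are pruned before being added to a search frontier.
var boringLink = regexp.MustCompile(
	`^(Help|Wikipedia|Template|Template talk|Talk|Portal|Category|File|Book|ISBN|ISSN|International Standard (Book|Serial) Number)\b`)

func isBoring(title string) bool {
	return boringLink.MatchString(title)
}

func main() {
	for _, t := range []string{"OpenBSD", "Help:Contents", "International Standard Book Number"} {
		fmt.Printf("%-35s boring=%v\n", t, isBoring(t))
	}
}
```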