NOTE: Do not clone this repo to your workspace. Fork it first, then clone your fork.
The goals of this assignment are as follows:
- Give you more exposure to RSpec and to more advanced and idiomatic Ruby programming.
- Introduce you to important technologies and libraries used in service-oriented architecture, including parsing XML using XPath expressions and calling RESTful APIs of a remote service.
- Get you accustomed to reading both RSpec code and Ruby app code.
Generally useful documentation pointers, in addition to those specific to this assignment scattered throughout the handout:
- Nokogiri docs: http://nokogiri.org/
- Docs for Ruby libraries `Net::HTTP`, `URI`, `CGI`: http://ruby-doc.org/
The Oracle of Bacon is a fanciful website maintained by Patrick Reynolds. You enter the names of two actors (if you leave either one blank, it defaults to Kevin Bacon) and it computes the number of links or degrees of separation ("Bacon number") between the two actors, using information from the Internet Movie Database (IMDb). A "link" is a movie that two actors have worked on together. For example, Sir Laurence Olivier has a Bacon number of 2:

    Laurence Olivier \_ Dracula (1979)
                     /
      Frank Langella \_ Frost/Nixon (2008)
                     /
         Kevin Bacon

You can read this as: "Laurence Olivier starred in Dracula with Frank Langella; Frank Langella starred in Frost/Nixon with Kevin Bacon."
The website also has a RESTful XML API that returns raw XML documents rather than complete Web pages. We will use this API in this assignment.
We've provided a code skeleton and a reasonable (but not exhaustive) set of specs for this assignment. The assignment has 4 parts; each part has its own specs in a `describe` group. Initially, all specs are marked `:pending => true` so you don't get a rash of failing tests when you start the assignment. When you start work on any one part of the assignment, remove the pending option from the `describe` block, and run that specific block using a command like `rspec spec/oracle_of_bacon_spec.rb:10`, which runs the set of tests encompassed in the block starting on that line number. Watch the group of tests fail, and then start writing the code to make them pass. (This is a crude version of the Test-Driven Development methodology we use in other parts of the course; in the full version of TDD, you write your own tests one at a time, watching each one fail and then creating the code to make it pass.)
You can experiment with the XML API at this test page, but you will need to provide an API key. The key provided for using the API in this course is in the picture below, to avoid having it indexed by search engines:
Assign this value to the `@api_key` attribute in `initialize` before starting the rest of the assignment.
Please respect the Oracle's maintainers by not using this key for any purpose other than doing this assignment.
A minimal RESTful query URI for OOB must include the API key (parameter `p`), the actor from which to start the search (parameter `a`), and optionally the actor to connect to (parameter `b`; defaults to Kevin Bacon if omitted). If there is more than one way to connect two actors together, the Oracle of Bacon returns one chosen randomly, so the responses you get may differ from those in this handout. Remember that special characters in URIs must be escaped and that one such special character is a space, which may be replaced by `+` in a URI. Thus valid queries might be (if you replace `my_key` with the valid API key above):
http://oracleofbacon.org/cgi-bin/xml?p=my_key&a=Kevin+Bacon&b=Laurence+Olivier
which connects Laurence Olivier with Kevin Bacon, or
http://oracleofbacon.org/cgi-bin/xml?p=my_key&a=Carrie+Fisher+(I)&b=Ian+McKellen
which connects Carrie Fisher (I) with Ian McKellen.
- Visually inspect the XML returned for each of the above queries. You can view it by typing the URIs into a browser, or better, by using a command-line tool such as `curl`. What kinds of XML elements are present in the response? What is their hierarchical structure?
If there are multiple matches for an actor name, you'll get a list of similar names so you can resubmit your query with an exact match. For example, try doing a query connecting Anthony Perkins to anyone.
- Visually inspect the XML returned. How are the element types different from those for a normal response?
Finally, if you submit a request whose URI does not include a valid API key, you'll get a third type of response, informing you that the access was unauthorized.
- Visually inspect the XML returned. How does it differ from the previous two responses?
In the rest of this assignment you'll create a Ruby wrapper library to make it easier to use the Oracle of Bacon. With our new library, we'd be able to run the above three examples as follows (again replacing `my_api_key` with the valid key given previously).

    oob = OracleOfBacon.new('my_api_key')

    # connect Laurence Olivier to Kevin Bacon
    oob.from = 'Laurence Olivier'
    oob.find_connections
    oob.response.type      # => :graph
    oob.response.data      # => ['Kevin Bacon', 'The Big Picture (1989)', 'Eddie Albert (I)', 'Carrie (1952)', 'Laurence Olivier']

    # connect Carrie Fisher (I) to Ian McKellen
    oob.from = 'Carrie Fisher (I)'
    oob.to = 'Ian McKellen'
    oob.find_connections
    oob.response.data      # => ['Ian McKellen', 'Doogal (2006)', ...etc]

    # with multiple matches
    oob.to = 'Anthony Perkins'
    oob.find_connections
    oob.response.type      # => :spellcheck
    oob.response.data      # => ['Anthony Perkins (I)', ...33 more variations of the name]

    # with bad key
    oob = OracleOfBacon.new('known_bad_key')
    oob.find_connections
    oob.response.type      # => :error
    oob.response.data      # => 'Unauthorized access'

Before even attempting to find connections, an `OracleOfBacon` object must have at least one of `from` or `to` specified (if one is missing, it's assumed to be Kevin Bacon), and a nonblank API key (although we won't be able to tell if it's valid until we try hitting the service).
We use the `ActiveModel::Validations` module that is part of Rails to streamline our validity checks. It is a module that mixes in validity-checking methods for object attributes and gives our object an instance method `valid?` that checks all the constraints and returns true or false.
We've included validity checks for the presence of non-blank From, To, and APIKey attributes in an `OracleOfBacon` instance.
- In the root directory of this homework (the one containing `lib` and `spec` subdirectories), run `rspec`. It will look for tests in `spec` and for your code in `lib/oracle_of_bacon.rb`.
- Delete `,:pending => true` from the `describe 'instance'` block, and run `rspec spec/oracle_of_bacon_spec.rb:10`. Visually inspect the failing specs to get a feel for how they are written. Don't worry if you don't understand everything that's going on; the goal is to get you accustomed to reading code and seeing common idioms.
- Define the constructor for `OracleOfBacon` so that both From and To initially default to Kevin Bacon, rather than being blank. This should cause a subset of the validity specs to pass green.
- Since it makes no sense to connect an actor to herself, we should also validate that the From and To fields aren't the same. When the "common case" validations like `validates_presence_of` aren't enough to do this, we use `validate`, which takes a symbol naming a method that can perform a desired validation (line 20 of `oracle_of_bacon.rb`). Fill in the method body for `from_does_not_equal_to` to check that the user is not trying to connect an actor to herself. The convention used in the `Validations` module is that if a validation check fails, the error information should be recorded in the `errors` object. Check the documentation for `ActiveModel::Validations` and `ActiveModel::Errors` at api.rubyonrails.org to learn how to record your own error message "From cannot be the same as To", which the spec (test) expects to be generated in this case.
When you've done the above steps correctly, all the specs in `describe 'instance'` should pass green.
We define a separate class, `OracleOfBacon::Response`, to hold a response from the service. This class exposes the `type` and `data` attributes to the caller, as the examples above showed. "Nesting" one class definition inside another is often done when the nested class (`Response`) is rarely used separately from the enclosing class (`OracleOfBacon`).
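As a minimal sketch of this nesting idiom, here is a toy example. The names `Greeter` and `Greeter::Result` are hypothetical and are not part of the assignment skeleton; the only point is to show a nested class that exposes read-only `type` and `data` attributes while keeping its helper method and raw input to itself.

```ruby
# Hypothetical example: Greeter and Greeter::Result are illustrative names only.
class Greeter
  # Result is nested because callers only ever obtain one from a Greeter.
  class Result
    attr_reader :type, :data   # read-only attributes exposed to the caller

    def initialize(raw)
      @raw = raw               # internal state, deliberately not exposed
      parse
    end

    private

    # Helper used only by the constructor, so it stays private.
    def parse
      if @raw.empty?
        @type = :error
        @data = 'empty input'
      else
        @type = :greeting
        @data = "Hello, #{@raw}!"
      end
    end
  end

  def greet(name)
    Result.new(name)
  end
end

result = Greeter.new.greet('Kevin')
puts result.type.inspect   # :greeting
puts result.data           # Hello, Kevin!
```

From outside the enclosing class, the nested class is referred to by its full name, here `Greeter::Result`.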
In our case, successful responses to OOB queries return XML markup, which we will parse using the Nokogiri library. As we saw above, there are three response types (graph, spellcheck, error), but we'll use the same techniques on all three.
Although Nokogiri is hugely powerful, there are just two Nokogiri calls you need to know to parse this simple example.
1. The constructor `Nokogiri::XML` takes a string (or, as is idiomatic Ruby, an open file descriptor or stream descriptor), parses its contents as XML, and returns a `Nokogiri::XML::Document` representing the parsed tree.
2. The instance method `#xpath` on a Nokogiri XML document or node returns a collection of all nodes in that subtree matching the given XPath selector. Just as CSS selectors identify particular elements in an HTML document, XPath is an amazingly powerful syntax for identifying collections of elements in an XML document. Some mastery of XPath is a valuable tool in any SaaS developer's toolbox, but we will restrict ourselves to two very simple XPath expressions:
   - `/foo`: Matches an element ... at the root of this subtree
   - `//foo`: Matches an element ... anywhere in this subtree
Hence, the XPath expression `/error` applied to an error response matches the outermost `<error>` element (which, remember, includes all of its child elements); `/link` applied to a successful response matches the enclosing `<link>` element; and `//actor` applied to a successful response returns a collection (quacks like an `Array`) of all the `<actor>` elements at or below the document's root.
(If you want to experiment interactively with XPath to learn more about it, the XPathTester site lets you paste a blob of XML and try various XPath expressions on it to see which elements are returned.)
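To see the two XPath forms in action without installing anything, here is a small sketch. It deliberately uses REXML, the XML library distributed with Ruby, instead of Nokogiri, so that it runs as-is; the XPath expressions are the same ones you would pass to Nokogiri's `#xpath`, for example `Nokogiri::XML(xml).xpath('//actor')`. The sample XML is made up, and the `<movie>` element name is an assumption for illustration.

```ruby
require 'rexml/document'

# Made-up sample; the <movie> element name is an assumption for illustration.
xml = <<~XML
  <link>
    <actor>Kevin Bacon</actor>
    <movie>Frost/Nixon (2008)</movie>
    <actor>Frank Langella</actor>
  </link>
XML

doc = REXML::Document.new(xml)

# /link matches only an element at the root of the document.
roots = REXML::XPath.match(doc, '/link')
puts roots.length                              # 1

# /error matches nothing here, because the root element is <link>.
puts REXML::XPath.match(doc, '/error').empty?  # true

# //actor matches every <actor> element anywhere in the tree.
actors = REXML::XPath.match(doc, '//actor').map(&:text)
p actors                                       # ["Kevin Bacon", "Frank Langella"]
```

Checking which root-level expression returns a non-empty result is one possible way to tell response types apart.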
Point #1 above (a constructor that makes a new object, here an XML document, from an existing object of a different type, here a string) is a very common Ruby idiom. We follow it by requiring the constructor for `OracleOfBacon::Response` to accept a blob of XML (returned by the OOB server) and turn it into an internal `Response` object. The conversion involves (a) determining what type of response it is (regular graph, spell check, error) and (b) parsing the XML data depending on the response type.
We've started you off with a constructor that creates the parsed XML document and with a `parse_response` method that handles the error case. You need to handle the other two. Read the specs under `describe 'parsing XML response'` and match them up with the requirements below:
- For a normal graph, the `data` attribute of the `Response` object should be an array that alternates actor names and movie names, as the code block example above showed, and the `type` value should be `:graph`.
- For a spell check, the `data` should be a simple array of all the possible spelling variants and `type` should be `:spellcheck`.
- For readability, we suggest you define `parse_graph_response` and `parse_spellcheck_response` methods and call them as needed from the constructor.
- You should also handle a response that doesn't match any of the three types, by giving it a response type of `:unknown` and a data field consisting of the string `unknown response type`.
When you complete the above four steps, all the specs in `describe 'parsing XML response'` should pass green.
Helpful hints for parsing XML and converting node text into arrays:
- The `text` method on a `Nokogiri::XML::Node` returns the actual text content of that node. That is, if `node` is `<actor>Carrie Fisher</actor>`, then `node.text == "Carrie Fisher"`.
- `zip` interleaves the elements of its receiver with those of its argument, using `nil` to pad if the first array is longer than the second; that is, `[:a,:b,:c].zip([1,2]) == [[:a,1],[:b,2],[:c,nil]]`.
- `flatten` takes an array that includes arbitrarily nested arrays and flattens them into a single array with no nested arrays; that is, `[[:a,1],[:b,2],[:c,nil]].flatten == [:a,1,:b,2,:c,nil]`.
- `compact` removes nil elements from a collection; that is, `[:a,1,:c,nil].compact == [:a,1,:c]`.
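The three array hints above can be chained. The sketch below uses made-up actor and movie names standing in for text pulled out of parsed nodes, and shows one possible way to build an alternating list when there is one more actor than movies.

```ruby
# Made-up names standing in for text extracted from parsed nodes.
actors = ['Actor 1', 'Actor 2', 'Actor 3']
movies = ['Movie 1', 'Movie 2']

# zip pairs each actor with a movie, padding the last pair with nil.
pairs = actors.zip(movies)
# [["Actor 1", "Movie 1"], ["Actor 2", "Movie 2"], ["Actor 3", nil]]

# flatten removes the nesting; compact drops the nil padding.
path = pairs.flatten.compact
p path
# ["Actor 1", "Movie 1", "Actor 2", "Movie 2", "Actor 3"]
```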
Questions for self-reflection:
- Our `Response` object manipulates an internal variable `@doc`. Why didn't we expose it with `attr_reader :doc`?
- What does the keyword `private` do (right after the constructor) and why did we use it here?
(Hint: Both questions concern matters of style and modularity, not correctness. That is, the code would work either way.)
We know what the URI has to look like, but the URI rules (warning: they are boring to read) stipulate that special characters in URIs such as `#` or spaces must be 'escaped'. Happily, Ruby's standard library provides `CGI.escape(string)` to help us do this.
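As a rough illustration of what `CGI.escape` does to an actor name, consider the sketch below. The helper name `build_query_uri` is hypothetical and is not the method you are asked to write; the base URI and the `p`, `a`, and `b` parameters are the ones described earlier in this handout.

```ruby
require 'cgi'

# Hypothetical helper for illustration; not part of the assignment skeleton.
def build_query_uri(api_key, from, to)
  base = 'http://oracleofbacon.org/cgi-bin/xml'
  "#{base}?p=#{CGI.escape(api_key)}&a=#{CGI.escape(from)}&b=#{CGI.escape(to)}"
end

# Spaces become '+', and the parentheses are percent-encoded.
puts CGI.escape('Carrie Fisher (I)')
# Carrie+Fisher+%28I%29

puts build_query_uri('my_key', 'Carrie Fisher (I)', 'Ian McKellen')
# http://oracleofbacon.org/cgi-bin/xml?p=my_key&a=Carrie+Fisher+%28I%29&b=Ian+McKellen
```

Note that `CGI.escape` percent-encodes the parentheses too, so the result looks slightly different from the hand-written example URI earlier in this handout, although both decode to the same actor name.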
- Fill in the method `make_uri_from_arguments` that uses the `@from`, `@to`, and `@api_key` attributes of an `OracleOfBacon` instance and assigns a properly-escaped URI to the `@uri` attribute.
The specs verify that the URI contains no illegal characters and that it contains all the components in the arguments.
At the end of this part, the specs under `describe 'constructing URI'` should pass green.
And now the moment we've all been waiting for: connecting to the actual Oracle Of Bacon service!!!
Except we're not going to do that. Given that you're re-running tests every time you change the code, it would be inconsiderate (and for some sites, in violation of the terms of use of the API key) to hit the site every time you do so. Instead, we will use FakeWeb, which we'll learn more about later in the course, to fake a "canned" response by intercepting calls to Ruby's `Net::HTTP` library.
(Of course, this is only true when running tests. Your real code will really talk to OOB.)
Since we've already tested that the various types of XML responses are handled properly by the constructor of `OracleOfBacon::Response`, all we are testing here is the service connection itself, in `find_connections`. If the request succeeds, it should hand off the received blob of XML to the constructor for `Response`. If there's a network problem (unfortunately, Ruby's HTTP library can raise many kinds of exceptions), we convert any of them into a generic `OracleOfBacon::NetworkError` and raise that instead.
- Modify `find_connections` to handle both the successful response case and the network exception case. This should cause the specs under `describe 'service connection'` to pass.
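The general pattern of funnelling many low-level exceptions into one error class can be sketched as follows. Everything here is a stand-in: `FetchError` plays the role of a library-specific error class, `fetch_wrapping_errors` is a hypothetical helper, and the block replaces the real HTTP call so that the example runs without touching the network. The list of rescued exceptions is a plausible selection, not an exhaustive one.

```ruby
require 'net/http'
require 'timeout'

# Hypothetical stand-in for a library-specific error class.
class FetchError < StandardError; end

# Runs the given block (a stand-in for the real HTTP call) and converts
# common network-level failures into a single FetchError.
def fetch_wrapping_errors
  yield
rescue Timeout::Error, SocketError, SystemCallError, EOFError,
       Net::HTTPBadResponse, Net::HTTPHeaderSyntaxError,
       Net::ProtocolError => e
  # Convert any low-level failure into one error type the caller can rescue.
  raise FetchError, e.message
end

# Success: the block's value passes through untouched.
puts fetch_wrapping_errors { '<link/>' }

# Failure: a simulated timeout surfaces as FetchError.
begin
  fetch_wrapping_errors { raise Timeout::Error, 'simulated timeout' }
rescue FetchError => e
  puts "#{e.class}: #{e.message}"   # FetchError: simulated timeout
end
```

Callers then need only one `rescue` clause, no matter which underlying exception occurred.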
Question for reflection:
- As we'll learn later in the course, the .xml files under `spec` that contain the "canned" responses are sometimes called fixtures. How would you create these yourself when you're developing tests for your own RESTful service library? (Hint: the command-line utility `curl` may be useful.)
Believe it or not, you're done. You can try firing up an `irb` interpreter and exercising the library by saying `load './lib/oracle_of_bacon.rb'` and trying the examples in the code block in the Background section of this handout!
For fun, you can add a `draw_graph` method that does the following: if the result type is `:graph`, it draws something like this using ASCII characters:

    Actor 1 \_ Movie 1
            /
    Actor 2 \_ Movie 2
            /
    Actor 3

If the result type is anything else (including `nil` when there is no result yet), it prints an error.
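One possible layout routine is sketched below. The function name `format_graph` is hypothetical, and it is written as a standalone function that returns a string (rather than a method that prints) so it is easy to try in `irb`. It assumes the data array alternates actor and movie names, beginning and ending with an actor.

```ruby
# Hypothetical standalone helper; the handout leaves the exact output format
# of draw_graph open, so this layout is only one possibility.
def format_graph(type, data)
  return 'Error: no graph to draw' unless type == :graph

  pairs  = data.each_slice(2).to_a            # [[actor, movie], ..., [actor]]
  actors = pairs.map(&:first)
  width  = actors.map(&:length).max           # right-align names to this width

  lines = []
  pairs.each do |actor, movie|
    name = actor.rjust(width)
    if movie
      lines << "#{name} \\_ #{movie}"         # actor linked to a movie
      lines << "#{' ' * width} /"             # connector down to next actor
    else
      lines << name                           # final actor has no movie
    end
  end
  lines.join("\n")
end

puts format_graph(:graph, ['Actor 1', 'Movie 1', 'Actor 2', 'Movie 2', 'Actor 3'])
puts format_graph(nil, nil)
```

Right-aligning the names keeps the connectors in one column even when actor names differ in length.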
## Extra challenges
- Try getting set up with an automated test framework such as Guard or autotest. Either one can be configured to run all the tests in `spec/` and to re-run them automatically every time you edit and save your code file, so you don't have to run them manually. As we'll see later, this fits the "watch the test fail" part of the TDD or test-driven process of development: write the tests before you write the code, watch the test fail, fill in the code and save the code file, then watch the test pass!
- Try pairing using the one-undermanship pair programming style.