DataStax Enterprise Ruby Driver
NOTE: The DataStax Enterprise Ruby Driver can be used solely with DataStax Enterprise. Please consult the license.
This is the documentation for the DataStax Enterprise Ruby Driver for DSE. The driver is built on top of the DataStax Ruby driver for Apache Cassandra and enhanced for the adaptive data management and mixed workload capabilities provided by DSE. Because many of the underlying concepts are the same, this documentation stays focused by linking to the relevant sections of the DataStax Ruby driver for Apache Cassandra documentation where necessary.
Within a script or irb, you can determine the exact versions of the dse and core drivers by accessing the VERSION constant of the appropriate module:
require 'dse'
puts "Dse Driver Version: #{Dse::VERSION}"
puts "Cassandra Driver Version: #{Cassandra::VERSION}"
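If a script depends on a minimum driver version, the VERSION strings are better compared with RubyGems' Gem::Version than as plain strings. The following is a minimal sketch: the helper name version_at_least? and the version numbers are hypothetical, and in real use the first argument would be Dse::VERSION or Cassandra::VERSION.

```ruby
require 'rubygems' # provides Gem::Version (already loaded by default in modern Ruby)

# Hypothetical helper: true when `actual` is at least `minimum`.
def version_at_least?(actual, minimum)
  Gem::Version.new(actual) >= Gem::Version.new(minimum)
end

# Plain string comparison would get the first case wrong, because
# '1.10.0' sorts before '1.9.0' as a string.
puts version_at_least?('1.10.0', '1.9.0')
puts version_at_least?('1.0.0.rc.1', '1.0.0')
```

Gem::Version also orders pre-release versions (such as release candidates) before the corresponding final release.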
This driver exposes the following features of DSE 5.0:
- Graph
- Authentication with nodes running DSE
- Geospatial types
Installation
The driver is named dse-driver on rubygems.org and can easily be installed with Bundler or the gem program. It will download the appropriate Cassandra driver as well.
Upgrade
The driver is intended to have the same look and feel as the core driver to make upgrading from the core driver trivial. The only change is to replace references to the Cassandra module with Dse when creating the cluster object:
require 'dse'
# This returns a Dse::Cluster instance
cluster = Dse.cluster
# This returns a Dse::Session instance
session = cluster.connect
rs = session.execute('select * from system.local')
Compatibility
Although this driver exposes new features introduced in DSE 5.0, it is fully compatible and supported for use with previous versions of DSE.
Graph
The DSE Graph service processes graph queries written in the Gremlin language.
Session#execute_graph and Session#execute_graph_async are responsible for transmitting graph queries to DSE Graph.
The response is a graph result set, which may contain domain object representations of graph objects.
Any script using the DSE driver to execute graph queries will begin like this:
require 'dse'
# Connect to DSE and create a session whose graph queries will be tied to the graph
# named 'mygraph' by default. See the documentation for Dse::Graph::Options for all
# supported graph options.
cluster = Dse.cluster(graph_name: 'mygraph')
session = cluster.connect
The DSE driver is a wrapper around the core Cassandra driver, so any valid options to the core driver are valid in the DSE driver as well.
To execute system query statements (for example, to create a graph), do not specify a graph name to bind to when connecting. DSE Graph treats a system query that is bound to a graph as illegal.
Vertices
Vertices in DSE Graph have properties. A property may have multiple values, which are represented as an array when manipulating a Vertex object. A property value may also have properties of its own (known as meta-properties). These meta-properties are simple key-value pairs of strings; they do not nest.
# Run a query to get all the vertices in our graph.
results = session.execute_graph('g.V()')
# Each result is a Dse::Graph::Vertex.
# Print out the label and a few of its properties.
puts "Number of vertex results: #{results.size}"
results.each do |v|
  # Start with the label
  puts "#{v.label}:"
  # Vertex properties support multiple values as well as meta-properties
  # (simple key-value attributes that apply to a given property's value).
  #
  # Emit the 'name' property's first value.
  puts " name: #{v.properties['name'][0].value}"
  # Name again, using our abbreviated syntax
  puts " name: #{v['name'][0].value}"
  # Print all the values of the 'name' property
  values = v['name'].map do |vertex_prop|
    vertex_prop.value
  end
  puts " all names: #{values.join(',')}"
  # That's a little inconvenient. So use the 'values' shortcut:
  puts " all names: #{v['name'].values.join(',')}"
  # Let's get the 'title' meta-property of 'name's first value.
  puts " title: #{v['name'][0].properties['title']}"
  # This has a short-cut syntax as well:
  puts " title: #{v['name'][0]['title']}"
end
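The shape of a vertex's properties can be pictured with plain Ruby structures. The following is only an illustrative model, not the driver's classes: FakeVertexProperty, property_values, and the sample data are hypothetical stand-ins for what a Dse::Graph::Vertex exposes.

```ruby
# Hypothetical stand-in for one vertex property: a single value plus its
# meta-properties (a flat hash of simple key-value pairs).
class FakeVertexProperty
  attr_reader :value, :properties

  def initialize(value, properties = {})
    @value = value
    @properties = properties
  end

  # Mirrors the short-cut syntax: prop['title'] reads a meta-property.
  def [](key)
    properties[key]
  end
end

# Each property name maps to an ARRAY of property objects, because a
# single property may carry several values.
SAMPLE_PROPERTIES = {
  'name' => [
    FakeVertexProperty.new('James', 'title' => 'Dr.'),
    FakeVertexProperty.new('Jim')
  ]
}.freeze

# Collect every value of one property, similar in spirit to the
# 'values' shortcut shown above.
def property_values(properties, name)
  properties.fetch(name, []).map(&:value)
end

puts "all names: #{property_values(SAMPLE_PROPERTIES, 'name').join(',')}"
puts "title: #{SAMPLE_PROPERTIES['name'][0]['title']}"
```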
Edges
Edges connect a pair of vertices in DSE Graph. They also have properties, but they are simple key-value pairs of strings.
results = session.execute_graph('g.E()')
puts "Number of edge results: #{results.size}"
# Each result is a Dse::Graph::Edge object.
results.each do |e|
  # Start with the label
  puts "#{e.label}:"
  # Now the id's of the two vertices that this edge connects.
  puts " in id: #{e.in_v}"
  puts " out id: #{e.out_v}"
  # Edge properties are simple key-value pairs; sort of like
  # meta-properties on vertices.
  puts " edge_prop1: #{e.properties['edge_prop1']}"
  # This supports the short-cut syntax as well:
  puts " edge_prop1: #{e['edge_prop1']}"
end
Path and Arbitrary Objects
Paths describe a path between two vertices. The graph response from DSE does not indicate that the response is a path, so the driver cannot automatically coerce such results into Path objects. Instead, the driver returns a Dse::Graph::Result object, which you can coerce into a Path yourself.
results = session.execute_graph('g.V().in().path()')
puts "Number of path results: #{results.size}"
results.each do |r|
  # The 'value' of the result is a hash representation of the JSON result.
  puts "first label: #{r.value['labels'].first}"
  # Since we know this is a Path result, coerce it and use the Path object's methods.
  p = r.as_path
  puts "first label: #{p.labels.first}"
end
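Because the value of such a result is just decoded JSON, it can help to see what a hash of that kind looks like. The payload below is a hypothetical, heavily simplified example (real responses from DSE Graph carry more fields); only the 'labels' key comes from the example above, and the 'objects' key and the helper name decode_path are assumptions.

```ruby
require 'json'

# Hypothetical, trimmed-down JSON for a single path result.
SAMPLE_PATH_JSON = <<~JSON
  {
    "labels": [["a"], ["b"]],
    "objects": [{"id": 1, "label": "person"}, {"id": 2, "label": "software"}]
  }
JSON

# Decode the JSON text into a plain Ruby hash with string keys.
def decode_path(json)
  JSON.parse(json)
end

path = decode_path(SAMPLE_PATH_JSON)
puts "first label: #{path['labels'].first.inspect}"
puts "objects in path: #{path['objects'].size}"
```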
When a query has a simple result, the :value attribute of the result object contains the simple value rather than a hash.
results = session.execute_graph('g.V().count()')
puts "Number of vertices: #{results.first.value}"
Duration Graph Type
DSE Graph supports several datatypes for properties. The Duration type represents a duration of time. When DSE Graph returns properties of this type, the string representation is non-trivial and must be parsed before it is useful.
The driver includes a helper class to parse such responses from DSE Graph as well as to send such values in bound parameters in requests:
# Create a Duration property in the schema called 'runtime' and declare that 'process' vertices can have this property.
session.execute_graph(
"schema.propertyKey('runtime').Duration().ifNotExists().create();
schema.propertyKey('name').Text().ifNotExists().create();
schema.vertexLabel('process').properties('name', 'runtime').ifNotExists().create()")
# We want to record that a process ran for 1 hour, 2 minutes, 3.5 seconds.
runtime = Dse::Graph::Duration.new(0, 1, 2, 3.5)
session.execute_graph(
"graph.addVertex(label, 'process', 'name', 'calculator', 'runtime', my_runtime);",
arguments: {'my_runtime' => runtime})
# Now retrieve the vertex. Assume this is the only vertex in the graph for simplicity.
v = session.execute_graph('g.V()').first
runtime = Dse::Graph::Duration.parse(v['runtime'].first.value)
puts "#{runtime.hours} hours, #{runtime.minutes} minutes, #{runtime.seconds} seconds"
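To illustrate why a helper is useful, here is a minimal sketch of parsing such a string by hand. It assumes the string follows the ISO 8601 style (for example PT1H2M3.5S), which is an assumption about the format rather than something stated above. The names DURATION_PATTERN and parse_duration are hypothetical, and this is not the implementation of Dse::Graph::Duration.parse.

```ruby
# Matches strings such as "P1DT2H", "PT1H2M3.5S", or "PT45S".
DURATION_PATTERN =
  /\AP(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?\z/

# Hypothetical parser: returns the components as a hash, treating any
# component that is absent from the string as zero.
def parse_duration(text)
  match = DURATION_PATTERN.match(text)
  raise ArgumentError, "unsupported duration: #{text.inspect}" unless match

  {
    days: match[1].to_i,
    hours: match[2].to_i,
    minutes: match[3].to_i,
    seconds: match[4].to_f
  }
end

d = parse_duration('PT1H2M3.5S')
puts "#{d[:hours]} hours, #{d[:minutes]} minutes, #{d[:seconds]} seconds"
```

In application code, prefer the driver's own Dse::Graph::Duration.parse, which is maintained alongside the server's actual output format.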
Miscellaneous Features
There are a number of other features in the API to make development easier.
# We can access particular items in the result-set via array dereference
p results[1]
# Run a query against a different graph, but don't mess with the cluster default.
results = session.execute_graph('g.V().count()', graph_name: 'my_other__graph')
# Create a Graph Options object that we can save off and use. The graph_options arg to execute_graph
# supports an Options object.
options = Dse::Graph::Options.new
options.graph_name = 'mygraph'
results = session.execute_graph('g.V().count()', graph_options: options)
# Set an "expert" option for which we don't have accessor methods.
# NOTE: Such options are not part of the public api and may change in a future release of DSE.
options.set('super-cool-option', true)
# Change the graph options on the cluster to alter subsequent query behavior.
# Switch to the analytics source in this case.
cluster.graph_options.graph_source = 'a'
results = session.execute_graph('g.V().count()')
# Create a statement object encapsulating a graph query, options, parameters,
# for ease of reuse.
statement = Dse::Graph::Statement.new('g.V().limit(n)', {n: 3}, graph_name: 'mygraph')
results = session.execute_graph(statement)
Authentication
DSE 5.0 introduces DSE Unified Authentication, which supports multiple authentication schemes concurrently. Thus, different clients may authenticate with any authentication provider that is supported under the "unified authentication" umbrella: internal authentication, LDAP, and Kerberos.
NOTE: the authentication providers described below are backward-compatible with legacy authentication mechanisms provided by older DSE releases. So, feel free to use these providers regardless of your DSE environment.
Internal and LDAP Authentication
Just as Cassandra::Auth::Providers::Password handles internal and LDAP authentication with Cassandra, the Dse::Auth::Providers::Password provider handles these types of authentication in DSE 5.0 configured with DseAuthenticator. The Ruby DSE driver makes it very easy to authenticate with username and password:
cluster = Dse.cluster(username: 'user', password: 'pass')
The driver creates the provider under the hood and configures the cluster object appropriately.
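In practice, credentials are often read from the environment rather than hard-coded. A minimal sketch of that idea follows; the helper name auth_options and the environment variable names DSE_USERNAME and DSE_PASSWORD are hypothetical, not something the driver defines.

```ruby
# Hypothetical helper: build the authentication options from environment
# variables so that credentials stay out of the source code.
# `fetch` raises KeyError when a variable is missing, which surfaces
# misconfiguration early instead of sending empty credentials.
def auth_options(env = ENV)
  {
    username: env.fetch('DSE_USERNAME'),
    password: env.fetch('DSE_PASSWORD')
  }
end

# With the driver loaded, the hash could be passed along like this:
#   cluster = Dse.cluster(auth_options)
```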
Kerberos Authentication
Initial Setup
Unlike other authentication mechanisms, Kerberos requires some set-up on the client. First, set the KRB5_CONFIG environment variable to the location of your krb5.conf file and use kinit to obtain a ticket from your Kerberos server.
The Ruby DSE driver also needs this environment variable when run in an MRI Ruby interpreter, because Kerberos support is implemented as a C extension that uses the gssapi system libraries, the same libraries that command-line tools like kinit use.
The JRuby implementation of Kerberos support uses the Java security framework, which requires the java.security.krb5.conf system property to be set to the location of the krb5.conf file. One way to accomplish this is to set the JRUBY_OPTS environment variable before running your client application:
export JRUBY_OPTS="-J-Djava.security.krb5.conf=/home/user1/krb5.conf"
Configuring the Client
To enable Kerberos authentication with DSE nodes, set the auth_provider of the cluster to a Dse::Auth::Providers::GssApi instance. The following example code shows all the ways to set this up.
require 'dse'
# Create a provider for the 'dse' service and have it use the first ticket in the default ticket cache for
# authentication with nodes, which have hostname entries in the Kerberos server. All of the
# assignments below are equivalent:
provider = Dse::Auth::Providers::GssApi.new
provider = Dse::Auth::Providers::GssApi.new('dse')
provider = Dse::Auth::Providers::GssApi.new('dse', true)
provider = Dse::Auth::Providers::GssApi.new('dse', true, nil)
# Same as above, but this time turn off hostname resolution because the Kerberos server
# may be configured with ip's, not hostnames, of DSE nodes.
provider = Dse::Auth::Providers::GssApi.new('dse', false)
# Use a custom hostname resolver.
class MyResolver
  def resolve(ip)
    "host-#{ip}"
  end
end
provider = Dse::Auth::Providers::GssApi.new('dse', MyResolver.new)
# Specify different principal to use for authentication. This principal must already have a valid
# ticket in the Kerberos ticket cache. Also, the principal name is case-sensitive, so make sure it
# *exactly* matches your Kerberos ticket.
provider = Dse::Auth::Providers::GssApi.new('dse', true, 'cassandra@DATASTAX.COM')
# However you configure the provider, pass it to Dse.cluster to have it be used for authentication.
cluster = Dse.cluster(auth_provider: provider)
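A custom resolver only needs to respond to resolve(ip), as MyResolver does above. As a slightly more realistic sketch, the hypothetical StaticResolver below looks up hostnames in a fixed table and falls back to the IP itself. Whether the driver passes the IP as a String or as an IPAddr is not stated above, so the sketch normalizes it with to_s; the hostnames are made up.

```ruby
# Hypothetical resolver backed by a fixed lookup table.
class StaticResolver
  def initialize(table)
    @table = table
  end

  # Return the mapped hostname, or the ip (as a string) when no entry exists.
  def resolve(ip)
    key = ip.to_s
    @table.fetch(key, key)
  end
end

resolver = StaticResolver.new('10.0.0.1' => 'dse-node1.example.com')
puts resolver.resolve('10.0.0.1')
puts resolver.resolve('10.0.0.2')

# With the driver loaded, the resolver would be passed as the second argument:
#   provider = Dse::Auth::Providers::GssApi.new('dse', resolver)
```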
Ticket Caches
By default, kinit and related tools (e.g. klist, kdestroy) manipulate a simple file tied to the client OS user's numeric id on Linux: /tmp/krb5cc_<uid>. This file only supports one "ticket granting ticket", so if you need multiple credentials in your system (e.g. multiple applications, each of which needs to authenticate with different credentials to different services), you can supply the -c argument to kinit to authenticate and store the resulting ticket in a different cache. In that set-up, you must initialize your auth_provider in the driver with this info:
# The fourth arg is the path to the cache file.
provider = Dse::Auth::Providers::GssApi.new('dse', true, nil, '/home/myuser/krb.cache')
For MRI (the underlying gssapi C library, actually), you can set the KRB5CCNAME environment variable instead of supplying an extra argument to the provider constructor.
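The lookup order described in this section (KRB5CCNAME if set, otherwise the per-user file on Linux) can be sketched as a small helper. The name ticket_cache_path is hypothetical, and the sketch ignores anything a krb5.conf file might configure; it only mirrors the two rules mentioned here.

```ruby
# Hypothetical helper: work out which ticket cache the rules above point to.
# Uses KRB5CCNAME when present, else the Linux default /tmp/krb5cc_<uid>.
def ticket_cache_path(env = ENV, uid = Process.uid)
  env.fetch('KRB5CCNAME', "/tmp/krb5cc_#{uid}")
end

puts "Ticket cache in effect: #{ticket_cache_path}"
```

A helper like this is mainly useful for logging or diagnostics, for example to confirm which cache a process will read before it attempts to authenticate.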
Mac supports non-default caches as well, but they are rarely necessary there because the default cache is an in-memory store that supports multiple tickets.
Geospatial Types
DataStax Enterprise v5.0 adds support for three geospatial types in the underlying Cassandra 3.x database. Instances of these types can be expressed in well-known text (WKT) form as well as a binary representation known as well-known binary (WKB). This latter representation is sent over the wire between the client and DSE node, but the former makes it easy to submit queries with geospatial type references in cqlsh.
For example, if you had a points table with an int key f1 and a PointType column p, you could insert a row into it like this in cqlsh:
INSERT INTO points (f1, p) VALUES (7, 'POINT (32.0 12.0)');
You can compose points into line-strings and you can compose line-strings into polygons. See this section of the WKT documentation for details.
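To make the composition concrete, the sketch below builds WKT strings from plain [x, y] pairs. The helper names are hypothetical and independent of the driver, whose geometry classes produce WKT themselves through their wkt method.

```ruby
# Hypothetical helpers that build WKT strings from [x, y] pairs.

# Render a list of points as "x y, x y, ...".
def coords_wkt(points)
  points.map { |x, y| "#{x} #{y}" }.join(', ')
end

def point_wkt(x, y)
  "POINT (#{x} #{y})"
end

# A line-string is a sequence of points.
def line_string_wkt(points)
  "LINESTRING (#{coords_wkt(points)})"
end

# A polygon is a list of rings; each ring is itself a list of points.
def polygon_wkt(rings)
  ring_texts = rings.map { |ring| "(#{coords_wkt(ring)})" }
  "POLYGON (#{ring_texts.join(', ')})"
end

puts point_wkt(32.0, 12.0)
puts line_string_wkt([[12.0, 21.0], [13.0, 31.0]])
puts polygon_wkt([[[0, 0], [20, 0], [26, 26], [0, 26], [0, 0]]])
```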
Point
A Point is a point with x,y coordinates. Columns in DSE have the custom type org.apache.cassandra.db.marshal.PointType.
# The geospatial types are defined in the Dse::Geometry module. Save some typing and include it
# here so that we can refer to the classes with their base names.
include Dse::Geometry
# Create a table with a PointType column and insert a row into it.
session.execute("CREATE TABLE IF NOT EXISTS points_of_interest" \
                " (name text PRIMARY KEY, coords 'PointType')")
session.execute('INSERT INTO points_of_interest (name, coords) VALUES (?, ?)',
                arguments: ['Empire State', Point.new(38.0, 21.0)])
# Now retrieve the point.
rs = session.execute('SELECT * FROM points_of_interest')
rs.each do |row|
  # We can emit the point in its WKT representation.
  puts "#{row['name']} #{row['coords'].wkt}"
  # Or the x and y coordinates
  puts "#{row['name']} #{row['coords'].x},#{row['coords'].y}"
  # Which is really the to_s of the point, so you can do this:
  puts "#{row['name']} #{row['coords']}"
end
LineString
A LineString is a set of lines, characterized by a sequence of Points. As Points live in the 2D xy-plane, so do LineStrings. Each line shares a point with another line, thus forming a string of lines. A real-world example of this is a path on a map. Columns in DSE have the custom type org.apache.cassandra.db.marshal.LineStringType.
# The geospatial types are defined in the Dse::Geometry module. Save some typing and include it
# here so that we can refer to the classes with their base names.
include Dse::Geometry
# Create a table with a LineString column and insert a row into it.
session.execute("CREATE TABLE IF NOT EXISTS directions" \
                " (origin text PRIMARY KEY, destination text, directions 'LineStringType')")
session.execute('INSERT INTO directions (origin, destination, directions) VALUES (?, ?, ?)',
                arguments: ['office', 'home', LineString.new(Point.new(12.0, 21.0),
                                                             Point.new(13.0, 31.0),
                                                             Point.new(14.0, 41.0))])
# Now retrieve the line-string.
rs = session.execute('SELECT * FROM directions')
rs.each do |row|
  directions = row['directions'].points.map do |point|
    "#{point.x},#{point.y}"
  end.join(" to ")
  puts "Directions from #{row['origin']} to #{row['destination']}: #{directions}"
  # Or more simply (thanks to an overridden to_s)
  puts "Directions from #{row['origin']} to #{row['destination']}: #{row['directions']}"
  # And its wkt for fun
  puts "WKT: #{row['directions'].wkt}"
end
Polygon
A Polygon is an enclosed shape consisting of a set of linear-rings. A linear-ring is a LineString whose last point is the same as its first point (thus forming a ring when you connect the points). The first ring specified in a polygon defines the outer edges of the polygon and is called the exterior ring. A polygon may also have holes within it, specified by other linear-rings, and those holes may contain linear-rings indicating islands. All such rings are called interior rings.
# The geospatial types are defined in the Dse::Geometry module. Save some typing and include it
# here so that we can refer to the classes with their base names.
include Dse::Geometry
# Create a table with a Polygon column and insert a row into it. A polygon consists of a set
# of linear-rings. A linear-ring is a LineString whose last point is the same as its first point.
session.execute("CREATE TABLE IF NOT EXISTS places (name text PRIMARY KEY, layout 'PolygonType')")
exterior_ring = LineString.new(Point.new(0, 0),
                               Point.new(20, 0),
                               Point.new(26, 26),
                               Point.new(0, 26),
                               Point.new(0, 0))
interior_ring = LineString.new(Point.new(1, 1),
                               Point.new(1, 5),
                               Point.new(5, 5),
                               Point.new(5, 1),
                               Point.new(1, 1))
session.execute('INSERT INTO places (name, layout) VALUES (?, ?)',
                arguments: ['Capitol', Polygon.new(exterior_ring, interior_ring)])
# Now retrieve the polygon
rs = session.execute('SELECT * FROM places')
rs.each do |row|
  puts "Layout of #{row['name']}:"
  # Write out the exterior ring
  puts "Exterior: #{row['layout'].exterior_ring}"
  # Write out the first point in the first interior ring...because we can.
  puts "First interior point: #{row['layout'].interior_rings.first.points.first}"
  # Finally, let's emit the WKT representation.
  puts "WKT: #{row['layout'].wkt}"
end
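Since every ring of a polygon must be a linear-ring, it can be worth checking the points before building one. The following is a minimal sketch on plain [x, y] pairs; the method name linear_ring? is hypothetical, and the minimum of four points follows the usual WKT convention rather than anything stated above.

```ruby
# Hypothetical check on a list of [x, y] pairs: a linear-ring is closed,
# meaning its last point equals its first point. Requiring at least four
# points (a triangle plus the closing point) is an assumption.
def linear_ring?(points)
  points.size >= 4 && points.first == points.last
end

exterior = [[0, 0], [20, 0], [26, 26], [0, 26], [0, 0]]
open_line = [[0, 0], [20, 0], [26, 26], [0, 26]]

puts linear_ring?(exterior)
puts linear_ring?(open_line)
```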
License
Copyright (C) 2016 DataStax Inc.
The full license terms are available at http://www.datastax.com/terms/datastax-dse-driver-license-terms