This library uses Clojure-wrapped Apache Jena to read OWL ontologies into a Datahike database.
The library is still young, though its basic features have been tested.
There are three steps to getting started:
- configuring the database,
- specifying the data to store, and
- creating the database.
There are several persistent DB options described in the Datahike database configuration docs.
The example shown here writes a persistent file-based DB to /tmp/datahike-owl-db.
(def db-cfg {:store {:backend :file :path "/tmp/datahike-owl-db"}
             :keep-history? false
             :schema-flexibility :write})
There are also examples in the test directory.
Data sources are defined as entries in a nested map keyed by a prefix (a string) that is used as the namespace of the keywords naming each resource associated with the source's URI. For example, the following defines two sources: the first is DOLCE-Lite, retrieved from a remote location; the second is a local ontology in Turtle syntax.
(def onto-sources
{"dol" {:uri "http://www.ontologydesignpatterns.org/ont/dlp/DOLCE-Lite.owl"},
"mod" {:uri "http://modelmeth.nist.gov/modeling", :access "data/modeling.ttl", :format :turtle}})
The key of the outer map defines a shortname for the resource; it provides a namespace for unique identifiers used in the DB in lieu of the IRI string. The value of the outer map is a map providing the details about the source file to be read. The map keys are defined below. In the example, the resource http://www.ontologydesignpatterns.org/ont/dlp/DOLCE-Lite.owl#state will be stored as an entity with :resource/iri = :dol/state.
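The mapping from IRI to keyword described above can be sketched in plain Clojure. This is only an illustration, not the library's code: the helper iri->keyword is hypothetical, and it assumes that a resource IRI is the source URI followed by "#" and a local name.

```clojure
(require '[clojure.string :as str])

;; Sources in the shape used above (only :uri matters for this sketch).
(def example-sources
  {"dol" {:uri "http://www.ontologydesignpatterns.org/ont/dlp/DOLCE-Lite.owl"}
   "mod" {:uri "http://modelmeth.nist.gov/modeling"}})

(defn iri->keyword
  "Hypothetical helper: return the namespaced keyword for iri, or nil when
   no source URI (followed by '#') is a prefix of it."
  [sources iri]
  (some (fn [[prefix {:keys [uri]}]]
          (let [base (str uri "#")] ; assumption: '#' separates URI and local name
            (when (str/starts-with? iri base)
              (keyword prefix (subs iri (count base))))))
        sources))

(iri->keyword example-sources
              "http://www.ontologydesignpatterns.org/ont/dlp/DOLCE-Lite.owl#state")
;; => :dol/state
```

An IRI that matches no configured source yields nil in this sketch; how the library itself treats such IRIs is not covered here.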
The following keywords are used in the sources:
- :uri is the URI of the OWL file to be read.
- :access is a pathname to a local copy of the OWL file (for use where the data is local).
- :format is the format of the content. This defaults to :rdfxml. Presumably, any format that Jena can parse will be read, but only :rdfxml and :turtle have been tested.
- :ref-only? is used to suppress reading content but nonetheless establish a relationship between a namespace prefix and a URI.
There is a default set of :ref-only? sources that can be overridden with values from the sources you provide (such as onto-sources above):
{"daml" {:uri "http://www.daml.org/2001/03/daml+oil" :ref-only? true},
"dc" {:uri "http://purl.org/dc/elements/1.1/" :ref-only? true},
"owl" {:uri "http://www.w3.org/2002/07/owl" :ref-only? true},
"rdf" {:uri "http://www.w3.org/1999/02/22-rdf-syntax-ns" :ref-only? true},
"rdfs" {:uri "http://www.w3.org/2000/01/rdf-schema" :ref-only? true},
"xml" {:uri "http://www.w3.org/XML/1998/namespace" :ref-only? true},
"xsd" {:uri "http://www.w3.org/2001/XMLSchema" :ref-only? true}}
Examples of the use of sources can be found in the test directory.
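One way to picture the override of default sources is an ordinary map merge in which your entries win. The sketch below assumes that behavior; it is not taken from the library. The names default-sources, my-sources, and sources-to-read are hypothetical, as is the local path "data/owl.rdf".

```clojure
;; Abbreviated copy of the defaults listed above.
(def default-sources
  {"owl"  {:uri "http://www.w3.org/2002/07/owl" :ref-only? true}
   "rdfs" {:uri "http://www.w3.org/2000/01/rdf-schema" :ref-only? true}})

;; User sources; the "owl" entry replaces the ref-only default.
(def my-sources
  {"owl" {:uri "http://www.w3.org/2002/07/owl"
          :access "data/owl.rdf"} ; hypothetical local copy
   "mod" {:uri "http://modelmeth.nist.gov/modeling"
          :access "data/modeling.ttl"
          :format :turtle}})

;; Assumption: later (user) entries override defaults with the same prefix.
(def effective-sources (merge default-sources my-sources))

(defn sources-to-read
  "Hypothetical helper: keep only the sources whose content would be read."
  [sources]
  (into {} (remove (fn [[_ v]] (:ref-only? v))) sources))

(sort (keys (sources-to-read effective-sources)))
;; => ("mod" "owl")
```

In this sketch "rdfs" remains available as a prefix but contributes no content, which matches the stated purpose of :ref-only?.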
With the database configured and the sources defined as described above, call (owl/create-db! db-cfg onto-sources) to create the database. The function returns a connection object to the database; it also sets the dynamic variable *conn* in the core namespace to this connection object. A new connection can be acquired at any time by calling the function again without the :rebuild? argument. It can also be acquired directly from Datahike by calling (datahike.api/connect <db-config-map-as-described-above>), which returns an atom containing the connection object.
(require '[owl-db-tools.core :as owl])
(owl/create-db! db-cfg onto-sources
:rebuild? true
:check-sites ["http://ontologydesignpatterns.org/wiki/Main_Page"])
The function create-db! takes the following optional keyword arguments:
- :rebuild? if true, reads the sources; otherwise the database is presumed to exist and a connection to it is returned.
- :check-sites is a collection of sites (their URIs) that are sources for ontologies. You can use this to abort reading when a source site is not available. It can be used only when :rebuild? is true.
- :check-sites-timeout is the number of milliseconds to wait for a response from a check-site (defaults to 15000). This argument is relevant only when :rebuild? is true.
- :user-attrs is a vector of Datahike attribute properties (see the section on 'Database Schema' below) that override the default attributes or attributes learned while reading data.
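The effect of :user-attrs can be pictured as replacing attribute specs by their :db/ident. The following is a minimal sketch of that idea under the assumption that a user spec fully replaces a default or learned spec with the same ident; override-attrs and the attribute :mod/hasAuthor are hypothetical and are not part of the library.

```clojure
(defn override-attrs
  "Hypothetical helper: return attribute specs in which any spec from
   user-attrs replaces the spec in base-attrs that has the same :db/ident."
  [base-attrs user-attrs]
  (let [index (fn [attrs] (into {} (map (juxt :db/ident identity)) attrs))]
    (vec (sort-by :db/ident ; sorted only to make the result deterministic
                  (vals (merge (index base-attrs) (index user-attrs)))))))

;; A spec as it might have been learned from the data (hypothetical attribute).
(def learned-attrs
  [#:db{:ident :mod/hasAuthor :cardinality :db.cardinality/one :valueType :db.type/string}])

;; The user knows that a model can have several authors.
(def user-attrs
  [#:db{:ident :mod/hasAuthor :cardinality :db.cardinality/many :valueType :db.type/string}])

(override-attrs learned-attrs user-attrs)
;; => [#:db{:ident :mod/hasAuthor, :cardinality :db.cardinality/many, :valueType :db.type/string}]
```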
Additional actions on the database are described in the Datahike readme and Datahike API docs.
The database can be queried directly using Datahike's query and pull APIs, or using Pathom3 and the Pathom3 resolvers automatically generated for the attributes of the OWL DB read.
A typical Datahike query is shown below, paired with filter to get all the RDF resources defined in the DOLCE namespace of the example database.
(require '[datahike.api :as d])
(def conn (d/connect db-cfg))
(->> (d/q '[:find [?v ...] :where [_ :resource/iri ?v]] @conn)
(filter #(= "dol" (namespace %))) sort)
; Returns
(:dol/abstract :dol/abstract-location :dol/abstract-location-of :dol/abstract-quality :dol/abstract-region :dol/accomplishment :dol/achievement...)
Pathom is a powerful query language similar to GraphQL. With Pathom, you specify the shape of the data you wish to acquire and let its planner do the work of composing a query that provides the data. The use of Pathom positions owl-db-tools to be used as a remote client. Pathom's documentation is quite good, so only a simple example is provided here.
(in-ns 'owl-db-tools.resolvers)
(def owl-db (register-resolvers! *conn*)) ; Create the basic attribute resolvers for your database.
(owl-db [{[:resource/iri :info/mapped-to] [:rdf/type :rdfs/domain :rdfs/subPropertyOf]}])
;;; Returns the following:
{[:resource/iri :info/mapped-to]
{:rdf/type :owl/ObjectProperty,
:rdfs/domain [:dol/particular],
:rdfs/subPropertyOf :dol/mediated-relation}}
There is also a Pathom resolver for obtaining the names of all the RDF resources in the database. This resolver provides an optional Pathom parameter that allows filtering, for example, to retrieve all the names in a given namespace:
(owl-db '[(:ontology/context {:filter-by {:attr :resource/namespace :val "dol"}})])
The :filter-by parameter may be a vector of such {:attr ... :val ...} maps, but keep in mind that many resource attributes are references (to accommodate expressions). Thus, the following won't work:
(owl-db '[(:owl/db {:filter-by [{:attr :resource/namespace :val "dol"} ; Won't work.
{:attr :rdf/type :val :owl/Class}]})]) ; :rdf/type is a DB reference, not a value such as :owl/Class.
For such activities, it is better to use the Datahike interfaces.
pull-resource is a convenience function in the resolvers namespace that wraps a Pathom3 resolver.
It returns a map of all the triples associated with an RDF resource.
It takes two required arguments: the resource keyword and the connection object.
You can specify :keep-db-ids? true in the call if you would like the result to include the database IDs of the returned structure's elements.
(res/pull-resource :dol/perdurant *conn*)
; Returns
{:resource/iri :dol/perdurant,
:resource/name "perdurant",
:resource/namespace "dol",
:owl/disjointWith [:dol/endurant :dol/abstract :dol/quality],
:rdf/type :owl/Class,
:rdfs/comment
["Perdurants (AKA occurrences) comprise what are variously called events, processes, phenomena..."],
:rdfs/subClassOf
[:dol/spatio-temporal-particular
{:owl/onProperty :dol/has-quality, :owl/allValuesFrom [:dol/temporal-quality], :rdf/type :owl/Restriction}
{:owl/onProperty :dol/has-quality, :owl/someValuesFrom [:dol/temporal-location_q], :rdf/type :owl/Restriction}
{:owl/onProperty :dol/part, :owl/allValuesFrom [:dol/perdurant], :rdf/type :owl/Restriction}
{:owl/onProperty :dol/participant, :owl/someValuesFrom [:dol/endurant], :rdf/type :owl/Restriction}
{:owl/onProperty :dol/specific-constant-constituent, :owl/allValuesFrom [:dol/perdurant], :rdf/type :owl/Restriction}]}
resource-ids (in the core namespace) takes one argument, the database connection, and returns a vector of resource IDs (namespaced keywords).
sources (in the core namespace) takes one required argument, the database connection, and returns a map of information about the sources read.
An optional boolean keyword argument :l2s (meaning 'long to short') can be specified to return a simple map of resource URI strings (the map keys) to the short names used as the namespaces of keyword resource IDs.
schema-attributes (in the core namespace) takes one required argument, the database connection, and returns a map of database attribute specs.
An optional keyword argument :origin can be provided with one or more elements from the set #{:all, :learned, :user} to filter the result to user-specified, learned, or all attributes. The default is #{:learned :user}.
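The shape of the 'long to short' map can be illustrated without a database. The sketch below builds such a map from a sources map like the one passed to create-db!; long->short and sample-sources are hypothetical names, and the actual result of calling sources with :l2s may contain additional entries (for example the default ref-only prefixes).

```clojure
(def sample-sources
  {"dol" {:uri "http://www.ontologydesignpatterns.org/ont/dlp/DOLCE-Lite.owl"}
   "mod" {:uri "http://modelmeth.nist.gov/modeling"
          :access "data/modeling.ttl"
          :format :turtle}})

(defn long->short
  "Hypothetical helper: map each source URI string to its short name."
  [sources]
  (into {}
        (map (fn [[short-name {:keys [uri]}]] [uri short-name]))
        sources))

(get (long->short sample-sources) "http://modelmeth.nist.gov/modeling")
;; => "mod"
```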
The initial database schema is as shown below.
The data you load may contain attributes beyond those shown.
If additional attributes are encountered while reading the data, the data will be studied and a best guess at the cardinality and type of the data will be made.
An attribute spec will be defined accordingly.
You can always query to see what attribute specs were defined.
You can use :user-attrs on the call to create-db! to override the guessing process on an individual attribute basis.
Details about such schema can be found in the Datahike schema docs.
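To make the idea of a "best guess" concrete, here is one way such a guess could be computed from observed values. This is a sketch under stated assumptions, not the library's algorithm: guess-attr-spec and value-type are hypothetical, the input is a collection holding, for each entity, the vector of values seen for the attribute, and mixed value types are assumed to fall back to :db.type/ref (that is, boxing).

```clojure
(defn value-type
  "Hypothetical helper: map a Clojure value to a Datahike value type."
  [v]
  (cond (string? v)  :db.type/string
        (keyword? v) :db.type/keyword
        (boolean? v) :db.type/boolean
        (number? v)  :db.type/number
        :else        :db.type/ref))

(defn guess-attr-spec
  "Hypothetical helper: guess an attribute spec for ident from
   values-per-entity, a collection of vectors of observed values."
  [ident values-per-entity]
  (let [types (into #{} (comp cat (map value-type)) values-per-entity)]
    {:db/ident ident
     ;; More than one value on any single entity implies cardinality many.
     :db/cardinality (if (some #(> (count %) 1) values-per-entity)
                       :db.cardinality/many
                       :db.cardinality/one)
     ;; Assumption: mixed types cannot share a value type, so use a ref.
     :db/valueType (if (= 1 (count types)) (first types) :db.type/ref)}))

(guess-attr-spec :mod/hasAuthor [["Ann"] ["Bob" "Cy"]])
;; => #:db{:ident :mod/hasAuthor, :cardinality :db.cardinality/many, :valueType :db.type/string}
```

A guess of this kind can be wrong when the sample is small (an attribute may look single-valued only because no entity happened to carry two values), which is the situation :user-attrs is meant to address.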
(def app-schema
[#:db{:ident :resource/iri :cardinality :db.cardinality/one :valueType :db.type/keyword :unique :db.unique/identity}
#:db{:ident :resource/name :cardinality :db.cardinality/one :valueType :db.type/string}
#:db{:ident :resource/namespace :cardinality :db.cardinality/one :valueType :db.type/string}
#:db{:ident :source/short-name :cardinality :db.cardinality/one :valueType :db.type/string :unique :db.unique/identity}
#:db{:ident :source/long-name :cardinality :db.cardinality/one :valueType :db.type/string :unique :db.unique/identity}
#:db{:ident :source/loaded? :cardinality :db.cardinality/one :valueType :db.type/boolean}
#:db{:ident :box/boolean-val :cardinality :db.cardinality/one :valueType :db.type/ref} ; These are useful when
#:db{:ident :box/keyword-val :cardinality :db.cardinality/one :valueType :db.type/ref} ; boxing is necessary, for example,
#:db{:ident :box/number-val :cardinality :db.cardinality/one :valueType :db.type/ref} ; when you need to store a ref
#:db{:ident :box/string-val :cardinality :db.cardinality/one :valueType :db.type/ref} ; but have one of these db.types.
#:db{:ident :app/origin :cardinality :db.cardinality/one :valueType :db.type/keyword}])
(def owl-schema
;; multi-valued properties
[#:db{:ident :owl/allValuesFrom :cardinality :db.cardinality/many :valueType :db.type/ref}
#:db{:ident :owl/disjointUnionOf :cardinality :db.cardinality/many :valueType :db.type/ref}
#:db{:ident :owl/disjointWith :cardinality :db.cardinality/many :valueType :db.type/ref}
#:db{:ident :owl/equivalentClasses :cardinality :db.cardinality/many :valueType :db.type/ref}
#:db{:ident :owl/equivalentProperty :cardinality :db.cardinality/many :valueType :db.type/ref}
#:db{:ident :owl/hasKey :cardinality :db.cardinality/many :valueType :db.type/ref}
#:db{:ident :owl/intersectionOf :cardinality :db.cardinality/many :valueType :db.type/ref}
#:db{:ident :owl/members :cardinality :db.cardinality/many :valueType :db.type/ref}
#:db{:ident :owl/onProperties :cardinality :db.cardinality/many :valueType :db.type/ref}
#:db{:ident :owl/oneOf :cardinality :db.cardinality/many :valueType :db.type/ref}
#:db{:ident :owl/propertyChainAxiom :cardinality :db.cardinality/many :valueType :db.type/ref}
#:db{:ident :owl/sameAs :cardinality :db.cardinality/many :valueType :db.type/ref}
#:db{:ident :owl/someValuesFrom :cardinality :db.cardinality/many :valueType :db.type/ref}
#:db{:ident :owl/unionOf :cardinality :db.cardinality/many :valueType :db.type/ref}
#:db{:ident :owl/withRestrictions :cardinality :db.cardinality/many :valueType :db.type/ref}
;; single-valued properties
#:db{:ident :owl/backwardCompatibleWith :cardinality :db.cardinality/one :valueType :db.type/string}
#:db{:ident :owl/cardinality :cardinality :db.cardinality/one :valueType :db.type/number}
#:db{:ident :owl/complementOf :cardinality :db.cardinality/one :valueType :db.type/ref}
#:db{:ident :owl/equivalentClass :cardinality :db.cardinality/one :valueType :db.type/ref}
#:db{:ident :owl/hasValue :cardinality :db.cardinality/one :valueType :db.type/boolean}
#:db{:ident :owl/imports :cardinality :db.cardinality/one :valueType :db.type/ref}
#:db{:ident :owl/inverseOf :cardinality :db.cardinality/one :valueType :db.type/ref}
#:db{:ident :owl/minCardinality :cardinality :db.cardinality/one :valueType :db.type/number}
#:db{:ident :owl/onProperty :cardinality :db.cardinality/one :valueType :db.type/ref}
#:db{:ident :owl/versionInfo :cardinality :db.cardinality/one :valueType :db.type/string}])
(def rdfs-schema
[#:db{:ident :rdfs/domain :cardinality :db.cardinality/many :valueType :db.type/ref}
#:db{:ident :rdfs/range :cardinality :db.cardinality/many :valueType :db.type/ref}
#:db{:ident :rdfs/comment :cardinality :db.cardinality/many :valueType :db.type/string}
#:db{:ident :rdfs/label :cardinality :db.cardinality/many :valueType :db.type/string}
#:db{:ident :rdfs/subClassOf :cardinality :db.cardinality/many :valueType :db.type/ref}
#:db{:ident :rdfs/label :cardinality :db.cardinality/one :valueType :db.type/string}
#:db{:ident :rdfs/subPropertyOf :cardinality :db.cardinality/one :valueType :db.type/ref}])
(def rdf-schema
[#:db{:ident :rdf/type :cardinality :db.cardinality/one :valueType :db.type/ref} ; boxed because not always a keyword.
#:db{:ident :rdf/parseType :cardinality :db.cardinality/one :valueType :db.type/keyword}])
Some of the tests use the DOLCE Lite Plus (DLP) ontology, found at http://www.ontologydesignpatterns.org. A copy of that data can be found in the data/DLP directory of this project.
- Ontologies imported by owl:imports are not automatically loaded. If you want them, you must reference them in the call to create-db!. This might be for the best, since it is a chance to define meaningful short names.
This software was developed by NIST. This disclaimer applies.