This is an exploration project on RAM-based virtualized graph databases written in Go (golang). I doubt that it will ever reach production quality.
Assuming the Go SDK is already installed, download, build and test the project with
go get github.com/lercher/rdf
go get
go build
go test ./...
or, skipping the long-running tests,
go test ./... -short
There are no particular runtime requirements: just copy the executable and run it with the appropriate command-line parameters. Go supports many environments out of the box, and so does this project.
If you want to regenerate the parser, first of all: don't do it. The artefacts generated by the antlr4 run are included in the repository, ready to be compiled by Go. But in case you really need to regenerate them:
- install a current JRE
- install and use Visual Studio Code
- install this antlr4 VSCode extension
Then use the following VSCode settings, as provided in settings.json in this repository, to have the extension generate the lexer, parser and listeners:
"antlr4.generation": {
"mode": "external",
"language": "Go",
"listeners": true,
"visitors": false,
"outputDir": "parser"
},
The project consists of the following packages:
- Query algebra resulting from a parsed SPARQL statement
- Binary reading and writing of Graphs
- Import of CSV data with a header line and the subject in the first column into a Graph
- Basic RDF graph datatypes such as Graph, Triple, TriplePattern and Value
- Engine to execute an algebra instance on a graph
- Parsing of SPARQL syntax into an algebra instance
- Primitive datatypes to be used in a graph's assertions
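To give an idea of what a RAM-based Graph can look like, here is a hypothetical sketch of one way to keep assertions compact: every distinct value is interned once, and triples only hold small integer IDs. The names IRI, Graph, New and Assert merely echo names used in this README; the interning scheme and the duplicate check are assumptions for illustration, not the project's implementation.

```go
package main

import "fmt"

// IRI marks a value as an IRI rather than a plain string literal
// (hypothetical type, modelled on the graph.IRI call used in this README).
type IRI string

// Triple holds subject, predicate and object as small integer IDs.
type Triple struct{ S, P, O int }

// Graph is a hypothetical RAM-based triple store. Every distinct value is
// interned once; triples only store IDs, and the set semantics of an RDF
// graph are kept by ignoring duplicate assertions.
type Graph struct {
	ids     map[interface{}]int // value -> ID
	values  []interface{}       // ID -> value
	triples []Triple
	seen    map[Triple]bool
}

// New returns an empty graph.
func New() *Graph {
	return &Graph{ids: map[interface{}]int{}, seen: map[Triple]bool{}}
}

// intern returns the ID of v, allocating a new one on first use.
// Values must be comparable (strings, numbers, IRIs). IRI("x") and "x"
// get different IDs because their dynamic types differ.
func (g *Graph) intern(v interface{}) int {
	if id, ok := g.ids[v]; ok {
		return id
	}
	id := len(g.values)
	g.ids[v] = id
	g.values = append(g.values, v)
	return id
}

// Assert adds the triple (s, p, o) unless it is already present.
func (g *Graph) Assert(s, p, o interface{}) {
	t := Triple{g.intern(s), g.intern(p), g.intern(o)}
	if g.seen[t] {
		return
	}
	g.seen[t] = true
	g.triples = append(g.triples, t)
}

func main() {
	a := IRI(`http://www.w3.org/1999/02/22-rdf-syntax-ns#type`)
	g := New()
	g.Assert("martin", a, "person")
	g.Assert("lisa", a, "person")
	g.Assert("martin", a, "person") // duplicate, ignored
	fmt.Println(len(g.triples), len(g.values)) // 2 4
}
```

Interning means that a value used in many assertions, such as the rdf:type IRI, is stored only once.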
To execute a query against a data store, you'll need a Graph containing the asserted Triples, an Algebra representing the SPARQL query, and a processor method.
Either create a new Graph and Assert your knowledge:
import "github.com/lercher/rdf/graph"
// the rdf:type predicate as an IRI
a := graph.IRI(`http://www.w3.org/1999/02/22-rdf-syntax-ns#type`)
g := graph.New()
g.Assert("martin", a, "person") // martin is a person
Or load one from a CSV file with a header line:
import "os"
import "github.com/lercher/rdf/encoding/csv"
const nsEst = `http://education.data.gov.uk/def/school/establishment/`
const nsSchool = `http://education.data.gov.uk/def/school/`
f, err := os.Open(`path_to.csv`)
if err != nil {
	return err // nothing was opened, so there is nothing to close
}
defer f.Close()
dec := csv.NewDecoder(f, nsEst, nsSchool)
g, err := dec.Decode()
if err != nil {
	return err
}
Alternatively, import "github.com/lercher/rdf/encoding/binary" to store and load a binary graph representation.
Although it's possible to construct an algebra from scratch, it's far more convenient to parse one from a SPARQL query:
import "github.com/antlr4-go/antlr/v4"
import "github.com/lercher/rdf/sparql"
input := antlr.NewInputStream(`select * {?s ?p ?o}`)
ast, err := sparql.Parse(input)
if err != nil {
	return err
}
a := ast.Algebra() // derive the algebra from the parsed AST
Don't forget to optimize your algebra after parsing. Currently, however, there is no actual optimization.
a = a.Optimize()
Note: The ast variable holds the Abstract Syntax Tree of the parsed query. It may contain information that is closer to the parsed query than the algebra created from the AST.
There is currently only one processor, which processes the joins and filters of an algebra on a graph.
It is started by calling processor.Execute/3 on an Algebra and a Graph. The processor.Receiver function passed as the third parameter is called for every result line of the execution; here it is a func literal:
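import "log"
import "github.com/lercher/rdf/sparql/processor"
// a is the (optimized) algebra, g the graph
err := processor.Execute(a, g, func(bs algebra.Binding, vs *algebra.Variables) (bool, error) {
	m := bs.Materialize(g, vs) // resolve the compact binding to name/value pairs
	log.Print(m)
	return true, nil // true: continue with the next result line
})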
The bs var holds a compact representation of a result line's variables. It has to be materialized with the help of the graph and the selected variables described in vs. The resulting m is just a slice of variable name (string) and Value (interface{}) pairs.
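A hypothetical sketch of materialization, assuming that a compact binding is a slice of integer IDs indexing a dictionary of distinct values, and that the selected variables are given as a list of names. Binding, Pair and this Materialize signature are invented for illustration; the project's algebra.Binding and algebra.Variables types will differ.

```go
package main

import "fmt"

// Binding is a hypothetical compact result line: one value ID per selected variable.
type Binding []int

// Pair is one materialized variable: its name and its value.
type Pair struct {
	Name  string
	Value interface{}
}

// Materialize resolves every ID through the dictionary and pairs it with
// the name of the corresponding selected variable.
func (b Binding) Materialize(dict []interface{}, vars []string) []Pair {
	m := make([]Pair, len(vars))
	for i, name := range vars {
		m[i] = Pair{Name: name, Value: dict[b[i]]}
	}
	return m
}

func main() {
	dict := []interface{}{"martin", "rdf:type", "person"} // ID -> value
	vars := []string{"s", "o"}
	bs := Binding{0, 2}
	fmt.Println(bs.Materialize(dict, vars)) // [{s martin} {o person}]
}
```

Keeping result lines as IDs is what makes them cheap to produce; only the lines a receiver actually looks at pay for the lookup.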
The bool return value expresses the receiver function's wish to continue processing (true) or to stop it (false). A non-nil error is passed up to the processor.Execute call, ends it immediately, and becomes its return value.
Note: Result lines are delivered unordered, and their sequence may differ on each call, because iteration over the underlying Go maps has no defined order. On the other hand, query processing only takes the memory it needs to produce the current result line.
Warning: Issuing ordered queries implies complete processing and materialization of all result lines before the first call of the receiver function. This can cost a lot of resources if the result set is large.
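The following hypothetical sketch shows why ordering is expensive: lines produced in unspecified map order must all be collected and sorted before the first one reaches the receiver. Line, deliverOrdered and the example data are made up for illustration and are not part of the project.

```go
package main

import (
	"fmt"
	"sort"
)

// Line is a hypothetical materialized result line.
type Line struct {
	Name string
	Age  int
}

// deliverOrdered collects every line, sorts them with less, and only then
// hands them to rcv, which may still stop the delivery early.
func deliverOrdered(produce func(emit func(Line)), less func(a, b Line) bool, rcv func(Line) bool) {
	var all []Line // grows with the size of the full result set
	produce(func(l Line) { all = append(all, l) })
	sort.Slice(all, func(i, j int) bool { return less(all[i], all[j]) })
	for _, l := range all {
		if !rcv(l) {
			return // stopping early no longer saves the production work
		}
	}
}

func main() {
	ages := map[string]int{"martin": 42, "lisa": 37, "tom": 29}
	produce := func(emit func(Line)) {
		for name, age := range ages { // map order is unspecified, so lines arrive unordered
			emit(Line{name, age})
		}
	}
	deliverOrdered(produce, func(a, b Line) bool { return a.Age < b.Age }, func(l Line) bool {
		fmt.Println(l.Name, l.Age)
		return true
	})
}
```

Whatever order the map yields, this prints tom 29, lisa 37 and martin 42, in that order; the price is that all three lines are held in memory at once.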