fehrenbach / qcert

Certified Query Compiler

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Q*cert

CircleCI

http://github.com/querycert/qcert

About

This is the source code for the Q*cert verified query compiler. The goal of the project is to develop a state of the art query compiler for languages over a rich data model (nested, hierarchical, etc), with an implementation which provides strong correctness guarantees.

Q*cert is built and verified using the Coq proof assistant. A working compiler can be obtained by 'extracting' OCaml code from the source in Coq.

Building Q*cert

Prerequisites

To build Q*cert from the source, you will need:

An easy way to get set up on most platforms is to use the OCaml package manager (https://opam.ocaml.org). Once opam is installed, you can just add the corresponding libraries:

opam install menhir
opam install ocamlbuild
opam install camlp5
opam install coq

One platform that isn't directly supported by the OCaml package manager is Windows. We do not currently have detailed instructions for how to build on Windows.

Compile Q*cert

  1. Compile the Coq source:

    make qcert (Note: this will take a while, you can run make faster with 'make -j 8 qcert')

  2. Extract the compiler and build the OCaml frontend:

    make extraction

This should produce the ./bin/qcert executable.

Compile Queries

Once the compiler is built, it can be used to compile queries. The ./samples directory contains a few examples written in OQL (Object Query Language) syntax. For instance:

$ cat samples/oql/test1.oql 
select p
from p in Persons
where p.age = 32

Calling the compiler on that sample with OQL as source language and Javascript as target language can be done as follows:

$ ./bin/qcert -source OQL -target JS samples/oql/test1.oql

This will tell you the compilation steps being used:

Compiling from oql to js:
  oql -> nraenv -> nraenv -> nnrc -> nnrc -> js

and produce a javascript file called samples/oql/test1.js.

Similarly for Java:

$ ./bin/qcert -source OQL -target Java samples/oql/test1.oql

This will produce a java file called samples/oql/test1.java.

Run compiled queries

Q*cert targets a number of languages and data processors as backends (currently: Javascript, Java, Cloudant and Spark). The way you run the compiled queries varies depending on the target. Usually you need two things: (i) a run-time library that implements some of the core operators assumed by the compiler (e.g., ways to access records or manipulate collections), and (ii) a query runner which allows to execute the query on some input data.

Runtime libraries are in the ./runtime directory. We include simple query runners in the ./samples directory in order to try the examples.

Prerequisites

To compile the Java runtime library or the provided query runner, you will need a Java compiler (Java 7 or later).

Build the Q*cert runtimes

To compile the supporting runtime for the Java target:

make java-runtime

Build the sample query runners

To compile the small query runners:

cd samples
make

Now, you're good to go! Next step: run your compiled queries.

Run queries compiled to Javascript

(In the ./samples directory)

To run a query compiled to Javascript, you can call java for the RunJavascript query runner (It uses uses the Nashorn Javascript engine for the JVM). You will need to pass it two pieces of information: (i) the location of the Q*cert runtime for javascript, and (ii) some input data on which to run the query. From the command line, you can do it as follows:

java -cp bin testing.runners.RunJavascript \
     -input data/persons.json \
	 -runtime ../runtime/javascript/qcert-runtime.js
	 oql/test1.js

The input data in data/persons.json contains a collection of persons and a collection of companies in JSON format and can be used for all the tests. If you run test1, it should return all persons whose age is 32:

[{"pid":1,"name":"John Doe","age":32,"company":101},
 {"pid":2,"name":"Jane Doe","age":32,"company":103},
 {"pid":4,"name":"Jill Does","age":32,"company":102}]

Alternatively the provided ./samples/Makefile can compile and run a given test for you:

make run_js_test1

Run queries compiled to Java

(In the ./samples directory)

To run a query compiled to Java, you must first compile it by calling javac for the produced Java code, then call java with the RunJava query runner. You will need to pass it three pieces of information: (i) the location of the gson jar which is used to parse the input, (ii) the location of the Q*cert runtime for java, both as part of the classpath, and (ii) some input data on which to run the query. From the command line, you can do it as follows, first to compile the Java code:

javac -cp bin:../runtime/java/bin:../lib/gson-2.7.jar oql/test1.java

Then to run the compiled Class:

java -cp bin:../runtime/java/bin:../lib/gson-2.7.jar:oql testing.runners.RunJava \
     -input data/persons.json \
	 test1

Alternatively the provided ./samples/Makefile can compile and run a given test for you:

make run_java_test1

Caveats

  • There is no official support for Windows, although some success has been reported (See Issue #1)
  • The Spark 2 target is in development, and not yet operational
  • The documentation is based on an early version of the compiler and is outdated
  • Support for the source miniOQL language is preliminary

Code Organization

The code is in three main directories:

  • ./coq contains the Coq source code
  • ./ocaml contains the toplevel compiler and code extraction from Coq
  • ./runtime contains libraries necessary to run queries compiled through Q*cert for various platforms (Java, Javascript, and Spark 2.0).

Inside the ./coq directory, the organization is as follows.

  • Foundational modules:
  • Intermediate languages (ILs), including eval, typing, type inference, and equivalences/rewrites:
    • ./CAMP contains support for the Calculus of Aggregating Matching Patterns
    • ./NRA contains support for the Nested Relational Algebra
    • ./NRAEnv contains support for the extension of NRA with environments
    • ./NNRC contains support for the Named Nested Relational Calculus
    • ./NNRCMR contains support for the Named Nested Relational Calculus with Map-Reduce
    • ./DNNRC contains support for the Distributed Named Nested Relational Calculus
  • Translations:
    • ./Translation contains translations between ILs
    • ./Backend contains backend support and code generation
    • ./Frontend contains surface language support (except for jRules)
  • Toplevel:
    • ./Compiler contains the overall compiler instructure and functional optimizers
    • ./Tests contains various coq self-tests
    • (./Updates is early code for updates that isn't part of the actual compiler)

License

Q*cert is distributed under the terms of the Apache 2.0 License, see ./LICENSE.txt

Contributions

Q*cert is still at an early phase of development and we welcome contributions. Contributors are expected to submit a 'Developer's Certificate of Origin' which can be found in ./DCO1.1.txt.

Background

Current source languages include:

  • mini-OQL [SS16], a usable query language for objects, which is a subset of OQL [ODMG30].

Current targets include:

  • Javascript [JS06], Java [Java7], Cloudant (CouchDB), and Spark [Zah12].

The compiler relies a number of intermediate languages for optimization and code-generation, notably:

A calculus for pattern-matching with aggregation:

  • We use the Calculus of Aggregating Matching Patterns (CAMP) from [SSH15] to capture the pattern matching and data processing semantics of rules languages, such as jRules [BM11]

A nested-relational algebra:

  • We use the nested relational algebra (NRA) from [Clu93,Moe09], which is designed for optimization purposes. It includes the relational algebra as a subset and has been applied successfully to a variety of query languages, notably OQL and XQuery.

A nested-relational calculus:

  • We use the Named Nested Relational Calculus (NNRC) from [Bus07], which makes the encoding of NRA operators natural, and is closely related to the 'XQuery core' used in [FSW01], and comprehensions.

The compiler includes translations from source to target and an optimizer which leverages the database algebra and calculus. Those translations, as well as the optimization rules are proved correct using the Coq proof assistant.

References

[BM11] Jérôme Boyer, and Hafedh Mili. Agile business rule development. Springer Berlin Heidelberg, 2011.

[Bus07] Jan van den Bussche, and Stijn Vansummeren. "Polymorphic type inference for the named nested relational calculus." ACM Transactions on Computational Logic (TOCL) 9.1 (2007): 3.

[Clu93] Sophie Cluet, and Guido Moerkotte. "Nested Queries in Object Bases". Proceedings of the Fourth International Workshop on Database Programming Languages (DBPL'93). pp 226-242.

[Doo95] Robert B. Doorenbos. Production matching for large learning systems. Dissertation, University of Southern California, 1995.

[FSW01] Mary Fernandez, Jérôme Siméon, and Philip Wadler. "A Semi-monad for Semi-structured Data." International Conference on Database Theory (ICDT'2001). Springer Berlin Heidelberg, 2001. 263-300.

[Java7] The Java® Language Specification. James Gosling, Bill Joy, Guy Steele, Gilad Bracha, Alex Buckley. February 2013. Oracle Corp.

[JS06] David Flanagan. JavaScript: the definitive guide. " O'Reilly Media, Inc.", 2006.

[Moe09] Guido Moerkotte. Building Query Compilers. Draft Manuscript. http://pi3.informatik.uni-mannheim.de/~moer/querycompiler.pdf

[ODMG30] Jeff Eastman, et al. The object data standard: ODMG 3.0. Eds. Roderic Geoffrey Galton Cattell, and Douglas K. Barry. Vol. 1. San Francisco: Morgan Kaufmann, 2000. http://www.odbms.org/odmg-standard/odmg-book/

[SSH15] Avraham Shinnar, Jérôme Siméon, and Martin Hirzel. "A Pattern Calculus for Rule Languages: Expressiveness, Compilation, and Mechanization." 29th European Conference on Object-Oriented Programming. 2015. http://hirzels.com/martin/papers/ecoop15-rules-nra.pdf

[SS16] Avraham Shinnar and Jérôme Siméon. "Nominal Typing for Data Languages". Technical Report, IBM. July 2016

[Zah12] Matei Zaharia, et al. "Fast and interactive analytics over Hadoop data with Spark." USENIX; login 37.4 (2012): 45-51. https://www.usenix.org/publications/login/august-2012-volume-37-number-4/fast-and-interactive-analytics-over-hadoop-data

About

Certified Query Compiler

License:Apache License 2.0


Languages

Language:Coq 87.5%Language:OCaml 6.8%Language:Java 3.3%Language:Scala 0.7%Language:Python 0.6%Language:Makefile 0.4%Language:JavaScript 0.3%Language:Io 0.1%Language:Emacs Lisp 0.1%Language:HTML 0.1%Language:Standard ML 0.0%Language:Shell 0.0%