oguzp1 / cse232b-xml-project

Project for the UCSD graduate level course CSE 232B.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

cse232b-xml-project

Project for the UCSD graduate level course CSE 232B.

Run Instructions

java -jar cse232b-xml-project.jar path/to/input.txt

The program will output the result to stdout.

Optimizations

If the program detects a join that it can optimize, it will write the optimized query in a file and then will evaluate the optimized query. If the input query is at a/b/c.txt, the optimized query will be in a/b/c_optimized.txt.

Joins and Cartesian Products

Joins in the grammar are of the form:

join ( xq , xq , [ name (, name)* ] , [ name (, name)* ] )

Cartesian products in the grammar are of the form:

join ( xq , xq , [ ] , [ ] )

These are parsed using the same rule. If there exists two or more transitive closures of variables without a join condition in between in an optimizable query, the program will introduce cartesian products in the rewritten query.

Join Order Heuristic

Roughly, the program will join certain sub-queries before others based on the following (in the order of importance):

  • the sub-query has 1+ join condition mentioning one of its variables, i.e. no cartesian product needed.
  • the sub-query has 1+ where condition(s) that concern(s) its variables.
  • the sub-query has a low number of join conditions mentioning one of its variables, i.e. it directly joins with fewer sub-queries than other sub-queries.

Grammar

The main grammar can be found in src/main/antlr/XGrammar.g4.

src/main/antlr/XSimpleGrammar.g4 is used in order to filter implicit joins we can optimize, but is otherwwise not used by the program for any parsing.

About

Project for the UCSD graduate level course CSE 232B.

License:MIT License


Languages

Language:Java 98.1%Language:ANTLR 1.9%