Using the code:

The directory already contains the files and partitions used by our experiments.

To compile, simply type:

$> make

This will generate 3 executables:

  dblite     - interactive SM database application
  genqueries - generates a synthetic query workload
  benchmark  - simulates a multi-user environment on workload queries

====================================================
Running dblite
====================================================

To execute dblite type:

$> dblite

This will take you into interactive mode. Here you can create the database
files, initialize the database with various configurations, and query the
data.

Creating the database (at the dblite prompt):

dblite> create createdb

To load an existing database (at the dblite prompt):

dblite> load config db.xml

There are three configurations available:

  Filename     Description
  ========     ===========
  config       n-ary
  configP      SM
  configPAX    PAX

To execute a query:

dblite> query [query]

query syntax
=====================================================
projection(child; projection-list)
merge-join(left-child, right-child; join-clause; )
    where left-child and right-child are sscans
sscan(table[:alias]; projection-list; [where-clause])

*projection-list ::= comma-separated qualified attribute list

*example(s):

dblite> query projection(merge-join(sscan(T1:R; R.a, R.b, R.g, R.c; R.a < 5), sscan(T2:S; S.a, S.b; ); S.b = R.a; ); R.a, R.b, S.b, R.g, R.c)

dblite> query projection(merge-join(sscan(T1:R; R.a, R.b, R.g, R.c; R.g = 'Z'), sscan(T2:S; S.a, S.b; S.b > 350 & S.b <= 1232); S.b = R.a; ); R.a, R.b, S.b, R.g, R.c)

dblite> query projection(merge-join(sscan(T1:R; R.a, R.b, R.g, R.c; R.g = 'Z' & R.a > 200), sscan(T2:S; S.a, S.b; S.b > 850 & S.b <= 1232); S.b = R.a; ); R.a, R.c)

In the above examples, the table and attribute names come from the schema and
can be listed with "describe". The names are defined in the db.xml file.

Type help for a full list of commands:

  command    options                        description
  =======    =======                        ===========
  help                                      display usage/help
  query      <query>                        execute a query; results are returned to stdout
  profile    [count] <query>                profiles the query
  load       <partition-config> <schema>    loads the database
  create     <synthetic-info>               creates a populated synthetic database
  tables                                    list the tables in the database
  describe   <table-name>                   list the table schema for the selected table
  layout     ?|f|p                          gets/sets the current materialization
                                              ? - get current layout
                                              f - single partition
                                              p - 2 partitions
  quit                                      exits the program (but why would you?)

Known Issues
=====================================================
The parser is very brittle; if incorrect syntax is encountered, it segfaults.

Running with Cachegrind
=====================================================
In order to run with the simulated cache, call the following:

valgrind --tool=cachegrind dblite

After loading the data and running a query, quit out of dblite, and the
results will be shown. You can also simply load the data and quit without
running a query to see the total overhead.
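For reference, a full cachegrind session might look like the sketch below (the
query is just the first example from above; valgrind names its output file
cachegrind.out.<pid>, and cg_annotate is the standard valgrind script for
summarizing the simulated cache counts per function):

$> valgrind --tool=cachegrind dblite
dblite> load config db.xml
dblite> query projection(merge-join(sscan(T1:R; R.a, R.b, R.g, R.c; R.a < 5), sscan(T2:S; S.a, S.b; ); S.b = R.a; ); R.a, R.b, S.b, R.g, R.c)
dblite> quit
$> cg_annotate cachegrind.out.<pid>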
====================================================
Running genqueries
====================================================

To execute genqueries type:

$> genqueries num-non-optimized [nQ0 [nQ1 [nQ2 [nQ3 [nQ4 [nQ5 [nQ6 [nQ7 [nQ8 [nQ9 [nQ10 [nQ11 [nQ12 [nQ13 [nQ14]]]]]]]]]]]]]]]

  parameters         description
  num-non-optimized  number of non-cache-optimized queries to generate
  nQX                number of queries to generate for query template X

====================================================
Running benchmark
====================================================

To execute benchmark type:

$> benchmark config-file is-enabled nthreads < query-workload

  parameters      description
  config-file     same as the dblite config file (specifies the partition initialization)
  is-enabled      enable or disable partition propagation (0 or 1)
  nthreads        number of threads
  query-workload  synthetic workload generated by genqueries

An end-to-end example of generating and replaying a workload is sketched at
the end of this file.

====================================================
Creating the data and the partitions
====================================================

In order to create an auto-generated file, make sure there is a folder called
"Data" within the DBCacheBenchmarking directory. Create a file in the
DBCacheBenchmarking folder called "createdb". Then, for each relation you want
generated, put the following:

<filename>
<#fields>|<#records>|<#bytes per record>

Then for each field put one of the following types:

int|incr|<byte offset>
int|randIncr|<byte offset>
int|range|<lower bound>|<upper bound>|<lower missing bound>|<upper missing bound>|<byte offset>
int|oddRange|<lower bound>|<upper bound>|<byte offset>
int|evenRange|<lower bound>|<upper bound>|<byte offset>
string|<length>|<byte offset>
fK|<lower bound>|<upper bound>|<lower missing bound>|<upper missing bound>|<byte offset>

int|incr means that it is an integer type that increments for each record created.
int|randIncr creates an integer field that increments itself by a small random value for each record.
int|range creates a random integer that is in the range of <lower bound> to <lower missing bound> and <upper missing bound> to <upper bound>.
int|oddRange creates a random odd integer in the range of <lower bound> to <upper bound>.
int|evenRange creates a random even integer in the range of <lower bound> to <upper bound>.
string creates a string with <length> characters.
fK gives unique random values in the given range, in sorted order.
<byte offset> is the location of the field in the tuple.

An example file is:

testTable1.tab
3|200|34
int|incr|0
string|26|4
int|range|0|55|0|0|30
testTable2.tab
4|1000|13
int|randIncr|0
int|evenRange|0|50|4
string|1|8
fK|55|100|70|76|9

This creates two binary data files, "testTable1.tab" and "testTable2.tab",
located in the "Data" folder. The first table has 200 records, 3 fields, and a
record that is 34 bytes long. The first field is an incremented integer, the
second a string of 26 characters, and the third a random integer from 0 to 55.
The second table has 1000 records, 4 fields, and 13 bytes per record. The
first field is a randomly incremented integer, the second a random even
integer from 0 to 50, the third a one-character string, and the fourth a
unique sorted random integer from 55 to 70 and 76 to 100.

Another file must also be modified, called db.xml.
db.xml for the tables above looks like the following:

<database>
  <tables>
    <table id="0" name="T1" path="Data/testTable1.tab">
      <schema>
        <attribute id="0" name="a" type="INTEGER" length="4" />
        <attribute id="1" name="b" type="STRING" length="26" />
        <attribute id="2" name="c" type="INTEGER" length="4" />
      </schema>
    </table>
    <table id="1" name="T2" path="Data/testTable2.tab">
      <schema>
        <attribute id="0" name="a" type="INTEGER" length="4" />
        <attribute id="1" name="b" type="INTEGER" length="4" />
        <attribute id="2" name="c" type="CHAR" length="1" />
        <attribute id="3" name="d" type="INTEGER" length="4" />
      </schema>
    </table>
  </tables>
</database>

Now, when running dblite, call create to create the new tables. The program
will exit after the tables are created and will need to be restarted, but the
new tables will be in place.

Creating Partitions:

In order to partition the tables, a config file must be created (though the
name can be anything). The first line must be 4096 (this is the page size; it
should be variable, but 4096 is hard-coded in other places, so it needs to be
that value). For each relation, do the following:

<fileLocation>
<#Partitions>|<#Fields>|<#records>|<#bytesPerRecord>

For each partition, do the following:

<#fieldsInPartition>|<#sizeOfPartitionTuple>

For each field in the partition, do the following:

<#fieldNum>|<fieldSize>

So, one config file for the above relations could be:

4096
Data/testTable1.tab
3|3|200|34
1|4
0|4
1|26
1|26
1|4
2|4
Data/testTable2.tab
2|4|1000|13
2|8
0|4
3|4
2|5
1|4
2|1

This splits the first relation into three partitions, each containing one
field. The first partition has size 4, the second has size 26, and the third
has size 4 again. The second relation is in 2 partitions, the first of which
has two fields, field 0 and field 3, and a total tuple size of 8. The second
contains fields 1 and 2. It should be noted that within a partition, the
fields should be ordered from lowest to highest field id.

Another might be:

4096
Data/testTable1.tab
1|3|200|34
3|34
0|4
1|26
2|4
Data/testTable2.tab
2|4|1000|13
3|12
0|4
1|4
3|4
1|1
2|1

This puts the first relation in one partition (so in NSM format). The second
relation has two partitions, with 3 fields in one and 1 field in the other.
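Putting the pieces together, a typical session with the files above might look
like the following sketch (here one of the partition configs above is assumed
to have been saved as "myconfig"; the name is arbitrary). Note that create
exits after building the tables, so dblite has to be started twice:

$> dblite
dblite> create createdb
$> dblite
dblite> load myconfig db.xml
dblite> tables
dblite> describe T2
dblite> quit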
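Finally, a hypothetical end-to-end benchmark run over this database. The
workload file name, the query counts, and the thread count below are only
placeholders, and it is assumed here that genqueries writes the generated
workload to stdout:

$> genqueries 10 5 5 > workload.txt
$> benchmark myconfig 1 4 < workload.txt

This asks genqueries for 10 non-cache-optimized queries plus 5 instances each
of query templates 0 and 1, saves them in workload.txt, and then replays that
workload on 4 threads with partition propagation enabled, using the same
partition config file that dblite loads.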