ZhengtongYan / Hands-on-Session-2-of-DW-BI-Course-2022-Spring

This repository contains the instructions for the second hands-on session in our DW&BI course.

Hands-on Session 2: Transaction Processing and Multi-Model OLAP Analysis with AgensGraph

Recording Video

Learning Objectives

Gain hands-on experience in transaction processing and multi-model OLAP analysis using AgensGraph.

  • Learn how to perform transaction processing in AgensGraph.
  • Learn how to write multi-model OLAP queries in AgensGraph.

Software Requirements

You need the AgensGraph database to complete the exercises.

Some students using Windows and macOS reported problems when installing and using AgensGraph directly. If you use Windows or macOS, I recommend installing AgensGraph with Docker. Please refer to the following PDF document and video to install AgensGraph with Docker:

You can also install AgensGraph without Docker, although this route may cause problems. Windows and Linux users can download the installer and follow the installation guide.

  • Windows system: Download the installer and then install it based on the installation guide.

  • Linux system: Download the installer and then install it based on the installation guide.

  • Installation Guide: Download the zip file from this link, then unzip it to get two PDF files:

    • agens_graph_linux_installation_guide_html.pdf
    • agens_graph_windows_installation_guide_html.pdf

macOS users need to build AgensGraph from the source code.

  • macOS system: Please refer to this video to build and install AgensGraph on macOS. This installation approach is complicated, so I recommend that macOS users install AgensGraph with Docker.

Download the Dataset

In part 2 of this hands-on session, you need three schemas of the Unibench benchmark: the M3D (Multi-Model MultiDimensional) schema, the FR (Full-Relational) schema, and the NR (Non-Relational) schema. Note that part 1 (transaction processing) does not use this dataset.

Download the dmp file: Please download the dataset from this link. Click "here" in the "Loading the data" section (see the following figure) to download the compressed dump file (m3d.dmp).

image

For more details about these three schemas of Unibench, please read Paper 2 in the references section.

References

Exercises (20 points)

Part1: Transaction Processing (5 points)

Initialize demo database

CREATE TABLE "accounts" (
  "id" bigserial PRIMARY KEY,
  "owner" varchar NOT NULL,
  "balance" bigint NOT NULL,
  "currency" varchar NOT NULL,
  "createdat" timestamptz NOT NULL DEFAULT (now())
);
INSERT INTO accounts (owner, balance, currency)
VALUES
  ('AA', 100, 'EUR'),
  ('BB', 100, 'EUR'),
  ('CC', 100, 'EUR');
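The exercises below refer to accounts by id, and several of them modify balances. As a small sketch (not part of the original instructions), the following statements check the demo data and restore the starting balances between experiments. It assumes the table was freshly created by the script above, in which case the three rows receive the ids 1, 2 and 3.

```sql
-- Inspect the demo data; on a freshly created table the rows get ids 1, 2 and 3.
SELECT id, owner, balance, currency
FROM accounts
ORDER BY id;

-- Restore the initial balances before re-running an experiment.
UPDATE accounts SET balance = 100;
```

Run the reset only when no other transaction is still open on the table, otherwise the UPDATE will wait for that transaction to finish.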

1. Read Phenomena and Isolation Levels

Consider a situation where the balance of AA's account (id=1) is initially 100 and two users simultaneously execute commands within transactions in AgensGraph:

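| Transaction 1 (user 1)                   | Transaction 2 (user 2)                     |
|------------------------------------------|--------------------------------------------|
| BEGIN;                                   |                                            |
|                                          | BEGIN;                                     |
|                                          | UPDATE accounts SET balance=50 WHERE id=1; |
| SELECT balance FROM accounts WHERE id=1; |                                            |
|                                          | UPDATE accounts SET balance=70 WHERE id=1; |
|                                          | COMMIT;                                    |
| SELECT balance FROM accounts WHERE id=1; |                                            |
| COMMIT;                                  |                                            |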

(1) At the READ COMMITTED level, what results does user 1 get from the two queries? Which read phenomenon occurs? (1 point)

(2) At the REPEATABLE READ level, what results does user 1 get from the two queries? Which read phenomenon occurs? (1 point)

(3) At the SERIALIZABLE level, what results does user 1 get from the two queries? Which read phenomenon occurs? (1 point)

(4) At the READ UNCOMMITTED level, what results does user 1 get from the two queries if the transactions are executed in MySQL instead of AgensGraph? Which read phenomenon occurs? (Hint: AgensGraph treats READ UNCOMMITTED as READ COMMITTED, whereas in MySQL the two levels differ.) (1 point)
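To try the questions above, each user needs a separate client connection, and the isolation level has to be chosen per transaction. The following is a minimal sketch that assumes AgensGraph accepts the standard PostgreSQL syntax for this (it is built on PostgreSQL); the choice of REPEATABLE READ and SERIALIZABLE here is only illustrative.

```sql
-- Session of user 1, in its own client connection.

-- Option A: choose the level when opening the transaction.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SHOW transaction_isolation;                  -- confirm the level in effect
SELECT balance FROM accounts WHERE id = 1;
COMMIT;

-- Option B: set the level as the first statement inside the transaction.
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SHOW transaction_isolation;
COMMIT;

-- For question (4), MySQL uses a session-level form instead:
--   SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
```

SET TRANSACTION must come before any query in the transaction, which is why Option B issues it directly after BEGIN.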

2. Locks and Deadlock

(1) What kinds of locks do the following two transactions try to acquire in AgensGraph? Do the two locks conflict? Will the records in the accounts table be deleted by the TRUNCATE command? (0.5 point)

Transaction 1 Transaction 2
BEGIN;
BEGIN;
TRUNCATE accounts;
SELECT * FROM accounts;
ROLLBACK;
COMMIT;
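One way to observe the locks involved is to query the pg_locks system view from a third session while both transactions are still open. This is a sketch that assumes AgensGraph exposes the standard PostgreSQL pg_locks view unchanged.

```sql
-- Run in a third session while Transaction 1 and Transaction 2 are open.
SELECT l.pid,
       l.locktype,
       l.relation::regclass AS relation,
       l.mode,
       l.granted            -- false means the session is still waiting for the lock
FROM pg_locks AS l
WHERE l.relation = 'accounts'::regclass
ORDER BY l.granted DESC, l.pid;
```

The mode column names the lock each session holds or requests, which can then be compared against the lock conflict table in the PostgreSQL documentation.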

(2) Will a deadlock occur between the following two concurrent transactions? Why? (0.5 point)

Transaction 1 (user 1) Transaction 2 (user 2)
BEGIN;
BEGIN;
UPDATE accounts SET balance=150 WHERE id=1;
UPDATE accounts SET balance=150 WHERE id=2;
UPDATE accounts SET balance=180 WHERE id=1;
UPDATE accounts SET balance=180 WHERE id=2;
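While experimenting with the two transactions, it can help to see which session is waiting on which. The sketch below assumes an AgensGraph build based on PostgreSQL 9.6 or later, where pg_blocking_pids() and the wait_event_type column are available.

```sql
-- How long a session waits on a lock before the server checks for a deadlock.
SHOW deadlock_timeout;

-- Run in a third session: list sessions that are currently blocked,
-- together with the process ids of the sessions blocking them.
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       state,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```

An empty result simply means that no session is waiting for a lock at that moment.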

Part2: Multi-Model OLAP Analysis (15 points)

📘 If you installed AgensGraph with Docker, please refer to the following two materials on how to import the dataset:

If you do not use Docker, follow the steps below to import the downloaded file "m3d.dmp" into AgensGraph with the pg_restore command. Refer to this link for details about pg_restore.

Step 1: In AgensGraph, create a database called "unibench_m3d" and a graph called "unibench_graph", and set the graph_path. The commands are as follows:

CREATE DATABASE unibench_m3d;
CREATE GRAPH unibench_graph;
SET graph_path = unibench_graph;
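As a quick sanity check after Step 1, the current setting can be displayed. This sketch assumes that graph_path behaves like an ordinary PostgreSQL run-time parameter in your AgensGraph version, so that SHOW works on it.

```sql
-- Display the graph that Cypher statements in this session will use.
-- Assumption: graph_path is exposed as a regular run-time parameter.
SHOW graph_path;
```

A plain SET only lasts for the current session, so it has to be repeated in every new connection.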

Step 2: Open a new terminal, change to the directory that contains the "m3d.dmp" file, and run the following command to import the dataset:

pg_restore -d unibench_m3d -U agens -O -w -v m3d.dmp

Notice: pg_restore looks for the file relative to the current directory, so it cannot find the dataset if you run it from elsewhere. Alternatively, specify the full path of m3d.dmp in the pg_restore command, for example:

pg_restore -d unibench_m3d -U agens -O -w -v /home/dw/m3d.dmp

Notice: You may see some errors about the pg_dropcache extension. You can ignore them: pg_dropcache is only a PostgreSQL extension for invalidating the shared_buffers cache, so it does not affect the imported data.
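To confirm that the import produced tables, one option is to list the user tables of the database after pg_restore finishes. This is a generic sketch using the standard information_schema views; it makes no assumption about the table names in the Unibench dump.

```sql
-- Run while connected to the unibench_m3d database.
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
  AND table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY table_schema, table_name;
```

If the list is empty, check that pg_restore was pointed at the right database with the -d option.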

After importing the dataset, please complete the following five questions (Q1-Q5). For each question, write three kinds of queries (FR, NR, and M3D) that answer the same question, and make sure that the three queries return the same results. Remember to set the graph_path (SET graph_path = unibench_graph) before running Cypher queries on the graph.

  • FR: Full-Relational query is based on the FR schema;
  • NR: Non-Relational query is based on the NR schema;
  • M3D: Multi-Model Multidimensional query is based on the M3D schema.

Please read Paper 2 and refer to this GitHub repository for query examples.

Q1: Number of orders by year.

Q2: Number of orders by customer rating for a given product (asin='B005SSWKMK').

Q3: Number of orders by a customer's 2-degree friends (2 hops). Only return the top 5 numbers. The idcust of the customer is 4145.

📗 Statement: In Q3, the FR query will return a different result from the NR and M3D queries. This is because the dataset has some problems in how the knows graph was built.

Q4: Total price by customer for a given vendor (vendorname='Mugen_Motorsports') and period (2018-2020). Only return the top 10 total prices.

Q5: Total price by vendorname for the top 3 customers, ordered by vendorname.

(Optional) Part3: Bonus of Multi-Model OLAP Analysis (Maximum: 5 points)

In this part, you can earn bonus points by designing up to five different multi-model OLAP questions (1 point for each question) and giving the M3D query for each question.

Requirements:

  • The queries must not appear in Paper 2. This means you cannot simply modify some parameters of the queries in Paper 2. Instead, you need to design queries that have different descriptions and semantics.
  • Each query should be sufficiently complex and involve at least four different data models (R, JSON, Graph, XML, KV). Simple queries will not earn points.
  • Please give the description of each query.
  • Please show the involved data models in each query.
  • You ONLY need to give the M3D query format. You DO NOT need to give the FR and NR query formats.

Please refer to Table 1 and Table 2 in Paper 2 (WL1 and WL2 OLAP workload for Unibench) to design the queries.

Submission Requirements:

  • Post the queries and demonstrate the results.
  • Please upload all the solutions in a single PDF file.
  • Please submit on the Moodle page. The deadline is May 12th, 2022.
