Bonis98 / Distributed-query

Cybersecurity master's thesis project

Assignment

This project was developed for the master's thesis of my cybersecurity degree. It implements a variant of the algorithm proposed in "Distributed Query Execution under Access Restrictions".

Installation and run

  1. Clone this repository
  2. cd into the project root folder and run python3.9 -m venv env to create a virtual environment
  3. Run source env/bin/activate to activate the virtual environment
  4. Run pip install -r requirements.txt to install all the packages needed to run the project
    • If you don't have pip installed, you can find information here
  5. The script accepts five command line arguments:
    • -p PATH, --path PATH: path where the PDF containing the tree resulting from the computation is saved (e.g. '../' saves the PDF in the directory containing the script folder)
    • -m ASSIGNMENT, --manual ASSIGNMENT: manually assigns candidates to nodes, given as a string such as 'XYZ' whose characters are assigned to the nodes in the order of a pre-order visit of the query tree plan
    • -i INPUT, --input INPUT: path from which the input of the algorithm is read
    • -v, --verbose: enables verbose logging
    • -d, --debug: enables debug logging
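
The five options above could be declared with Python's standard argparse module roughly as follows. This is only a sketch, not the project's actual code: the function names build_parser and log_level, the help texts and the mapping of -v/-d to logging levels are assumptions.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the options listed above.
    parser = argparse.ArgumentParser(
        description="Distributed query execution under access restrictions")
    parser.add_argument("-p", "--path", metavar="PATH",
                        help="where to save the PDF of the resulting tree")
    parser.add_argument("-m", "--manual", metavar="ASSIGNMENT",
                        help="candidates for the nodes in pre-order, e.g. 'XYZ'")
    parser.add_argument("-i", "--input", metavar="INPUT",
                        help="path from which the input CSV files are read")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="enable verbose logging")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="enable debug logging")
    return parser


def log_level(args: argparse.Namespace) -> int:
    # Assumption: --debug wins over --verbose; otherwise only warnings are shown.
    if args.debug:
        return logging.DEBUG
    if args.verbose:
        return logging.INFO
    return logging.WARNING


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["-p", "../", "-m", "XYZ", "-v"])
logging.basicConfig(level=log_level(args))
```

Passing an explicit list to parse_args keeps the example independent of the real command line; a real entry point would call parse_args() with no arguments.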

Input data of the algorithm

The inputs to the algorithm are given by four CSV files: relations.csv, tree.csv, subjects.csv and authorizations.csv.

The CSV_data folder contains an example of each of these files.

relations.csv

This is the file modeling the base relations of the query, structured as follows:

  • name: Name of the base relation
  • primary_key: Primary key of the relation
  • provider: Storage provider storing the relation
  • plain_attr: Attributes of the relation stored in plaintext
  • enc_attr: Attributes of the relation stored encrypted
  • attr: All the attributes of the relation (plaintext and encrypted)
  • enc_costs: semicolon-separated list of the encryption costs of the attributes in the relation
  • dec_costs: semicolon-separated list of the decryption costs of the attributes in the relation
  • size: semicolon-separated list of the attribute sizes (used to estimate the computational cost of a node)
  • node_id: ID of the leaf node in the query tree with which the base relation is associated (see tree.csv)

Parsing of relations.csv produces the following two base relations:

  1. relation flight(NDPC) assigned to storage provider F
    • N is stored encrypted, it has an encryption cost of 1, a decryption cost of 4 and a size of 7
    • D is stored in plain text, it has an encryption cost of 2, a decryption cost of 5 and a size of 8
    • P is stored in plain text, it has an encryption cost of 3, a decryption cost of 6 and a size of 9
    • C is stored encrypted, it has an encryption cost of 4, a decryption cost of 7 and a size of 9
  2. relation company(SJI) assigned to storage provider C
    • S is stored encrypted, it has an encryption cost of 1, a decryption cost of 4 and a size of 7
    • J is stored in plain text, it has an encryption cost of 2, a decryption cost of 5 and a size of 8
    • I is stored encrypted, it has an encryption cost of 3, a decryption cost of 6 and a size of 9
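
As a rough illustration of how a row of relations.csv maps to the attributes listed above, the following sketch parses two hypothetical rows that reproduce the flight and company example. It assumes that attribute names are single characters written next to each other (e.g. NDPC) and that the i-th value of enc_costs, dec_costs and size refers to the i-th attribute in attr. The primary_key and node_id values are placeholders, and Attribute, BaseRelation and parse_relations are hypothetical names.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass


@dataclass
class Attribute:
    name: str
    encrypted: bool
    enc_cost: int
    dec_cost: int
    size: int


@dataclass
class BaseRelation:
    name: str
    provider: str
    node_id: str
    attributes: list[Attribute]


# Hypothetical rows reproducing the example; primary_key and node_id are placeholders.
SAMPLE = """name,primary_key,provider,plain_attr,enc_attr,attr,enc_costs,dec_costs,size,node_id
flight,N,F,DP,NC,NDPC,1;2;3;4,4;5;6;7,7;8;9;9,4
company,S,C,J,SI,SJI,1;2;3,4;5;6,7;8;9,5
"""


def _ints(field: str) -> list[int]:
    # Split a semicolon-separated list such as "1;2;3" into integers.
    return [int(value) for value in field.split(";")]


def parse_relations(text: str) -> list[BaseRelation]:
    relations = []
    for row in csv.DictReader(io.StringIO(text)):
        names = list(row["attr"])  # assumption: one character per attribute
        encrypted = set(row["enc_attr"])
        enc_costs, dec_costs = _ints(row["enc_costs"]), _ints(row["dec_costs"])
        sizes = _ints(row["size"])
        if not len(names) == len(enc_costs) == len(dec_costs) == len(sizes):
            raise ValueError(f"inconsistent list lengths in relation {row['name']}")
        attributes = [
            Attribute(n, n in encrypted, e, d, s)
            for n, e, d, s in zip(names, enc_costs, dec_costs, sizes)
        ]
        relations.append(
            BaseRelation(row["name"], row["provider"], row["node_id"], attributes))
    return relations
```

Checking that the four lists have the same length catches a common mistake when editing the CSV by hand, since zip would otherwise silently drop the extra values.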

tree.csv

This is the file modeling the query tree plan, structured as follows:

  • ID: Identifier of the node; the IDs of leaf nodes are referenced by node_id in relations.csv to associate a base relation with a leaf node
  • operation: Operation of the query associated with the node
    • Projection
    • Selection
    • Cartesian
      • For a Cartesian product, the attributes taken from the children must be entered manually in Ap, Ae and enc_attr
    • Join
    • Group-by
    • Encryption
    • Decryption
    • Re-encryption
  • Ap: Set of attributes that need to be in plaintext to evaluate the operation associated with the node
  • Ae: Set of attributes that need to be re-encrypted to evaluate the operation associated with the node
  • As: Remaining set of attributes
  • print_label: label of the node printed when the tree is exported
  • group_attr: if the operation associated with the node is a group-by, the set of attributes on which the group-by clause is evaluated
  • parent: parent of the current node, used to build the tree
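
A possible way to rebuild the query tree plan from the ID and parent columns, and to map a manual assignment string such as the one passed to -m/--manual onto the nodes in pre-order, is sketched below. The sample rows are hypothetical (only the columns needed here are shown), the parent column is assumed to hold the parent's ID (empty for the root), children are assumed to follow the order of their rows in the file, and Node, build_tree, preorder and manual_assignment are illustrative names.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    operation: str
    label: str
    children: list[Node] = field(default_factory=list)


# Hypothetical tree.csv excerpt: only the columns used here are included.
SAMPLE_TREE = """ID,operation,print_label,parent
1,Group-by,gamma,
2,Join,join,1
3,Projection,pi,2
4,Selection,sigma_1,3
5,Selection,sigma_2,2
"""


def build_tree(text: str) -> Node:
    nodes: dict[str, Node] = {}
    parents: dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(text)):
        nodes[row["ID"]] = Node(row["ID"], row["operation"], row["print_label"])
        parents[row["ID"]] = row["parent"].strip()
    roots = []
    # Dicts keep insertion order, so children follow their row order in the file.
    for node_id, parent_id in parents.items():
        if parent_id:
            nodes[parent_id].children.append(nodes[node_id])
        else:
            roots.append(nodes[node_id])
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    return roots[0]


def preorder(node: Node) -> list[Node]:
    # Visit the node first, then each subtree from left to right.
    order = [node]
    for child in node.children:
        order.extend(preorder(child))
    return order


def manual_assignment(root: Node, assignment: str) -> dict[str, str]:
    # One subject letter per node, e.g. "UXYZX" for a five-node tree.
    order = preorder(root)
    if len(assignment) != len(order):
        raise ValueError(f"expected {len(order)} subjects, got {len(assignment)}")
    return {node.id: subject for node, subject in zip(order, assignment)}
```

For this sample tree the pre-order visit is 1, 2, 3, 4, 5, whereas a breadth-first visit would give 1, 2, 3, 5, 4; the order matters because each character of the assignment string is bound to the node in the same position of the visit.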

Parsing of tree.csv produces the following tree:

[Figure: query tree plan built from tree.csv]

subjects.csv

This is the file modeling the subjects involved in the query computation, structured as follows:

  • subject: Name of the subject
  • comp_price: computational price of the subject
  • transfer_price: transfer price of the subject

Parsing of subjects.csv produces the following subjects:

  • U with computational price 1 and transfer price 1
  • X with computational price 2 and transfer price 2
  • Y with computational price 3 and transfer price 3
  • Z with computational price 4 and transfer price 4
  • F with computational price 5 and transfer price 6
  • C with computational price 6 and transfer price 7
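
A minimal sketch of reading subjects.csv into a lookup table keyed by subject name, using a hypothetical file that contains the example values above. Subject and parse_subjects are illustrative names, and prices are read as floats on the assumption that non-integer prices may occur.

```python
from __future__ import annotations

import csv
import io
from typing import NamedTuple


class Subject(NamedTuple):
    comp_price: float
    transfer_price: float


# Hypothetical file with the example subjects listed above.
SAMPLE_SUBJECTS = """subject,comp_price,transfer_price
U,1,1
X,2,2
Y,3,3
Z,4,4
F,5,6
C,6,7
"""


def parse_subjects(text: str) -> dict[str, Subject]:
    return {
        row["subject"]: Subject(float(row["comp_price"]), float(row["transfer_price"]))
        for row in csv.DictReader(io.StringIO(text))
    }


subjects = parse_subjects(SAMPLE_SUBJECTS)
# Subject with the lowest computational price in the example.
cheapest = min(subjects, key=lambda name: subjects[name].comp_price)
```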

authorizations.csv

This is the file modeling the authorizations of the subjects involved in the query computation, structured as follows:

  • subject: Name of the subject (already specified in subjects.csv)
  • plain: Attributes the subject is authorized to view in plaintext
  • enc: Attributes the subject is authorized to view in encrypted form

Parsing of authorizations.csv produces the following authorizations, written as [plain,enc]→subject (- denotes an empty set):

  • [NCPSJI,-]→U
  • [PC,NSJI]→X
  • [DPJI,CNS]→Y
  • [NCS,PJI]→Z
  • [-,NDPCJI]→F
  • [-,NCSJI]→C
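
The sketch below reads a hypothetical authorizations.csv containing the example above and answers whether a subject may see an attribute in plaintext or in encrypted form. It assumes that the plain and enc columns list single-character attribute names, that - denotes the empty set (as in the notation above), and that being authorized to see an attribute in plaintext also allows seeing it encrypted; Authorization and parse_authorizations are hypothetical names.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Authorization:
    plain: frozenset[str]
    enc: frozenset[str]

    def can_view(self, attr: str, encrypted: bool) -> bool:
        # Assumption: plaintext visibility also grants encrypted visibility.
        if encrypted:
            return attr in self.plain or attr in self.enc
        return attr in self.plain


# Hypothetical file with the example authorizations; "-" stands for the empty set.
SAMPLE_AUTH = """subject,plain,enc
U,NCPSJI,-
X,PC,NSJI
Y,DPJI,CNS
Z,NCS,PJI
F,-,NDPCJI
C,-,NCSJI
"""


def _attrs(field: str) -> frozenset[str]:
    field = field.strip()
    return frozenset() if field == "-" else frozenset(field)


def parse_authorizations(text: str) -> dict[str, Authorization]:
    return {
        row["subject"]: Authorization(_attrs(row["plain"]), _attrs(row["enc"]))
        for row in csv.DictReader(io.StringIO(text))
    }
```

With the example data, U cannot see D at all, while F can see N only in encrypted form.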

Languages

Python 100.0%