JekxDevil / dm-spark-tpcxbb

Spark SQL homework assignment 7 based on TPCx-BB for the Data Management course at USI Università della Svizzera italiana

dm-spark-tpcxbb

Spark SQL homework based on TPCx-BB for the Data Management course

Get started

  • Read the delivery requirements in the homework sheet
  • Clone this repository
  • Perform the setup steps to get a clean working environment

Setup

This project targets Python 3.10 and uses Poetry for dependency management. Poetry creates a virtual environment and installs the dependencies into it.

If you prefer to install the dependencies manually, they are listed in pyproject.toml.

The following script automates the installation of the virtual environment and dependencies on Debian-based systems:

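# install pipx
sudo apt update             && \
sudo apt install -y pipx    && \
pipx ensurepath             && \

# make pipx-installed tools visible in the current shell
# (assumes pipx's default bin directory, ~/.local/bin)
export PATH="$HOME/.local/bin:$PATH" && \

# install poetry
pipx install poetry         && \

# from the repository root, where pyproject.toml is located
poetry install              && \

# start a shell within the virtual environment
# (Poetry 2.x: requires the poetry-plugin-shell plugin)
poetry shell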
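
If a step of the script fails, it can help to check which tools are actually reachable from the current shell. The snippet below is a minimal sketch and not part of the repository: `have_cmd` and `check_tools` are hypothetical helper names, and the check relies only on the standard `command -v` builtin.

```shell
# Hypothetical helper: succeeds if the given command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print one status line per tool and
# return 1 if at least one of them is missing, 0 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if have_cmd "$tool"; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Check the two tools the setup script installs.
check_tools pipx poetry || \
    echo "A tool is missing; if it was just installed, open a new shell so the PATH change from 'pipx ensurepath' takes effect."
```

A tool reported as missing right after installation is usually a PATH issue rather than a failed install, since `pipx ensurepath` edits the shell startup files and does not change the running shell.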

Run

To edit the notebook, start JupyterLab from the repository root:

jupyter lab

This opens a browser tab with the JupyterLab interface.
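
The command above assumes the Poetry virtual environment is already active. As a rough sketch, the hypothetical helper `jupyter_cmd` below chooses between the plain command and `poetry run jupyter lab`, which runs a command inside the project's environment without activating it first. It assumes that an activated virtual environment sets the `VIRTUAL_ENV` variable, which is the usual behaviour but is not stated in this repository.

```shell
# Hypothetical helper: print the command to launch JupyterLab.
# Assumption: an active virtual environment exports VIRTUAL_ENV.
jupyter_cmd() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        # Environment already active: call jupyter directly.
        echo "jupyter lab"
    else
        # No active environment: let Poetry run it inside the project venv.
        echo "poetry run jupyter lab"
    fi
}

echo "Launch with: $(jupyter_cmd)"
```

Note that `VIRTUAL_ENV` is set by any active virtual environment, not only the one Poetry created for this project, so the check is a convenience and not a guarantee.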

Suggestion: to benefit from PyCharm's Poetry integration, use the same Python environment in the IDE by selecting the Python interpreter from the Poetry virtual environment (see footnote 1).

Footnotes

  1. https://www.jetbrains.com/help/pycharm/poetry.html#existing-poetry-environment
