AJSTO / GUNB_building_permissions

👷Project with workflow management of building permissions aggregates



ℹ️Project info

This project was created to collect information about issued building permits, broken down precisely by territorial unit: voivodeships, poviats and cadastral units. The data is retrieved from the official website of GUNB (General Office of Building Supervision): GUNB.

The aim of the project was to use Apache Airflow to manage workflows. The workflow is launched every month to: collect newly issued permits with detailed information about each permit (building permissions table); collect aggregates for individual units for the last 1, 2 and 3 months (aggregates table); collect updated data on existing units (unit info); and create month-to-month increments (to send the report). Additionally, an Airflow task sends a monthly report to an e-mail address with the number of updated records, the current number of records in the database, and a Great Expectations validation of the CSV table (a minimal sketch follows the list below), checking that:

  • the registration number is formed properly,
  • the type of permission is in the allowed set,
  • the category values are in the categories set,
  • the voivodeship names are in the allowed set,
  • the cadastral unit symbol is formed properly,
  • the date column is not empty,
  • the voivodeship number is in the allowed set.
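
For illustration, a minimal sketch of such a validation, assuming the classic great_expectations pandas API; the column names come from the permissions table described further below, while the regex and allowed sets are placeholders, not the project's actual rules:

  import great_expectations as ge
  import pandas as pd

  # Wrap the downloaded CSV in a Great Expectations dataset.
  df = ge.from_pandas(pd.read_csv("building_permissions.csv"))

  # Registration number is formed properly (placeholder pattern).
  df.expect_column_values_to_match_regex("numer_ewidencyjny_urzad", r"^\S+$")
  # Type of permission and voivodeship name are in allowed sets (placeholder sets).
  df.expect_column_values_to_be_in_set("rodzaj_zam_budowlanego", {"budowa", "rozbudowa"})
  df.expect_column_values_to_be_in_set("wojewodztwo_objekt", {"mazowieckie", "slaskie"})
  # Date column is not empty.
  df.expect_column_values_to_not_be_null("data_wplywu_wniosku_do_urzedu")

  # Run the whole suite; the result is JSON-serialisable (as attached to the e-mail).
  print(df.validate())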

Additionally, a container with a Jupyter notebook was created to build geovisualisations with folium (which uses OpenStreetMap tiles). It produces a geovisualised report for cadastral districts and for voivodeships for the last X months (you can choose 1-3 months back); a sketch follows below.
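
A minimal sketch of how such a folium map can be built (the GeoJSON file, the CSV and the column names here are illustrative assumptions, not the notebook's actual code):

  import folium
  import pandas as pd

  # Permit counts per unit for the chosen look-back window (hypothetical file).
  counts = pd.read_csv("permits_per_voivodeship.csv")  # columns: unit_id, permits

  m = folium.Map(location=[52.1, 19.4], zoom_start=6, tiles="OpenStreetMap")
  folium.Choropleth(
      geo_data="POLYGON_SHAPES/voivodeships.geojson",  # unit boundaries (hypothetical path)
      data=counts,
      columns=["unit_id", "permits"],
      key_on="feature.properties.unit_id",  # must match a property in the GeoJSON
      fill_color="YlOrRd",
      legend_name="Building permits (last 3 months)",
  ).add_to(m)
  m.save("OPENSTREETMAP_HTML/voivodeships_report.html")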

🔎 Looker Studio

Link to the generated report in Looker Studio:

Building permissions issued from 2016 to present

[Looker Studio report screenshot]

📧Exemplary e-mail report with GE validation:

The e-mail as delivered:

[e-mail screenshot]

Great Expectations report in JSON format (attached to the e-mail):

[Great Expectations report screenshot]

Record-increase report attached to the e-mail:

[record-increase report screenshot]

Extensions:

An extension of this project is the FastAPI repo: GUNB_API

🗒️Database created in BigQuery contains the following tables:

Building permissions detailed info:

'id_uniq', 'numer_ewidencyjny_system', 'numer_ewidencyjny_urzad',
'data_wplywu_wniosku_do_urzedu', 'nazwa_organu', 'wojewodztwo_objekt',
'obiekt_kod_pocztowy', 'miasto', 'terc', 'cecha', 'ulica',
'ulica_dalej', 'nr_domu', 'kategoria', 'nazwa_zam_budowlanego',
'rodzaj_zam_budowlanego', 'kubatura', 'stan', 'jednostki_numer',
'obreb_numer', 'numer_dzialki', 'numer_arkusza_dzialki',
'nazwisko_projektanta', 'imie_projektanta',
'projektant_numer_uprawnien', 'projektant_pozostali'

Clustered by 'jednostki_numer', 'kategoria', 'rodzaj_zam_budowlanego'.

Building permissions aggregates:

'unit_id', 'injection_date', plus a column for every combination of category and permission type for the last 1, 2 and 3 months.

Partitioned by month of year.
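
For illustration, a sketch of how the clustered permissions table and the month-partitioned aggregates table can be declared with the google-cloud-bigquery client (the table IDs and the trimmed schemas are placeholders):

  from google.cloud import bigquery

  client = bigquery.Client()

  # Permissions table, clustered as described above (schema trimmed).
  permissions = bigquery.Table(
      "MY_PROJECT.MY_DATASET.MY_TABLE",
      schema=[
          bigquery.SchemaField("id_uniq", "STRING"),
          bigquery.SchemaField("jednostki_numer", "STRING"),
          bigquery.SchemaField("kategoria", "STRING"),
          bigquery.SchemaField("rodzaj_zam_budowlanego", "STRING"),
          # ... remaining columns from the list above
      ],
  )
  permissions.clustering_fields = ["jednostki_numer", "kategoria", "rodzaj_zam_budowlanego"]
  client.create_table(permissions, exists_ok=True)

  # Aggregates table, partitioned by month (assuming injection_date is a DATE column).
  aggregates = bigquery.Table(
      "MY_PROJECT.MY_DATASET.TABLE_AGG",
      schema=[
          bigquery.SchemaField("unit_id", "STRING"),
          bigquery.SchemaField("injection_date", "DATE"),
          # ... one column per category / permission-type / window combination
      ],
  )
  aggregates.time_partitioning = bigquery.TimePartitioning(
      type_=bigquery.TimePartitioningType.MONTH, field="injection_date"
  )
  client.create_table(aggregates, exists_ok=True)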

Territorial units' info:

'unit_type', 'unit_name', 'unit_number'

Clustered by 'unit_type'.

Month-to-month increase:

'date_of_calc', 'last_1_m', 'last_2_m', 'last_3_m'
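
A hypothetical sketch of how one of these increments can be computed (assuming the date column is DATE/TIMESTAMP-typed; the table ID is a placeholder):

  from google.cloud import bigquery

  client = bigquery.Client()
  query = """
      SELECT COUNT(*) AS last_1_m
      FROM `MY_PROJECT.MY_DATASET.MY_TABLE`
      WHERE DATE(data_wplywu_wniosku_do_urzedu)
            >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH)
  """
  # One row with one column: the number of permits received in the last month.
  last_1_m = list(client.query(query).result())[0].last_1_m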

⏩ The DAG contains the following tasks:

  • create_dataset (creates the BigQuery dataset if it does not exist);
  • Subtask Group creating_bigquery_tables:
    • create_table_units_info;
    • create_table_all_permissions;
    • create_table_aggregates;
    • create_table_increase_info;
  • Subtask Group downloading_and_validate_data:
    • download_building_permissions_data;
    • data validation;
  • Subtask Group updating_project_tables (updating and appending rows to BigQuery tables):
    • updating_units_info_table;
    • appending_rows_to_table_of_permissions;
    • generate_aggregates;
    • count_increase_of_given_permissions;
  • sending_mail (sending report via e-mail);
  • removing_files;

[DAG graph screenshot]

In this project I used PythonOperator and EmailOperator tasks; a skeleton of the DAG is sketched below.
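
A minimal sketch of the DAG skeleton with these operators and one of the task groups (the schedule, callables and addresses are placeholders; task names mirror the list above):

  from datetime import datetime

  from airflow import DAG
  from airflow.operators.email import EmailOperator
  from airflow.operators.python import PythonOperator
  from airflow.utils.task_group import TaskGroup

  def download_permissions():
      ...  # fetch the monthly CSV from the GUNB site

  with DAG(
      dag_id="gunb_building_permissions",
      schedule_interval="@monthly",  # the workflow is launched every month
      start_date=datetime(2023, 1, 1),
      catchup=False,
  ) as dag:
      with TaskGroup("downloading_and_validate_data") as download_and_validate:
          download = PythonOperator(
              task_id="download_building_permissions_data",
              python_callable=download_permissions,
          )

      sending_mail = EmailOperator(
          task_id="sending_mail",
          to="you@example.com",  # placeholder address
          subject="Monthly GUNB permissions report",
          html_content="Records updated this month: ...",
      )

      download_and_validate >> sending_mail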

📦This project uses 6 Docker containers:

  • Container with airflow-init
    • Initializes Apache Airflow;
  • Container with airflow-webserver
    • Provides the GUI for Apache Airflow;
  • Container with airflow-triggerer
    • Runs triggers for deferrable tasks;
  • Container with airflow-scheduler
    • Schedules and runs the DAGs;
  • Container with PostgreSQL
    • Serves as the Airflow metadata database;
  • Container with Jupyter Notebook
    • Notebook created to visualise the data, using libraries:
      • pandas
      • numpy
      • folium
      • sqlalchemy

🌲 Project tree

.
├── Dockerfile # Dockerfile to create the extending_airflow image
├── README.md
├── dags
│   ├── Bigquery_dag.py # Python script with DAG
│   ├── config.yaml # config file
│   ├── variables.py # python module with project variables
│   ├── schemas.py # python module with bigquery table schemas
│   └── YOUR_JSON_KEY_FROM_BIGQUERY.json # JSON key should be here
├── docker-compose.yaml # Yaml file to create containers by Docker Compose
├── logs # Airflow logs
├── notebook # Folder with jupyter notebook files
│   ├── OPENSTREETMAP_HTML # Folder for saving generated maps
│   ├── POLYGON_SHAPES # Downloaded shp files are stored here
│   ├── configuration_files
│   │   ├── config.yaml # config file
│   │   ├── YOUR_JSON_KEY_FROM_BIGQUERY.json # JSON key should be here
│   │   └── requirements.txt # Requirements to create jupyter notebook
│   └── spatial report.ipynb # Jupyter notebook with data visualisation
└── requirements.txt # Requirements to create airflow

🔑 Setup

Set up your Gmail account to allow connections from Airflow for e-mail report purposes.

Detailed info:

Sending Emails using Airflow EmailOperator Simplified: 9 Easy Steps

To run this project properly, you should set the variables in the following files:

./dags/config.yaml:

  • MY_PROJECT: # Project name in BIGQUERY
  • MY_DATASET: # Dataset name in BIGQUERY
  • MY_TABLE: # Table name in BIGQUERY
  • TABLE_AGG: # Table name in BIGQUERY with all captured aggregates
  • UNITS_INFO: # Table with unit names and unit IDs
  • INCREASE_TABLE: # Table with increase values
  • JSON_KEY_BQ: # JSON key filename; also place the JSON key file in the ./dags folder
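
For illustration, a sketch of how such a config.yaml can be read from the DAG code (assuming PyYAML; the keys follow the list above, and the path matches the container layout used in the connection URI below):

  import yaml

  with open("/opt/airflow/dags/config.yaml") as f:
      cfg = yaml.safe_load(f)

  MY_PROJECT = cfg["MY_PROJECT"]
  MY_DATASET = cfg["MY_DATASET"]
  JSON_KEY_BQ = cfg["JSON_KEY_BQ"]  # filename of the key placed in ./dags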

./docker-compose.yaml:

Environment:

  • AIRFLOW__SMTP__SMTP_USER: # Your gmail account
  • AIRFLOW__SMTP__SMTP_PASSWORD: # App password generated as described in the link above
  • AIRFLOW__SMTP__SMTP_MAIL_FROM: # Your gmail account
  • AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT: 'google-cloud-platform://?key_path=%2Fopt%2Fairflow%2Fdags%2FYOUR-JSON-KEY-IN-THIS-FOLDER.json&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform&project=YOUR-PROJECT-ID&num_retries=5'

Database:

  • POSTGRES_USER:
  • POSTGRES_PASSWORD:
  • POSTGRES_DB:

Jupyter:

  • change the TOKEN_NAME value in the command:
command: bash -c "conda install fiona pyproj six && pip install -r ./configuration_files/requirements.txt && jupyter lab build --minimize=False && jupyter lab --ServerApp.token=TOKEN_NAME"

⚙️ Run Locally

  • Clone the project
  • Go to the project directory and type in the CLI:
  $ ls

You should see this:

Dockerfile logs README.md notebook dags plugins docker-compose.yaml requirements.txt

Now build the image of the needed Airflow extension:

  $ docker build -t extending_airflow:latest .

Once the image is created, initialize Airflow by typing:

  $ docker-compose up airflow-init 

Next, build and start all the other containers needed:

  $ docker-compose up -d

Now Airflow will start working. If you want to stop Airflow (the -v flag also removes the volumes):

  $ docker-compose down -v

⚙️ Open Airflow

When all containers are running, open a browser and go to:

  localhost:8080

Then enter your username and password.

📊Data visualisation with folium

After the DAG has run, you should open the Jupyter notebook. To start the notebook, type in your browser:

  localhost:8888

🚨In case the notebook requires a token pass a value which was assigned to TOKEN_NAME in ./docker-compose.yaml🚨

Next, choose the file 🗒️spatial_report.ipynb and run all cells to see the data analysis. Examples of geovisualisations with folium:

CADASTRAL UNITS in the last 3 months:

[cadastral units map screenshot]

VOIVODSHIPS in the last 3 months:

[voivodeships map screenshot]
