BrenPatF / shortest_path_sql

Shortest Path Analysis of Large Networks by SQL and PL/SQL



This project is on the use of SQL and PL/SQL to solve shortest path problems on large networks on an Oracle database. It provides solutions in pure SQL (based on previous articles by the author), and solutions in PL/SQL with embedded SQL that scale better for larger problems.

It applies the solutions to a range of problems, up to a size of 2,800,309 nodes and 109,262,592 links.

Standard and custom methods for execution time profiling of the code are included, and one of the algorithms implemented in PL/SQL is tuned based on the profiling.

The two PL/SQL entry points have automated unit tests that follow the Math Function Unit Testing design pattern, using Trapit - Oracle PL/SQL unit testing module.


Movie Morsel: Six Degrees of Kevin Bacon

There is a series of mp4 recordings, in the mp4 folder on GitHub, briefly going through the sections of the blog post, which can also be viewed via Twitter:

Recording                             Tweet
------------------------------------  --------------------------------
sps_1_overview.mp4                    1: Overview
sps_2_shortest_path_problems.mp4      2: Shortest Path Problems
sps_3_two_algorithms.mp4              3: Two Algorithms
sps_4_example_datasets.mp4            4: Example Datasets
sps_5_data_model.mp4                  5: Data Model
sps_6_network_paths_by_sql.mp4        6: Network Paths by SQL
sps_7_min_pathfinder.mp4              7: Min Pathfinder
sps_8_subnet_grouper.mp4              8: Subnetwork Grouper
sps_9_code_timing_subnet_grouper.mp4  9: Code Timing Subnet Grouper
sps_10_profiling.mp4                  10: Oracle Profilers
sps_11_tuning_1_isolated_nodes.mp4    11: Tuning 1, Isolated Nodes
sps_12_tuning_2_isolated_links.mp4    12: Tuning 2, Isolated Links
sps_13_tuning_3_root_selector.mp4     13: Tuning 3, Root Node Selector
sps_14_unit_testing.mp4               14: Unit Testing

There is also a blog post:

And a presentation, given in Dublin on 5 September 2022 for the 2022 Oracle User Group conference (the .pptx file is in the root folder):

In this README...

↓ Background
↓ Installation
↓ Running the examples
↓ Running the unit tests
↓ See Also

Background

↑ In this README...

In 2015 I posted a number of articles on network analysis via SQL and array-based PL/SQL, including SQL for Shortest Path Problems 2: A Branch and Bound Approach and PL/SQL Pipelined Function for Network Analysis. These were tested on networks up to a size of 58,228 nodes and 214,078 links, but were not intended for really large networks.

Recently, I came across (by way of an article on using PostgreSQL for network analysis, Using the PostgreSQL Recursive CTE - Part Two, Bryn Llewellyn, March 2021) a set of network datasets published by an American college, Bacon Numbers Datasets from Oberlin College, December 2016. These range in size up to 2,800,309 nodes and 109,262,592 links, and I wondered if I could develop my ideas further to solve the larger networks in Oracle SQL and PL/SQL.

Installation

↑ In this README...
↓ Install 1: Install prerequisite tools (if necessary)
↓ Install 2: Clone git repository
↓ Install 3: Install prerequisite modules
↓ Install 4: Copy the unit test input JSON files to the database server
↓ Install 5: Copy the input data files to the database server

Install 1: Install prerequisite tools (if necessary)

Oracle database

The database installation requires a minimum Oracle version of 12.2 (see Oracle Database Software Downloads).

Github Desktop

In order to clone the code as a git repository you need to have the git application installed. I recommend the GitHub Desktop UI for managing repositories on Windows. This depends on the git application, available here: git downloads, but git can also be installed from within GitHub Desktop, according to these instructions: How to install GitHub Desktop.

nodejs (Javascript backend) [Optional]

nodejs is needed to run a program that turns the unit test output files into formatted HTML pages. No JavaScript knowledge is required to run the program, and nodejs can be installed here.

Install 2: Clone git repository

↑ Installation

The following steps will download the repository into a folder, shortest_path_sql, within your GitHub root folder:

Install 3: Install prerequisite modules

↑ Installation

The install depends on the prerequisite modules Utils, Trapit, Timer_Set and Oracle's profilers, all installed in the lib schema. The sys install creates the lib schema for the prerequisites and the shortest_path_sql schema for the network code and tables.

The prerequisite modules can be installed by following the instructions for each module at the module root pages listed in the See Also section below. This allows inclusion of the examples and unit tests for those modules. Alternatively, the next section shows how to install these modules directly without their examples or unit tests.

[Schema: sys; Folder: install_prereq] Create lib and app schemas and Oracle directory

install_sys.sql creates an Oracle directory, input_dir, pointing to 'c:\input'. Update this if necessary to a folder on the database server with read/write access for the Oracle OS user.

  • Run script from sqlplus:
SQL> @install_sys_all
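
If the directory needs to be checked or repointed after the install, statements of the following form can be used. This is a minimal sketch rather than an extract from install_sys.sql: the path '/u01/app/oracle/input' is a hypothetical example, and the statements are assumed to be run by a suitably privileged user such as sys.

```sql
-- Check where INPUT_DIR currently points
SELECT directory_name, directory_path
  FROM all_directories
 WHERE directory_name = 'INPUT_DIR';

-- Repoint the directory; the path is a hypothetical example and must already
-- exist on the database server, with read/write access for the Oracle OS user
CREATE OR REPLACE DIRECTORY input_dir AS '/u01/app/oracle/input';
```

Using OR REPLACE changes the path without dropping the directory, so privileges already granted on it are retained.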

[Schema: lib; Folder: install_prereq\lib] Create lib components

  • Run script from sqlplus:
SQL> @install_lib_all

[Schema: shortest_path_sql; Folder: install_prereq\shortest_path_sql] Create shortest_path_sql synonyms

  • Run script from sqlplus:
SQL> @c_syns_all

[Folder: (npm root)] Install npm trapit package [Optional - for unit testing]

The npm trapit package is a nodejs package used to format unit test results in text and HTML format.

Open a DOS or Powershell window in the folder where you want to install npm packages, and, with nodejs installed, run

$ npm install trapit

This should install the trapit nodejs package in a subfolder .\node_modules\trapit

Install 4: Copy the unit test input JSON files to the database server [Optional - for unit testing]

↑ Installation

  • Copy the following files from the unit_test\input folders to the server folder pointed to by the Oracle directory INPUT_DIR:

    • tt_shortest_path_sql.purely_wrap_ins_min_tree_links_inp.json
    • tt_shortest_path_sql.purely_wrap_ins_node_roots_inp.json
  • There is also a powershell script to do this, assuming C:\input as INPUT_DIR. From a powershell window in the root folder:

$ ./Copy-JSONToInput.ps1

Install 5: Copy the input data files to the database server

↑ Installation

  • Unzip the following files from the examples\input folders to the server folder pointed to by the Oracle directory INPUT_DIR:

    • brightkite_edges.csv.zip
    • imdb.no_tv_v.txt.zip
    • imdb.only_tv_v.txt.zip
    • imdb.pre1950.txt.zip
    • imdb.small.txt.zip
    • imdb.top250.txt.zip

The files are in Windows format, so conversion may be needed if not using Windows.

  • There is also a powershell script to do this, assuming C:\input as INPUT_DIR. From a powershell window in the root folder:
$ ./Copy-InputDataToInput.ps1

The following files are too big to save to GitHub, even when zipped, so they have to be obtained from the original source site (and converted to Windows format, if necessary):

https://www.cs.oberlin.edu/~rhoyle/16f-cs151/lab10/index.html

- imdb.full.txt
- imdb.post1950.txt

Running the examples

↑ In this README...

Each example has a driver sqlplus script, run*.sql, in its own subfolder:

examples
    bacon
        full
        no_tv_v
        only_tv_v
        post1950
        pre1950
        small
        top250
    brightkite
        main
    foreign_keys
        sys_fks
    three_subnets
        main

The driver script calls a setup script that

  • re-installs the base database components
  • loads the example data to the nodes and links tables

The driver script then calls multiple scripts from the examples folder to execute the PL/SQL programs, with various parameter combinations, spooling to individual files.

For example, the call to the driver for bacon/full is, from its subfolder:

SQL> @run_bacon_full
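
Once a driver has run, a quick row count gives an indication that the setup script populated the tables. This is only a sketch: it assumes the tables are named nodes and links, as the description above suggests, and that they are visible to the connected user.

```sql
-- Assumption: table names nodes and links, visible to the connected user
SELECT COUNT(*) AS n_nodes FROM nodes;
SELECT COUNT(*) AS n_links FROM links;
```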

There is also a powershell script Run-All.ps1 that navigates to each subfolder, logs on and runs the driver script. This takes a long time to run!

Running the unit tests

↑ In this README...

The unit test programs may be run from the unit_test folder with the following script, which calls the unit test install script at the start:

SQL> @r_tests

The output results files are processed by a JavaScript program that has to be installed separately, as described above in Install npm trapit package. The JavaScript program produces listings of the results in HTML and/or text format in a subfolder named from the unit test title, and the subfolders are included in the folder unit_test\output.

To run the processor, first place the output JSON files, tt_shortest_path_sql.purely_wrap_ins_min_tree_links_out.json and tt_shortest_path_sql.purely_wrap_ins_node_roots_out.json, in a new (or existing) folder, shortest_path_sql, within the externals subfolder of the npm trapit package folder. Then open a powershell window in the npm trapit package folder and run:

$ node externals\format-externals shortest_path_sql

This outputs to screen the following summary level report, as well as writing the formatted results files to the subfolders indicated:

Unit Test Results Summary for Folder ./externals/shortest_path_sql
==================================================================
 File                                                          Title                                  Inp Groups  Out Groups  Tests  Fails  Folder                               
-------------------------------------------------------------  -------------------------------------  ----------  ----------  -----  -----  -------------------------------------
 tt_shortest_path_sql.purely_wrap_ins_min_tree_links_out.json  Oracle SQL Shortest Paths: Node Tree            3           2      7      0  oracle-sql-shortest-paths_-node-tree 
 tt_shortest_path_sql.purely_wrap_ins_node_roots_out.json      Oracle SQL Shortest Paths: Node Roots           2           2      3      0  oracle-sql-shortest-paths_-node-roots

0 externals failed, see ./externals/shortest_path_sql for scenario listings

The process has also been automated in a powershell script, Run_Ut.ps1, in folder unit_test, which has the following hard-coded folders:

$userRoot='C:\Users\Brend\OneDrive\Documents\'
$sqlDir = $userRoot + 'GitHub\shortest_path_sql\unit_test\'
$npmDir = $userRoot + 'demo\npm\node_modules\trapit\'
$inputDir = 'C:\input\'

Operating System/Oracle Versions

Windows

Windows 11

Oracle

Oracle Database Version 21.3.0.0.0

See Also

↑ In this README...

License

MIT
