This project uses SQL and PL/SQL to solve shortest path problems on large networks in an Oracle database. It provides solutions in pure SQL (based on previous articles by the author), and solutions in PL/SQL with embedded SQL that scale better for larger problems.
It applies the solutions to a range of problems, up to a size of 2,800,309 nodes and 109,262,592 links.
Standard and custom methods for execution time profiling of the code are included, and one of the algorithms implemented in PL/SQL is tuned based on the profiling.
The two PL/SQL entry points have automated unit tests following the Math Function Unit Testing design pattern, using Trapit - Oracle PL/SQL unit testing module.
Movie Morsel: Six Degrees of Kevin Bacon
There is a series of mp4 recordings, in the mp4 folder on GitHub, briefly going through the sections of the blog post, which can also be viewed via Twitter:
Recording | Tweet |
---|---|
sps_1_overview.mp4 | 1: Overview |
sps_2_shortest_path_problems.mp4 | 2: Shortest Path Problems |
sps_3_two_algorithms.mp4 | 3: Two Algorithms |
sps_4_example_datasets.mp4 | 4: Example Datasets |
sps_5_data_model.mp4 | 5: Data Model |
sps_6_network_paths_by_sql.mp4 | 6: Network Paths by SQL |
sps_7_min_pathfinder.mp4 | 7: Min Pathfinder |
sps_8_subnet_grouper.mp4 | 8: Subnetwork Grouper |
sps_9_code_timing_subnet_grouper.mp4 | 9: Code Timing Subnet Grouper |
sps_10_profiling.mp4 | 10: Oracle Profilers |
sps_11_tuning_1_isolated_nodes.mp4 | 11: Tuning 1, Isolated Nodes |
sps_12_tuning_2_isolated_links.mp4 | 12: Tuning 2, Isolated Links |
sps_13_tuning_3_root_selector.mp4 | 13: Tuning 3, Root Node Selector |
sps_14_unit_testing.mp4 | 14: Unit Testing |
There is also a blog post: Shortest Path Analysis of Large Networks by SQL and PL/SQL
A presentation was also given in Dublin on 5 September 2022 for the 2022 Oracle User Group conference (the .pptx file is in the root folder).
In this README:
- Background
- Installation
- Running the examples
- Running the unit tests
- See Also
In 2015 I posted a number of articles on network analysis via SQL and array-based PL/SQL, including SQL for Shortest Path Problems 2: A Branch and Bound Approach and PL/SQL Pipelined Function for Network Analysis. These were tested on networks up to a size of 58,228 nodes and 214,078 links, but were not intended for really large networks.
Recently, by way of an article on using PostgreSQL for network analysis (Using the PostgreSQL Recursive CTE - Part Two, Bryn Llewellyn, March 2021), I came across a set of network datasets published by an American college: Bacon Numbers Datasets from Oberlin College, December 2016. These range in size up to 2,800,309 nodes and 109,262,592 links, and I wondered if I could develop my ideas further to solve the larger networks in Oracle SQL and PL/SQL.
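To make the problem concrete: a Bacon number is a shortest path length counted in links, so the task is to find, for each node, the fewest links needed to reach it from a chosen root. The sketch below is purely illustrative and is not taken from this project. It uses a standard breadth-first search in Python on an invented six-link network (the node names and the function name `shortest_path_lengths` are hypothetical), and it is not necessarily how the SQL and PL/SQL solutions here work.

```python
from collections import defaultdict, deque


def shortest_path_lengths(links, root):
    """Return {node: number of links on a shortest path from root}.

    links is an iterable of (node, node) pairs, treated as undirected.
    Nodes that cannot be reached from root are absent from the result.
    """
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)

    distance = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:  # first visit is via a shortest path
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Invented example network: X-Y forms a separate subnetwork, unreachable from A.
example_links = [("A", "B"), ("B", "C"), ("A", "D"),
                 ("D", "C"), ("C", "E"), ("X", "Y")]
print(sorted(shortest_path_lengths(example_links, "A").items()))
# → [('A', 0), ('B', 1), ('C', 2), ('D', 1), ('E', 3)]
```

An in-memory search like this is simple for small networks; the interest of the project lies in doing the equivalent work inside the database at much larger scale.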
Installation comprises the following steps:
- Install 1: Install prerequisite tools (if necessary)
- Install 2: Clone git repository
- Install 3: Install prerequisite modules
- Install 4: Copy the unit test input JSON files to the database server
- Install 5: Copy the input data files to the database server
The database installation requires a minimum Oracle version of 12.2 (see Oracle Database Software Downloads).
In order to clone the code as a git repository you need to have the git application installed. I recommend the GitHub Desktop UI for managing repositories on Windows. It depends on the git application, available here: git downloads, but git can also be installed from within GitHub Desktop, according to these instructions: How to install GitHub Desktop.
nodejs is needed to run a program that turns the unit test output files into formatted HTML pages. No JavaScript knowledge is required to run the program, and nodejs can be installed from the nodejs website.
The following steps will download the repository into a folder, shortest_path_sql, within your GitHub root folder:
- Open GitHub Desktop and click [File/Clone repository...]
- Paste into the url field on the URL tab: https://github.com/BrenPatF/shortest_path_sql.git
- For the local path, choose the folder where you want your GitHub root to be
- Click [Clone]
The install depends on the prerequisite modules Utils, Trapit, Timer_Set and Oracle's profilers, all installed in the lib schema. The sys install creates the lib schema for the prerequisites and the shortest_path_sql schema for the network code and tables.
The prerequisite modules can be installed by following the instructions for each module at the module root pages listed in the See Also section below. This allows inclusion of the examples and unit tests for those modules. Alternatively, the next section shows how to install these modules directly without their examples or unit tests.
install_sys.sql creates an Oracle directory, input_dir, pointing to 'c:\input'. Update this if necessary to a folder on the database server with read/write access for the Oracle OS user.
- Run script from sqlplus:
SQL> @install_sys_all
- Run script from sqlplus:
SQL> @install_lib_all
[Schema: shortest_path_sql; Folder: install_prereq\shortest_path_sql] Create shortest_path_sql synonyms
- Run script from sqlplus:
SQL> @c_syns_all
The npm trapit package is a nodejs package used to format unit test results in text and HTML format.
Open a DOS or PowerShell window in the folder where you want to install npm packages and, with nodejs installed, run:
$ npm install trapit
This should install the trapit nodejs package in a subfolder .\node_modules\trapit
- Copy the following files from the unit_test\input folders to the server folder pointed to by the Oracle directory INPUT_DIR:
  - tt_shortest_path_sql.purely_wrap_ins_min_tree_links_inp.json
  - tt_shortest_path_sql.purely_wrap_ins_node_roots_inp.json
- There is also a PowerShell script to do this, assuming C:\input as INPUT_DIR. From a PowerShell window in the root folder:
$ ./Copy-JSONToInput.ps1
- Unzip the following files from the examples\input folders to the server folder pointed to by the Oracle directory INPUT_DIR:
  - brightkite_edges.csv.zip
  - imdb.no_tv_v.txt.zip
  - imdb.only_tv_v.txt.zip
  - imdb.pre1950.txt.zip
  - imdb.small.txt.zip
  - imdb.top250.txt.zip

  The files are in Windows format, so conversion may be needed if not using Windows.
- There is also a PowerShell script to do this, assuming C:\input as INPUT_DIR. From a PowerShell window in the root folder:
$ ./Copy-InputDataToInput.ps1
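Where PowerShell is not available, for example on a non-Windows database server, the unzip step above could be done with a few lines of Python instead. This is a minimal sketch and not part of the repository: the function name `unzip_all` is hypothetical, and it assumes only that the zip files sit somewhere under the folder passed in.

```python
import zipfile
from pathlib import Path


def unzip_all(source_root, target_dir):
    """Extract every *.zip found under source_root into target_dir.

    Returns the names of the extracted archive members, in the order
    the archives were processed (sorted by path).
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the target if missing

    extracted = []
    for archive in sorted(Path(source_root).rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
            extracted.extend(zf.namelist())
    return extracted
```

A call might look like `unzip_all("examples/input", "/path/to/input")`, where the second argument is a placeholder for whichever server folder the Oracle directory INPUT_DIR points to.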
The following files are too big to save to GitHub, even when zipped, so they have to be obtained from the original source site, https://www.cs.oberlin.edu/~rhoyle/16f-cs151/lab10/index.html (and converted to Windows format, if necessary):
- imdb.full.txt
- imdb.post1950.txt
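If line-ending conversion is needed for any of the input data files, a streaming rewrite avoids loading the larger files into memory. The following Python sketch is an illustration under stated assumptions rather than part of the repository: the function name `convert_line_endings` is hypothetical, and it handles only LF and CRLF line terminators.

```python
def convert_line_endings(src, dst, to_windows=True):
    """Copy src to dst, rewriting each line ending as CRLF or LF.

    to_windows=True writes CRLF (Windows format); False writes LF.
    Returns the number of lines written. src and dst must be different files.
    """
    newline = b"\r\n" if to_windows else b"\n"
    count = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # Iterating a binary file splits on b"\n", so the file is streamed
        # one line at a time rather than read whole.
        for line in fin:
            if line.endswith(b"\r\n"):
                fout.write(line[:-2] + newline)
            elif line.endswith(b"\n"):
                fout.write(line[:-1] + newline)
            else:
                fout.write(line)  # final line with no terminator: keep as-is
            count += 1
    return count
```

For a file downloaded in Unix format, `convert_line_endings("imdb.full.txt", "imdb.full.win.txt")` would write a Windows-format copy; the output file name here is only an example.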
Each example has a driver sqlplus script, run*.sql, in its own subfolder:
examples
    bacon
        full
        no_tv_v
        only_tv_v
        post1950
        pre1950
        small
        top250
    brightkite
        main
    foreign_keys
        sys_fks
    three_subnets
        main
The driver script calls a setup script that
- re-installs the base database components
- loads the example data to the nodes and links tables
The driver script then calls multiple scripts from the examples folder to execute the PL/SQL programs, with various parameter combinations, spooling to individual files.
For example, the call to the driver for bacon/full is, from its subfolder:
SQL> @run_bacon_full
There is also a powershell script Run-All.ps1 that navigates to each subfolder, logs on and runs the driver script. This takes a long time to run!
The unit test programs may be run from the unit_test folder using the following script, which calls the unit test install script at the start:
SQL> @r_tests
The output results files are processed by a JavaScript program that has to be installed separately, as described above for the npm trapit package. The JavaScript program produces listings of the results in HTML and/or text format in a subfolder named from the unit test title, and the subfolders are included in the folder unit_test\output.
To run the processor, first place the output JSON files, tt_shortest_path_sql.purely_wrap_ins_min_tree_links_out.json and tt_shortest_path_sql.purely_wrap_ins_node_roots_out.json, in a new (or existing) folder, shortest_path_sql, within the subfolder externals of the npm trapit package folder. Then open a PowerShell window in the npm trapit package folder and run:
$ node externals\format-externals shortest_path_sql
This outputs to screen the following summary level report, as well as writing the formatted results files to the subfolders indicated:
Unit Test Results Summary for Folder ./externals/shortest_path_sql
==================================================================
File                                                           Title                                  Inp Groups  Out Groups  Tests  Fails  Folder
-------------------------------------------------------------  -------------------------------------  ----------  ----------  -----  -----  -------------------------------------
tt_shortest_path_sql.purely_wrap_ins_min_tree_links_out.json   Oracle SQL Shortest Paths: Node Tree            3           2      7      0  oracle-sql-shortest-paths_-node-tree
tt_shortest_path_sql.purely_wrap_ins_node_roots_out.json       Oracle SQL Shortest Paths: Node Roots           2           2      3      0  oracle-sql-shortest-paths_-node-roots
0 externals failed, see ./externals/shortest_path_sql for scenario listings
The process has also been automated in a PowerShell script, Run_Ut.ps1, in folder unit_test, which has the following hard-coded folders (these may need changing for your environment):
$userRoot='C:\Users\Brend\OneDrive\Documents\'
$sqlDir = $userRoot + 'GitHub\shortest_path_sql\unit_test\'
$npmDir = $userRoot + 'demo\npm\node_modules\trapit\'
$inputDir = 'C:\input\'
Operating system and Oracle versions used:
- Windows 11
- Oracle Database Version 21.3.0.0.0
See also:
- Blog: Shortest Path Analysis of Large Networks by SQL and PL/SQL
- The Math Function Unit Testing design pattern, implemented in nodejs
- Trapit - Oracle PL/SQL unit testing module
- Utils - Oracle PL/SQL general utilities module
- Timer_Set - Oracle PL/SQL code timing module
- Powershell utilities module
License: MIT