ed-o-saurus / PLHaskell

A Haskell extension for PostgreSQL that allows embedded Haskell code.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

PL/Haskell

This project is a "procedural language" extension of PostgreSQL allowing the execution of code in Haskell within SQL code. Despite the name, Haskell is, of course, not a procedural language as it is a functional language. However, "procedural language" is the term that PostgreSQL uses to describe languages that can be embedded in its SQL code.

The extension allows users, even unprivileged ones, to write, install, and run functions written in Haskell.

Copyright

Copyright (C) 2023 Edward F. Behn, Jr.

License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

Contact

I wrote the project mostly for my own amusement and as a challenge. If you have any questions, comments, or issues, or if you find the project interesting or useful, please don't hesitate to contact me at ed@behn.us.

Bugs and Features

Bug reports and feature requests should be submitted in the Github issues page.

Prerequisites

This document assumes that the reader is familiar with PostgreSQL (specifically procedural languages) and with the Haskell language.

Package Repositories

The easiest way to install the project is to use the RPM package repository for Fedora or the apt package repository for Ubuntu.

Fedora

Add the repository:

$> wget --quiet -O- https://ed-o-saurus.github.io/repos/plhaskell/fedora/plhaskell.repo | sudo tee /etc/yum.repos.d/plhaskell.repo > /dev/null

Update the repository information:

$> sudo dnf update

Install the package:

$> sudo dnf install plhaskell

Ubuntu

Add the signing key to the apt keys:

$> wget --quiet -O- https://ed-o-saurus.github.io/keys/A9DD4516.asc | gpg --dearmor | sudo tee /usr/share/keyrings/plhaskell-keyring.gpg > /dev/null

Add the repository:

$> echo deb \[signed-by=/usr/share/keyrings/plhaskell-keyring.gpg\] https://ed-o-saurus.github.io/repos/plhaskell/ubuntu/$(lsb_release -cs)/apt-repo stable main | sudo tee /etc/apt/sources.list.d/plhaskell.list > /dev/null

Update the repository information:

$> sudo apt update

Install the package:

$> sudo apt install plhaskell

Build and Install

This extension is intended to be built and installed in a Linux environment. It has only been tested on the x86-64 architecture.

Red Hat Package Management

The following should be done as a non-root user.

Install git, rpmdevtools, make, checkpolicy, policycoreutils, postgresql-server-devel, ghc-compiler, ghc-bytestring-devel, ghc-text-devel, and ghc-hint-devel:

$> sudo dnf install git rpmdevtools make selinux-policy-devel postgresql-server-devel ghc-compiler ghc-bytestring-devel ghc-text-devel ghc-hint-devel

Create the directories necessary to build the .rpm:

$> rpmdev-setuptree

Download the project code:

$> git clone https://github.com/ed-o-saurus/PLHaskell

Copy the .spec file to the ~/rpmbuild/SPECS directory:

$> cp PLHaskell/spec/plhaskell.spec ~/rpmbuild/SPECS

Copy the source to the ~/rpmbuild/SOURCES directory:

$> tar --exclude-vcs -czf ~/rpmbuild/SOURCES/PLHaskell.tar.gz PLHaskell

Build the .rpm package:

$> rpmbuild -bb ~/rpmbuild/SPECS/plhaskell.spec

Install the .rpm package written to the ~/rpmbuild/RPMS/<arch> directory on the target machine.

Debian Package Management

The following should be done as a non-root user.

Install git, devscripts, debhelper-compat, postgresql-server-dev-all ghc, libghc-hint-dev:

$> sudo apt install git devscripts debhelper-compat postgresql-server-dev-all ghc libghc-hint-dev

Download the project code:

$> git clone https://github.com/ed-o-saurus/PLHaskell

Build the .deb package:

$> cd PLHaskell

$> debuild --no-tgz-check

Install the .deb package written to the parent directory on the target machine.

Other

The alternative to building an .rpm or .deb package is to build and install the project manually.

The following are needed to build and install PL/Haskell:

  • PostgreSQL server
  • PostgreSQL development files
  • Glasgow Haskell Compiler (GHC)
  • The GHC Hint development module
  • libpq-devel

Build

Ensure that pg_config is available in the path. Run make to build all the files needed for the extension.

Install

To install, run sudo make install.

On each database that you wish to have the extension, run the SQL command CREATE EXTENSION plhaskell;.

Uninstall

To uninstall, run sudo make uninstall.

Security Enhanced Linux

Systems that use Security Enhanced Linux (SELinux) may encounter problems running the extension. This manifests as the ability to create functions and the inability to call them. Appropriate policies must be implemented. Full details are beyond the scope of this document.

A policy file designed to accommodate Red Hat based systems can be built by running make SELINUX=1. Upon installation, this file is saved to /usr/share/selinux/packages/plhaskell.pp. It is the user's responsibility to install the policy.

Testing

Test batteries are provided in the tests_trusted.sql and tests_untrusted.sql files in the root directory. They can be run with the commands

$> psql -f tests_trusted.sql

$> psql -f tests_untrusted.sql

If successful, they will terminate with the notice All tests passed.

Trust

The package provides two language variants. plhaskell is a "trusted" language. The PostgreSQL manual explains:

The optional key word TRUSTED specifies that the language does not grant access to data that the user would not otherwise have. Trusted languages are designed for ordinary database users (those without superuser privilege) and allows them to safely create functions and procedures.

As such, unprivileged users are permitted to write and execute functions without the possibility that they will be able to access information or resources that are not allowed to access. This is accomplished by enforcing Haskell's strong type system.

plhaskellu is an "untrusted" language. It affords the user more flexibility but lacks the safeguards of the trusted variant. As such, only users with PostgreSQL superuser privilege can create functions with this variant.

Usage

To install the language in a database, run the SQL command CREATE EXTENSION plhaskell;.

Functions in PL/Haskell are created the same manner as other PostgreSQL functions. The code must be valid Haskell.

The code must contain a function with the same name as the PostgreSQL function. Each of its arguments must be of the type Maybe arg where arg is the Haskell type as determined by the PostgreSQL type as indicated by the following table.

PostgreSQL Type Module Haskell Type
bytea Data.ByteString ByteString
text Data.Text Text
char Prelude Char
bool Prelude Bool
smallint Data.Int Int16
integer Data.Int Int32
bigint Data.Int Int64
real Prelude Float
float Prelude Double

Trusted functions must return type PGm (Maybe result) where result is the appropriate Haskell type as determined by the return type of function while untrusted functions must return type IO (Maybe result). The PGm monad type can be imported from the PGutils module.

In addition, functions can use composite types as arguments or return values provided that the composite types consist of elements that are listed in the table above or are themselves composite types. Composite values are represented as Haskell tuples.

Returning Sets

Functions can return sets of values by returning type PGm [Maybe result] or IO [Maybe result] where result is the appropriate Haskell type as determined by the return type of function.

Reporting Messages and Raising Error

To report a message or raise an error, use the function report :: ErrorLevel -> Text -> PGm (). ErrorLevel is any of the following:

Level
exception
warning
notice
info
log
debug1
debug2
debug3
debug4
debug5

In addition, the function raiseError :: Text -> a stops execution and raises an error.

Queries

Queries can be performed from inside PL/Haskell with the function query :: Text -> [QueryParam] -> PGm QueryResults. The first argument is the query itself. The second argument is a list of values to be substituted into the query using the expressions $n. This should be used rather than constructing a query string directly to prevent SQL injection attacks.

Queries in stable and immutable functions are executed in read-only mode, while volatile functions execute queries in read-write mode.

Any base type that can be passed to PL/Haskell functions can be used as a parameter by using the appropriate constructor for the type QueryParam. The constructors are the following:

QueryParam
QueryParamByteA (Maybe ByteString)
QueryParamText (Maybe Text)
QueryParamChar (Maybe Char)
QueryParamBool (Maybe Bool)
QueryParamInt2 (Maybe Int16)
QueryParamInt4 (Maybe Int32)
QueryParamInt8 (Maybe Int64)
QueryParamFloat4 (Maybe Float)
QueryParamFloat8 (Maybe Double)

The constructors for QueryResults are the following:

QueryResults
SelectResults Word64 [Text] [[QueryResultValue]]
SelectIntoResults Word64
InsertResults Word64
DeleteResults Word64
UpdateResults Word64
InsertReturningResults Word64 [Text] [[QueryResultValue]]
DeleteReturningResults Word64 [Text] [[QueryResultValue]]
UpdateReturningResults Word64 [Text] [[QueryResultValue]]
UtilityResults Word64
RewrittenResults Word64

The constructor indicates the type of query run. The Word64 field is the number of rows processed. The [Text] field is the names of the columns returned and the [[QueryResultValue]] is the data returned.

The constructors for QueryResultValue are the following:

QueryResultValue
QueryResultValueByteA (Maybe ByteString)
QueryResultValueText (Maybe Text)
QueryResultValueChar (Maybe Char)
QueryResultValueBool (Maybe Bool)
QueryResultValueInt2 (Maybe Int16)
QueryResultValueInt4 (Maybe Int32)
QueryResultValueInt8 (Maybe Int64)
QueryResultValueFloat4 (Maybe Float)
QueryResultValueFloat8 (Maybe Double)
QueryResultValueComposite (Maybe [QueryResultValue])

Converting PGm to IO

The function unPGm :: PGm a -> IO a from the PGutils module can be used to convert a PGm action to an IO action.

Maximum Memory

By default, the maximum memory that can be used by the Haskell runtime system is 128 MB. This can be changed by setting the plhaskell.max_memory variable in the postgresql.conf file.

GHC Version

The function ghc_version takes no arguments and returns the version of the underlying GHC API. Values are:

  • 804 for GHC 8.4.x
  • 806 for GHC 8.6.x
  • etc...

Examples

Addition

CREATE FUNCTION add(int, int) RETURNS int IMMUTABLE AS
$$
    import PGutils (PGm)
    import Data.Int (Int32)

    add :: Maybe Int32 -> Maybe Int32 -> PGm (Maybe Int32)

    add Nothing Nothing = return Nothing
    add (Just a) (Just b) = return (Just (a+b))
    add a Nothing = return a
    add Nothing b = return b
$$
LANGUAGE plhaskell;

Fibonacci Sequence

CREATE FUNCTION fibonacci(int) RETURNS int IMMUTABLE AS
$$
    import PGutils (PGm)
    import Data.Int (Int32)

    fibonacci' :: Int32 -> Int32
    fibonacci' 0 = 0
    fibonacci' 1 = 1
    fibonacci' n = fibonacci' (n-2) + fibonacci' (n-1)

    fibonacci :: Maybe Int32 -> PGm (Maybe Int32)
    fibonacci Nothing = return Nothing
    fibonacci (Just n) = return (Just (fibonacci' n))
$$
LANGUAGE plhaskell;

Primes

This section shows how to return a set of results. The functions listed produce lists of prime numbers using the Sieve of Eratosthenes.

Composite Type

The function in this section returns a set of composite results.

CREATE TYPE n_p AS (n int, p int);

CREATE FUNCTION primes(int) RETURNS SETOF n_p IMMUTABLE AS
$$
    import PGutils (PGm, raiseError)
    import Data.Int (Int32)

    sieve :: [Int32] -> [Int32]
    sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
    sieve [] = []

    primes :: Maybe Int32 -> PGm [Maybe (Maybe Int32, Maybe Int32)]
    primes Nothing = raiseError "Invalid Null"
    primes (Just n) = return (map Just (zip [Just i | i <- [1..n]] (map Just (sieve [2..]))))
$$
LANGUAGE plhaskell;

By running the query

SELECT *
FROM primes(10)

The following is produced

n p
1 2
2 3
3 5
4 7
5 11
6 13
7 17
8 19
9 23
10 29

Infinite List

The following function returns a infinite list of prime numbers

CREATE FUNCTION primes() RETURNS SETOF int IMMUTABLE AS
$$
    import PGutils (PGm)
    import Data.Int (Int32)

    sieve :: [Int32] -> [Int32]
    sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

    primes :: PGm [Maybe Int32]
    primes = return (map Just (sieve [2..]))
$$
LANGUAGE plhaskell;

To generate the first twenty-five prime numbers, run:

SELECT primes()
LIMIT 25
primes
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
53
59
61
67
71
73
79
83
89
97

Message

The following demonstrate how to show a notice from within a function.

CREATE FUNCTION forty_two() RETURNS int IMMUTABLE AS
$$
    import PGutils (PGm, report, notice)
    import Data.Int (Int32)

    forty_two :: PGm (Maybe Int32)
    forty_two = do
        report notice "Don't Panic"
        return (Just 42)
$$
LANGUAGE plhaskell;
CREATE FUNCTION forty_two() RETURNS int IMMUTABLE AS
$$
    import PGutils (unPGm, report, notice)
    import Data.Int (Int32)

    forty_two :: IO (Maybe Int32)
    forty_two = do
        unPGm $ report notice "Don't Panic"
        return (Just 42)
$$
LANGUAGE plhaskellu;

Query

The following delete all elements of table t and return the number of rows removed.

CREATE FUNCTION remove_all() RETURNS bigint VOLATILE AS
$$
    import PGutils (PGm, query, QueryResults (DeleteResults))
    import Data.Int (Int64)

    remove_all :: PGm (Maybe Int64)
    remove_all = do
        DeleteResults processed <- query "DELETE FROM t" []
        return (Just (fromIntegral processed))
$$
LANGUAGE plhaskell;
CREATE FUNCTION remove_all() RETURNS bigint VOLATILE AS
$$
    import PGutils (unPGm, query, QueryResults (DeleteResults))
    import Data.Int (Int64)

    remove_all :: IO (Maybe Int64)
    remove_all = do
        DeleteResults processed <- unPGm $ query "DELETE FROM t" []
        return (Just (fromIntegral processed))
$$
LANGUAGE plhaskellu;

The following return the last names of students with the passed first name.

CREATE FUNCTION last_names(text) RETURNS SETOF text IMMUTABLE AS
$$
    import PGutils (PGm, query, QueryResults (SelectResults), QueryParam (QueryParamText), QueryResultValue (QueryResultValueText))
    import Data.Text (Text)

    extract_text :: [QueryResultValue] -> Maybe Text
    extract_text [QueryResultValueText name] = name

    last_names :: Maybe Text -> PGm [Maybe Text]
    last_names first_name = do
        SelectResults _processed _header results <- query "SELECT last_name FROM students WHERE first_name = $1" [QueryParamText first_name]
        return (map extract_text results)
$$
LANGUAGE plhaskell;
CREATE FUNCTION last_names(text) RETURNS SETOF text IMMUTABLE AS
$$
    import PGutils (unPGm, query, QueryResults (SelectResults), QueryParam (QueryParamText), QueryResultValue (QueryResultValueText))
    import Data.Text (Text)

    extract_text :: [QueryResultValue] -> Maybe Text
    extract_text [QueryResultValueText name] = name

    last_names :: Maybe Text -> IO [Maybe Text]
    last_names first_name = do
        SelectResults _processed _header results <- unPGm $ query "SELECT last_name FROM students WHERE first_name = $1" [QueryParamText first_name]
        return (map extract_text results)
$$
LANGUAGE plhaskellu;

About

A Haskell extension for PostgreSQL that allows embedded Haskell code.

License:GNU General Public License v3.0


Languages

Language:PLpgSQL 46.5%Language:Haskell 28.4%Language:C 22.2%Language:Makefile 2.4%Language:Shell 0.5%