jonixis / docker-postgres-madlib

Docker image for PostgreSQL with MADlib extension installed

postgres-madlib

This image provides a PostgreSQL 11.5 database with the MADlib extension installed. The instance comes with preloaded datasets.

Automated builds are available on Docker Hub.

Installation

Prerequisites

Docker and Docker Compose.

Connect to database

docker-compose.yml

version: '3.7'

services:
  postgres:
    image: jonixis/postgres-madlib:latest
    container_name: postgres-madlib
    ports:
      - "5432:5432"
    restart: unless-stopped
    volumes:
      - data:/var/lib/postgresql/data

volumes:
  data:

Create the file docker-compose.yml above, then run the following command in the same directory to start the container. The image is pulled from Docker Hub automatically.

sudo docker-compose up -d

Connect to PostgreSQL from the host, e.g. with pgcli:

pgcli -h localhost -U postgres -d postgres

Data

The datasets are loaded automatically on the first startup of the container. The database is persisted in a Docker volume, so it survives even if the container is deleted. To reset the database, remove the container, delete the volume, and start the container again:

Stop and remove the container.

sudo docker-compose down

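Delete the Docker volume with the postgres data. Compose prefixes the volume name with the project name, which defaults to the name of the directory containing docker-compose.yml; adjust the name below if your directory is not called docker-postgres-madlib.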

sudo docker volume rm docker-postgres-madlib_data

Start the container.

sudo docker-compose up -d

Example Query

To run a logistic regression on the admission table, execute the following queries.

Convert 'chance_of_admit' to an integer (0 or 1). The cast rounds each value to the nearest integer.

ALTER TABLE admission ALTER COLUMN chance_of_admit TYPE integer;

Train:

DROP TABLE IF EXISTS admission_logregr, admission_logregr_summary;
SELECT logregr_train( 'admission',                                                                       -- Source table
                      'admission_logregr',                                                               -- Output table
                      'chance_of_admit',                                                                 -- Dependent variable
                      'ARRAY[1, gre_score, toefl_score, university_rating, sop, lor, cgpa, research]',   -- Feature vector
                      NULL,                                                                              -- Grouping
                      20,                                                                                -- Max iterations
                      'irls'                                                                             -- Optimizer to use
                    );
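The 'irls' optimizer fits the model by iteratively reweighted least squares, which for logistic regression is Newton's method on the log-likelihood. The block below is a minimal pure-Python sketch of that update for illustration only, not MADlib's implementation. The names logregr_irls and solve are hypothetical; max_iter=20 mirrors the "Max iterations" argument above, and the sketch assumes the data are not perfectly separable (otherwise the coefficients have no finite maximum).

```python
import math


def sigmoid(z):
    # Logistic function, written to avoid overflow in math.exp for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def logregr_irls(X, y, max_iter=20, tol=1e-10):
    """Hypothetical IRLS (Newton) fit of logistic regression coefficients.

    X: feature rows, each starting with 1 for the intercept (like ARRAY[1, ...]).
    y: 0/1 labels.
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k                      # X^T (y - p)
        hess = [[0.0] * k for _ in range(k)]  # X^T W X with W = diag(p (1 - p))
        for row, label in zip(X, y):
            p = sigmoid(sum(b * x for b, x in zip(beta, row)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (label - p) * row[i]
                for j in range(k):
                    hess[i][j] += w * row[i] * row[j]
        step = solve(hess, grad)              # Newton step (X^T W X)^-1 X^T (y - p)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

On a toy dataset where the rows with x = 0 have 1 positive label out of 4 and the rows with x = 1 have 3 out of 4, the fitted intercept and slope converge to ln(1/3) and 2 * ln(3), the log-odds of the first group and the difference in log-odds between the groups.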

Predict:

-- Display prediction value along with the original value
SELECT a.serial_no, logregr_predict(coef, ARRAY[1, gre_score, toefl_score, university_rating, sop, lor, cgpa, research]),
       a.chance_of_admit::BOOLEAN
FROM admission a, admission_logregr m
ORDER BY a.serial_no;
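The prediction combines the trained coefficients with each row's feature vector: their dot product is passed through the logistic function to give a probability, which is then turned into a boolean. The following minimal Python sketch shows that computation; predict_prob and predict are hypothetical stand-ins, not MADlib code, and the 0.5 cut-off and the coefficient values are assumptions chosen for illustration.

```python
import math


def predict_prob(coef, features):
    """Probability of the positive class: logistic(coef . features)."""
    z = sum(c * x for c, x in zip(coef, features))
    # Overflow-safe logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(coef, features):
    """Boolean prediction, assuming a 0.5 probability cut-off."""
    return predict_prob(coef, features) > 0.5


# Hypothetical coefficients for an intercept and one feature
coef = [-1.0, 2.0]
print(predict(coef, [1, 0]))  # z = -1, p ~ 0.27 -> False
print(predict(coef, [1, 1]))  # z = 1,  p ~ 0.73 -> True
```

Because the logistic function is monotonic, a 0.5 probability cut-off is the same as checking whether the dot product is positive; computing the probability explicitly is only needed when you want the score itself.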

MADlib Documentation for Logistic Regression: https://madlib.apache.org/docs/latest/group__grp__logreg.html#examples


Source dataset admission_table.csv: https://www.kaggle.com/mohansacharya/graduate-admissions

Source dataset rhc.csv: http://biostat.mc.vanderbilt.edu/wiki/Main/DataSets
