tomasgreif / mondrianr

R Package - generate Mondrian schema for PostgreSQL table or R data frame

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Mondrian Schema Generator

Package mondrianr creates Mondrian schema for given table. It's goal is to enable easy analysis of single tables stored in PostgreSQL or R in Mondrian compatible tools. Mondrian is an open source OLAP engine.

mondrianr works well with Saiku - an open source analytical tool for OLAP data. Given Mondrian schema and data source definition file (both generated by mondrianr), Saiku gives you highly interactive environment for data analysis. Data analysis in Saiku is similar to using pivot tables in Excel or Libre Office. To better understand how Saiku can help you analyze data, I highly recommend to try online demo.

Note: When creating traditional Mondrian schema we typically work with one fact table (where measures are stored) and one or more dimension table. For example we can have fact table with sales amount and keys to dimensions (tables) like time, product, customer, ... mondrianr builds schema in such way that one table is both fact table and dimension at the same time. While this is not traditional design, it still complies with Mondrian schema definition.

Main features

  • Generates Mondrian schema for PostgreSQL table or R data frame
  • Generates datasource definition for Saiku
  • Optionally maps date columns to existing time dimension (PostgreSQL)
  • Optionally maps date columns to automatically generated time dimension (R data frame)
  • User can define what aggregators will be used
  • User can define what columns will be used as dimensions

Main functions

  • create_schema() - function to generate mondrian schema

Installation:

Install mondrianr

     library(devtools)
     install_github('mondrianr','tomasgreif')
     library(mondrianr)

Note: mondrianr requires packages sqldf and RSQLite

Install Saiku

Follow this guide. Download SQLite driver and copy it to correct folder.

Schema and data source destination

When using Saiku, file with data source definition points to file with schema definition. All data source definition files have to be in folder [INSTALLATION-FOLDER]/saiku-server/tomcat/webapps/saiku/WEB-INF/classes/saiku-datasources/. Every data source definition file is text file WITHOUT any extension. Cube schema can be stored everywhere where Saiku can access files. When deployed locally, you can place schema file almost anywhere (e.g. your home directory).

Test everything works

For R data frame

library(mondrianr)

data(big_portfolio)
big_portfolio$origination_date <- as.Date(big_portfolio$origination_date)
big_portfolio$repayment_date <- as.Date(big_portfolio$repayment_date,format='%Y-%m-%d')
str(big_portfolio)

create_schema(engine='R',
              table='big_portfolio',
              primary_key='id',
              time_table = 'any_name',
              schema_dest='/destination_folder/test.xml',
              data_source_dest='/destination_folder/test')

Note: do not forget to set correct destinations for schema_dest and data_source_dest.

For PostgreSQL table:

create_schema(engine='PostgreSQL',
               table='public.big_portfolio',
               primary_key='id',
               con=c('usr','pwd','db','host','port'),
               schema_dest='/destination_folder/test.xml',
               data_source_dest='/destination_folder/test')

Note: do not forget to import data to PostgreSQL first. Use valid connection definition (con) and set correct destinations for schema_dest and data_source_dest.

If anything goes well, than you can get something like the following in Saiku:

alt tag

About

R Package - generate Mondrian schema for PostgreSQL table or R data frame


Languages

Language:R 100.0%