roschaefer / chronomodel

Temporal PostgreSQL (9.3+) system with "flashback" framework for ActiveRecord (4.2)

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Temporal database system on PostgreSQL using updatable views, table inheritance and INSTEAD OF triggers.

Build Status Dependency Status Code Climate Test Coverage Inlinedocs

{chronos - greek god of time}

Chronos, the greek god of time. Courtesy of REBELLE SOCIETY

ChronoModel implements what Oracle sells as "Flashback Queries", with standard SQL on free PostgreSQL. Academically speaking, ChronoModel implements a Type-2 Slowly-Changing Dimension with history tables.

All history keeping happens inside the database system, freeing application code from having to deal with it. ChronoModel implements all the required features in Ruby on Rails' ORM to leverage the database temporal structure beneath.

Design

The application model is backed by an updatable view in the default public schema that behaves like a plain table to any database client. When data in manipulated on it, INSTEAD OF triggers redirect the manipulations to concrete tables.

Current data is held in a table in the temporal schema, while History is held in a table in the history schema that inherits from the Current one, to get automated schema updates for free and other benefits.

The current time is taken using current_timestamp, so that multiple data manipulations in the same transaction on the same records always create a single history entry (they are squashed together).

Partitioning of history is also possible: this design fits the requirements but it's not implemented yet.

See README.sql for a SQL example defining the machinery for a simple table.

Active Record integration

All Active Record schema migration statements are decorated with code that handles the temporal structure by e.g. keeping the triggers in sync or dropping/recreating it when required by your migrations.

Data extraction at a single point in time and even JOINs between temporal and non-temporal data is implemented using sub-selects and a WHERE generated by the provided TimeMachine module to be included in your models.

The WHERE is optimized using GiST indexes on the tsrange defining record validity. Overlapping history is prevented through exclusion constraints and the btree_gist extension.

All timestamps are forcibly stored in as UTC, bypassing the default_timezone setting.

Requirements

  • Ruby >= 2.2 (1.9 to 2.1 could still work, but aren't supported).
  • Active Record >= 4.2. See the detailed supported versions matrix on travis
  • PostgreSQL >= 9.3
  • The btree_gist PostgreSQL extension
  • The plpython PostgreSQL extension if you have JSON (not JSONB) columns

With Homebrew:

brew install --with-python postgres

With Apt:

apt-get install postgresql-plpython

Installation

Add this line to your application's Gemfile:

gem 'chrono_model'

And then execute:

$ bundle

Configuration

Configure your config/database.yml to use the chronomodel adapter:

development:
  adapter: chronomodel
  username: ...

Schema creation

ChronoModel hooks all ActiveRecord::Migration methods to make them temporal aware.

create_table :countries, temporal: true do |t|
  t.string     :common_name
  t.references :currency
  # ...
end

This creates the temporal table, its inherited history one the public view and all the trigger machinery. Every other housekeeping of the temporal structure is handled behind the scenes by the other schema statements. E.g.:

  • rename_table - renames tables, views, sequences, indexes and triggers
  • drop_table - drops the temporal table and all dependant objects
  • add_column - adds the column to the current table and updates triggers
  • rename_column - renames the current table column and updates the triggers
  • remove_column - removes the current table column and updates the triggers
  • add_index - creates the index in both temporal and history tables
  • remove_index - removes the index from both tables

Adding Temporal extensions to an existing table

Use change_table:

change_table :your_table, temporal: true

If you want to also set up the history from your current data:

change_table :your_table, temporal: true, copy_data: true

This will create an history record for each record in your table, setting its validity from midnight, January 1st, 1 CE. You can set a specific validity with the :validity option:

change_table :your_table, :temporal => true, :copy_data => true, :validity => '1977-01-01'

Please note that change_table requires you to use old_style up and down migrations. It cannot work with Rails 3-style change migrations.

Selective Journaling

By default UPDATEs only to the updated_at field are not recorded in the history.

You can also choose which fields are to be journaled, passing the following options to create_table:

  • :journal => %w( fld1 fld2 .. .. ) - record changes in the history only when changing specified fields
  • :no_journal => %w( fld1 fld2 .. ) - do not record changes to the specified fields
  • :full_journal => true - record changes to all fields, including updated_at.

These options are stored as JSON in the COMMENT area of the public view, alongside with the ChronoModel version that created them.

This is visible in psql if you issue a \d+. Example after a test run:

chronomodel=# \d+
                                                       List of relations
 Schema |     Name      |   Type   |    Owner    |    Size    |                           Description
--------+---------------+----------+-------------+------------+-----------------------------------------------------------------
 public | bars          | view     | chronomodel | 0 bytes    | {"temporal":true,"chronomodel":"0.7.0.alpha"}
 public | foos          | view     | chronomodel | 0 bytes    | {"temporal":true,"chronomodel":"0.7.0.alpha"}
 public | plains        | table    | chronomodel | 0 bytes    |
 public | test_table    | view     | chronomodel | 0 bytes    | {"temporal":true,"journal":["foo"],"chronomodel":"0.7.0.alpha"}

Data querying

Include the ChronoModel::TimeMachine module in your model.

module Country < ActiveRecord::Base
  include ChronoModel::TimeMachine

  has_many :compositions
end

This will create a Country::History model inherited from Country, and add an as_of class method.

Country.as_of(1.year.ago)

Will execute:

SELECT "countries".* FROM (
  SELECT "history"."countries".* FROM "history"."countries"
  WHERE '#{1.year.ago}' <@ "history"."countries"."validity"
) AS "countries"

The returned ActiveRecord::Relation will then hold and pass along the timestamp given to the first .as_of() call to queries on associated entities. E.g.:

Country.as_of(1.year.ago).first.compositions

Will execute:

SELECT "countries".*, '#{1.year.ago}' AS as_of_time FROM (
  SELECT "history"."countries".* FROM "history"."countries"
  WHERE '#{1.year.ago}' <@ "history"."countries"."validity"
) AS "countries" LIMIT 1

and then, using the above fetched as_of_time timestamp, expand to:

SELECT * FROM  (
  SELECT "history"."compositions".* FROM "history"."compositions"
  WHERE '#{as_of_time}' <@ "history"."compositions"."validity"
) AS "compositions" WHERE country_id = X

.joins works as well:

Country.as_of(1.month.ago).joins(:compositions)

Expands to:

SELECT "countries".* FROM (
  SELECT "history"."countries".* FROM "history"."countries"
  WHERE '#{1.month.ago}' <@ "history"."countries"."validity"
) AS "countries" INNER JOIN (
  SELECT "history"."compositions".* FROM "history"."compositions"
  WHERE '#{1.month.ago}' <@ "history"."compositions"."validity"
) AS "compositions" ON compositions.country_id = countries.id

More methods are provided, see the TimeMachine source for more information.

History manipulation

History objects can be changed and .saved just like any other record.

Running tests

You need a running PostgreSQL >= 9.3 instance. Create spec/config.yml with the connection authentication details (use spec/config.yml.example as template).

You need to connect as a database superuser, because specs need to create the btree_gist extension.

Run rake. SQL queries are logged to spec/debug.log. If you want to see them in your output, use rake VERBOSE=true.

Usage with JSON (not JSONB) columns

JSON does not provide an equality operator. As both unnecessary update suppression and selective journaling require comparing the OLD and NEW rows fields, this fails by default.

ChronoModel provides a naive JSON equality operator using a naive comparison of JSON objects implemented in pl/python.

To load the opclass you can use the ChronoModel::Json.create convenience method. If you don't use JSON don't bother doing this.

If you are on Postgres 9.4, you are strongly encouraged to use JSONB, as it has an equality operator built-in, it's faster and stricter, and offers many more indexing abilities and better performance than JSON.

Caveats

  • Rails 4 support requires disabling tsrange parsing support, as it is broken and incomplete as of now, mainly due to a design clash with ruby.

  • The triggers and temporal indexes cannot be saved in schema.rb. The AR schema dumper is quite basic, and it isn't (currently) extensible. As we're using many database-specific features, Chronomodel forces the usage of the :sql schema dumper, and included rake tasks override db:schema:dump and db:schema:load to do db:structure:dump and db:structure:load. Two helper tasks are also added, db:data:dump and db:data:load.

  • The choice of using subqueries instead of [Common Table Expressions] pg-ctes was dictated by the fact that CTEs currently act as an optimization fence. If it will be possible to opt-out of the fence in the future, they will be probably be used again as they were in the past, because the resulting queries were more readable, and do not inhibit using .from() on the AR::Relation.

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Added some great feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request

Special mention

An special mention has to be made to Paolo Zaccagnini for all his effort in highlighting the improvements and best decisions taken over the life cycle of the design and implementation of Chronomodel while using it in many important projects.

Denominazione d'Origine Controllata

This software is Made in Italy 🇮🇹 😄.

About

Temporal PostgreSQL (9.3+) system with "flashback" framework for ActiveRecord (4.2)

License:MIT License


Languages

Language:Ruby 98.7%Language:PLpgSQL 1.3%