FxKu / audit

Versioning approach for PostgreSQL using PL/pgSQL functions

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Audit

IMPORTANT NOTICE: I have created a new project called pgMemento and have copied the Audit repo. Please contact me, if you want to contribute and I'll send you an invitation.

Audit is a versioning approach for PostgreSQL using PL/pgSQL functions

  1. Index

  1. License

  2. About

  3. System requirements

  4. Background & References

  5. How To

  6. Future Plans

  7. Developers

  8. Contact

  9. Special thanks

  10. Disclaimer

  11. License


The scripts for Audit are open source under GNU Lesser General Public License Version 3.0. See the file LICENSE for more details.

  1. About

Audit is a bunch of PL/pgSQL scripts that enable auditing of a PostgreSQL database. I take extensive use of JSON functions to log my data. Thus version 9.3 or higher is needed.

As I'm on schemaless side with JSON one audit table is enough to save all changes of all my tables. I do not need to care a lot about the structure of my tables.

For me, the main advantage using JSON is the ability to convert table rows to one column and populate them back to sets of records without losing information on the data types. This is very handy when creating a past table state. I think the same applies to the PostgreSQL extension hstore so Audit could also be realized using hstore except JSON.

For further reading please consider that Audit has neither been tested nor benchmarked a lot by myself.

  1. System requirements

  • PostgreSQL 9.3
  1. Background & References

The auditing approach of Audit is nothing new. Define triggers to log changes in your database is a well known practice. There are other tools out there which can also be used I guess. When I started the development for Audit I wasn't aware of that there are so many solutions out there (and new ones popping up every once in while). I haven't tested any of them shameonme and can not tell you exactly how they differ from Audit.

I might have directly copied some elements of these tools, therefore I'm now referencing tools where I looked up details:

  1. How To

5.1. Add Audit to a database

Run the AUDIT_SETUP.sql script to create the schema audit with tables and functions. AUDIT_VERSIONING.sql is necessary to restore past table states and AUDIT_INDEX_SCHEMA.sql includes funtions to define constraints in the schema where the tables / schema state has been restored (in order they have been restored as tables).

5.2. Start Audit

The functions can be used to intialize auditing for single tables or a complete schema e.g.

SELECT audit.create_schema_audit('public', ARRAY['not_this_table'], ['not_that_table']);

This function creates triggers and an additional column named audit_id for all tables in the 'public' schema except for tables 'not_this_table' and 'not_that_table'. Now changes on the audited tables write information to the tables 'audit.transaction_log' and 'audit.audit_log'.

When setting up a new database I would recommend to start audit after bulk imports. Otherwise the import will be slower and several different timestamps might appear in the audit_log table. The more timestamps are recorded the more queries are necessary to restore a table state.

ATTENTION: It is important to generate a proper baseline on which a table/database versioning can reflect on. Before you beginn or continue to work with the database and change contents, define the present state as the initial versioning state by executing the procedure audit.log_table_state (or audit.log_schema_state). For each row in the audited tables another row will be written to the audit_log table telling the system that it has been 'inserted' at the timestamp the procedure has been executed.

5.3. Have a look at the logged information

For example, if you have run an UPDATE command on 'table_A' changing the value of some rows of 'column_B' to 'new_value' the following entries will appear in the log tables:

TRANSACTION_LOG

ID tx_id operation schema table_name relid timestamp user client address applicatopn
100 11111111 UPDATE public table_A 22222222 2014-05-22 15:00:00.100 felix 192.168.0.0/32 pgAdmin III

AUDIT_LOG

ID tx_id relid timestamp audit_id table_content
500 11111111 22222222 2014-05-22 15:00:00.100 70 {"column_B":"old_value"}
501 11111111 22222222 2014-05-22 15:00:00.100 71 {"column_B":"old_value"}
502 11111111 22222222 2014-05-22 15:00:00.100 72 {"column_B":"old_value"}

As you can see only the changes are logged. DELETE and TRUNCATE commands would cause logging of the complete rows while INSERTs would leave a blank field for the 'table_content' column.

5.4. Restore a past state of your database

A table state is restored with the procedure audit.restore_table_state ('timestamp_x', 'name_of_audited_table', 'public', 'name_for_target_schema', 'VIEW', 0). A whole database state might be restored with audit.restore_schema_state. The result is written to another schema specified by the user. It can be written to VIEWs (default) or TABLEs. If chosen VIEW you are able to execute the procedure again to update the target schema by passing a new timestamp. The old view(s) will be replaced. Simply specifiy the last parameter of the fuction (integer) as 1.

How does the restoring work? Well, imagine a time line like this:

1______2___3__4___5______6_7_8__9___10 [Timestamps]
I______U___D__I___U_x____U_U_I__D__now [Operations]
I = Insert, U = Update, D = Delete, x = the date I'm interested in

Let me tell you how a record of one example row I'll use in the following looked liked at date x:

TABLE_A

ID column_B column_C audit_id
1 new_value abc 70

5.4.1. Fetching audit_ids

I need to know which entries were valid at the date I've requested (x). I look up the transaction_log table to see if I could find an entry for date x and table 'table_A'. No? Ok, I've rounded up the timestamp, that's why. So dear transaction_log table, tell me what is the next older timestamp you got for 'table_A'.

Timestamp 5. An UPDATE statement happened there. Get me all the rows affected by this transaction from the audit_log table (where table_name is 'table_A'), but I'm only interested in their audit_ids.

Timestamp 4. Ah yes, an INSERT command. Again get me all the audit_ids but leave out the ones I already have or to say it in a different way: Leave out the ones that appear after this timestamp (>) until my queried date (<=).

Timestamp 3. An DELETE statement? I'm not interested in anything what happened there. The rows were deleted an do not appear at my queried date. The same would apply for TRUNCATE events.

Timestamp 2. An UPDATE again. Do the same like at timestamp 4.

Timestamp 1. OK, INSERT - Again: Get the IDs, leave out IDs that appear during the last timestamps.

5.4.2. Generate entries from JSON logs

Ok, now that I know, which entries were valid at date x let's perform the PostgreSQL function json_populate_recordset on the column 'table_content' in the 'audit_log' table using the fetched audit_ids to transform JSON back to tables. But wait, in some rows I only got this fragment {"column_B":"old_value"}. Well of course, it's just the change not the complete row. What to do?

We have to check the timestamps after date x in ascending order and collect each JSON diff to build up complete records of a past state. If some values have changed multiple times corresponding key-value pairs in newer JSON diffs are ignored as we are only interested in the oldest version.

So at timestamp 6 a first JSON diff is just fetched, e.g. {"column_B":"new_value"}.

At timestamp 7 another update happened and a JSON diff is fetched again, e.g. {"column_B":"newer_value";"column_C":"abc"}, which has to be merged into the JSON log we've already got. The result would be {"column_B":"new_value;"column_C":"abc"}.

Timestamp 8 does not appear in our timestamp query because it refers to an INSERT operation which just adds new entries with their own audit_ids.

As for timestamp 9 some rows were deleted which logs the whole content of the affected tuples (as we remember from 5.3.). Finally, the audited table itself is also queried against the audit_id because for rows that have not been deleted the last JSON diff has to be generated from the recent version of the table entry.

The final result of our long journey would be the following JSON object: {"column_B":"newer_value";"column_C":"abc";"id":1";"audit_id":70;"new_attribute":"extra"}

As you can see the ordering of columns is a little disarranged and a new column found its way to our JSON log. Both problems do not matter when performing json_populate_recordset as it uses a table template that defines the table structure. This leads to a another important aspect of Audit.

5.4.3. Table templates

If I would use 'table_A' as the template for json_populate_recordset I would receive the correct order of columns but could not exclude new columns. I need to know the structure of 'table_A' at date x, too.

Unfortunately Audit can yet not perform logging of DDL commands, like ALTER TABLE [...]. It forces the user to do a manual column backup. By executing the procedure audit.create_table_template an empty copy of the audited table is created in the audit schema (concatenated with an internal ID). Every created table is documented (with timestamp) in 'audit.table_templates' which is queried when restoring former table states.

IMPORTANT: A table template has to be created before the audited table is altered.

5.5. Work with the past state

If past states were restored as tables they do not have primary keys or indexes assigned to them. References between tables are lost as well. If the user wants to work on the restored table or database state - like he would do with the recent state - he can use the procedures audit.pkey_table_state, audit.fkey_table_state and audit.index_table_state. These procedures create primary keys, foreign keys and indexes on behalf of the recent constraints defined in the certain schema (e.g. 'public'). If table and/or database structures have changed fundamentally over time it might not be possible to recreate constraints and indexes as their metadata is not logged by Audit.

  1. Future Plans

First of all I want to to share my idea with the PostgreSQL community and discuss the scripts in order to improve them. Let me know what you think of it.

I would be very happy if there are other PostgreSQL developers out there who are interested in Audit and willing to help me to improve it. Together we might create a powerful, easy-to-use versioning approach for PostgreSQL.

However, here are some plans I have for the near future:

  • Have another table to store metadata of additional created schemas for former table / database states.
  • Develop a method to revert specific changes e.g. connected to a transaction_id, date, user etc. I've already developped procedures to merge a whole schema into another schema, e.g. to be able to do a full rollback on a database (see AUDIT_REVERT.sql). But the approach is a bit over the top if I just want to revert one transaction.
  1. Developers

Felix Kunde

  1. Contact

fkunde@virtualcitysystems.de

  1. Special Thanks

  • Hans-Jürgen Schönig (Cybertech) --> recommend to use a generic JSON auditing
  • Christophe Pettus (PGX) --> recommend to only log changes
  • Claus Nagel (virtualcitySYSTEMS) --> conceptual advices about logging
  • Ollyc (Stackoverflow) --> Query to list all foreign keys of a table
  • Denis de Bernardy (Stackoverflow, mesoconcepts) --> Query to list all indexes of a table
  1. Disclaimer

AUDIT IS PROVIDED "AS IS" AND "WITH ALL FAULTS." I MAKE NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE QUALITY, SAFETY OR SUITABILITY OF THE SKRIPTS, EITHER EXPRESSED OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT.

IN NO EVENT WILL I BE LIABLE FOR ANY INDIRECT, PUNITIVE, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES HOWEVER THEY MAY ARISE AND EVEN IF I HAVE BEEN PREVIOUSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. STILL, I WOULD FEEL SORRY.

About

Versioning approach for PostgreSQL using PL/pgSQL functions

License:GNU Lesser General Public License v3.0