k4y3ff / zombodb

Making Postgres and Elasticsearch work together like it's 2015

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

ZomboDB is a Postgres extension that enables efficient full-text searching via the use of indexes
backed by Elasticsearch. In order to achieve this, ZomboDB implements Postgres' Access Method API.

In practical terms, a ZomboDB index doesn't appear to Postgres any different than a standard btree index might. As such, standard SQL commands for mutating data are fully supported: INSERT, UPDATE, DELETE, COPY, and VACUUM.

Behind the scenes, ZomboDB indexes communicate with Elasticsearch via HTTP and are automatically synchronized in an MVCC-safe manner as data in the underlying Postgres table changes.

Following Postgres' MVCC rules means that transactions see, at all times, a view of the backing Elasticsearch index that is consistent with their view of Postgres tables.

Index management happens using standard Postgres SQL commands such as CREATE INDEX, REINDEX, and ALTER INDEX. Searching uses standard SQL SELECT statements with a custom operator that exposes a full-text query language supporting most of Elasticsearch's query constructs.

Quick Links

Features

  • transaction-safe full-text queries
  • managed & used via standard Postgres SQL
  • works with tables of any structure
  • automatically creates Elasticsearch Mappings supporting most datatypes, including arrays
  • json columns as nested objects for flexible schemaless sub-documents
  • custom full-text query language supporting nearly all of Elasticsearch's search features, including
    • boolean operations
    • proximity (in and out of order)
    • phrases
    • wildcards
    • fuzzy terms/phrases
    • "more like this"
    • regular expressions
    • inline scripts
    • range queries
  • query results expansion and index linking
  • extremely fast indexing
  • record count estimation
  • high-performance hit highlighting
  • access to many of Elasticsearch's aggregations, including ability to nest aggregations
  • use whatever method you currently use for talking to Postgres (JDBC, DBI, libpq, etc)
  • fairly extensive test suite

Not to suggest that these things are impossible, but there's a small set of non-features too:

  • no scoring
  • indexes are not WAL-logged by Postgres so are not recoverable in the event of a Postgres server crash
  • interoperability with various Postgres replication schemes is unknown
  • pg_get_indexdef() doesn't correctly quote index options making backup restoration annoying (would require patch to Postgres)
  • Postgres HOT updates not supported
  • only supports Postgres query plans that choose IndexScans or BitmapIndexScans (the latter is also dependent on sufficient work_mem to avoid Recheck conditions)

History

The name is an homage to zombo.com and its long history of continuous self-affirmation.

Development began in 2013 by Technology Concepts & Design, Inc as a closed-source effort to provide transaction safe full-text searching on top of Postgres tables. While Postgres' "tsearch" features are useful, they're not necessarily adequate for 200 column-wide tables with 100M rows, each containing large text content.

Initially designed on-top of Postgres' Foreign Data Wrapper API, ZomboDB quickly evolved into an index type (Access Method) so that queries are MVCC-safe and standard SQL can be used to query and manage indexes.

Elasticsearch was chosen as the backing search index because of its horizontal scaling abilities, performance, and general ease of use.

Two years later, it's been used in production systems for quite some time and is now open-source.

How to Use It

Usage is really quite simple. Note that this is just a brief overview. See the various documentation files for more detailed information.

Install the extension:

CREATE EXTENSION zombodb;

Create a table:

CREATE TABLE products (
    id SERIAL8 NOT NULL PRIMARY KEY,
    name text NOT NULL,
    keywords varchar(64)[],
    short_summary phrase,
    long_description fulltext, 
    price bigint,
    inventory_count integer,
    discontinued boolean default false,
    availability_date date
);
-- insert some data

Index it:

CREATE INDEX idx_zdb_products 
                     ON products 
                  USING zombodb(zdb(products)) 
                   WITH (url='http://localhost:9200/', shards=5, replicas=1);

Query it:

SELECT * FROM products WHERE zdb(products) ==> 'keywords:(sports,box) or long_description:(wooden w/5 away) and price < 100000';

What you need

Product Version
Postgres 9.3
Elasticsearch 1.7.1+ (not 2.0)
Java JDK 1.7.0_51+

For information about how to develop/build ZomboDB, see the Development Guide.

Credit and Thanks

Credit goes to Technology Concepts & Design, Inc, its management, and its development and quality assurance teams not only for their work during the early development days but also for their on-going support now that ZomboDB is open-source.

Contact Information

License

Portions Copyright 2013-2015 Technology Concepts & Design, Inc.
Portions Copyright 2015 ZomboDB, LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

About

Making Postgres and Elasticsearch work together like it's 2015

License:Other


Languages

Language:Java 68.2%Language:C 23.6%Language:PLpgSQL 5.5%Language:Shell 1.0%Language:C++ 0.7%Language:SQLPL 0.5%Language:Makefile 0.4%Language:CMake 0.1%