yildizib / elasticsearch-cardinality-plugin

This plugin extends Elasticsearch with a new aggregation type and a REST action for estimating the cardinality (number of unique terms) of a field.

Elasticsearch Cardinality Plugin

This plugin extends Elasticsearch with a fast, memory-efficient way to estimate the cardinality (number of unique terms) of a field. The field can be of string, numeric, or boolean type. The plugin registers a new aggregation type (cardinality) and a REST action (_cardinality).

We love pull requests!

Prerequisites:

  • Elasticsearch 1.0.0+

Binaries

  • Compiled versions of the plugin are stored in the dist directory.

Principle

This plugin uses the HyperLogLogPlus algorithm provided by the Stream-lib library to estimate the cardinality (unique term count) of a field. It estimates the number of unique values of a field without loading all of them into RAM. Estimates can be merged efficiently across shards and across indices.
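To make the idea concrete, here is a minimal Python sketch of a plain HyperLogLog estimator. It is an illustration under stated assumptions, not the plugin's code: the plugin relies on Stream-lib's HyperLogLogPlus, which adds refinements (such as a sparse representation and bias correction) omitted here. The class name, the register count (p=12), and the use of SHA-1 as the hash are all choices made for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog (illustration only, not the plugin's implementation)."""

    def __init__(self, p=12):
        self.p = p                        # number of bits used to pick a register
        self.m = 1 << p                   # number of registers (4096 for p=12)
        self.registers = [0] * self.m     # fixed memory, whatever the input size

    def add(self, value):
        # Derive a 64-bit hash from the value (SHA-1 is an arbitrary choice here).
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                  # first p bits select the register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Merging two sketches is an element-wise maximum of their registers.
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)     # standard constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)


# Two hypothetical "shards" whose values overlap; 30000 distinct values in total.
shard_a = HyperLogLog()
shard_b = HyperLogLog()
for i in range(0, 20000):
    shard_a.add("ip-%d" % i)
for i in range(10000, 30000):
    shard_b.add("ip-%d" % i)

merged = shard_a.merge(shard_b)
print(merged.count())  # an estimate close to 30000
```

Because merging only takes the maximum of each register pair, the merged sketch is identical to one built from all values at once, which is why combining per-shard or per-index results is cheap.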

Without such a plugin, the only way to count the unique values of a field was to retrieve all of them on the client side and count the length of the resulting array, which is highly inefficient.

REST Action

To estimate the cardinality of a field, use the following REST action:

curl -XGET http://localhost:9200/{index}/{field}/_cardinality

For example, to estimate the number of unique IPs in the index logstash-2014.02.03:

curl -XGET http://localhost:9200/logstash-2014.02.03/ip/_cardinality
{
	"_shards": {
		"total": 2,
		"successful": 2,
		"failed": 0
	},
	"count": 46367
}

To estimate the number of unique IPs across several indices:

curl -XGET 'http://localhost:9200/logstash-2014.01.*/ip/_cardinality'
{
	"_shards": {
		"total": 86,
		"successful": 86,
		"failed": 0
	},
	"count": 919979
}
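The same calls can be made from a script. The following Python sketch builds the URL shown above and reads the count from the response. The function names are hypothetical, the host defaults to the localhost address used in the examples, and raising an error when a shard fails is a design choice of this sketch rather than documented plugin behavior.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def cardinality_url(index, field, host="http://localhost:9200"):
    # Keep "*" and "," unescaped so index patterns such as logstash-2014.01.* still work.
    return "{}/{}/{}/_cardinality".format(host, quote(index, safe="*,"), quote(field, safe=""))


def parse_cardinality(body):
    # Extract the estimated count from a _cardinality JSON response.
    data = json.loads(body)
    failed = data["_shards"]["failed"]
    if failed:
        # Assumption of this sketch: treat a partial result as an error.
        raise RuntimeError("%d shard(s) failed, count would be partial" % failed)
    return data["count"]


def estimate_cardinality(index, field, host="http://localhost:9200"):
    # Requires a reachable Elasticsearch node with the plugin installed.
    with urlopen(cardinality_url(index, field, host)) as resp:
        return parse_cardinality(resp.read().decode("utf-8"))
```

For instance, estimate_cardinality("logstash-2014.01.*", "ip") would issue the multi-index request from the example above.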

Aggregation

To build an aggregation estimating the cardinality of a field, use the following request body:

{
  "aggregations": {
    "<aggregation_name>": {
      "cardinality": {
        "field": "<field_name>"
      }
    }
  }
}

For example, to estimate the number of unique IPs in a result set, use the following request body:

{
  "aggregations": {
    "uniq_ips": {
      "cardinality": {
        "field": "ip"
      }
    }
  }
}

The response contains the estimate under the aggregation name:

{
  "aggregations": {
    "uniq_ips": {
      "value": 42
    }
  }
}
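When the request is generated by a program, the body can be assembled as a dictionary and the estimate read back by aggregation name. This Python sketch uses hypothetical helper names. It assumes the body is sent to the standard Elasticsearch _search endpoint, which this README does not state explicitly.

```python
import json


def cardinality_aggregation(name, field):
    # Build the request body for a cardinality aggregation on the given field.
    return {"aggregations": {name: {"cardinality": {"field": field}}}}


def read_cardinality(response, name):
    # Read the estimated value from the aggregation section of a response.
    return response["aggregations"][name]["value"]


# Serialized request body, ready to be sent along with a search request.
body = json.dumps(cardinality_aggregation("uniq_ips", "ip"))

# Parsing the example response shown above.
response = json.loads('{"aggregations": {"uniq_ips": {"value": 42}}}')
print(read_cardinality(response, "uniq_ips"))
```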

Setup

Installation

./plugin --url elasticsearch-cardinality-plugin-0.0.1.zip --install index-cardinality

Uninstallation

./plugin --remove index-cardinality

License: Apache License 2.0