brettwooldridge / influx4j

High-performance, zero-garbage, Java client/driver for InfluxDB.

influx4j

InfluxDB is a high-performance time-series database. You need a Java driver to match.

When jamming tens of thousands of metrics into InfluxDB per minute, you can't afford Stop-The-World garbage collection pauses reaching into the hundreds (or thousands) of milliseconds. And you can't afford to senselessly burn CPU cycles.

10x Faster.  10x Less CPU.  Infinity Less Garbage.

influx4j is wickedly fast: 10 times faster than the official driver. influx4j is a CPU miser: it consumes 10 times less CPU persisting a data point than the official driver. influx4j generates ZERO garbage from Point to protocol: infinitely less garbage than the official driver.

As measured by the JMH benchmark included in this project, comparing influx4j with the official driver on Point-to-protocol serialization:

45-second run:

Driver          Points Produced   Points/ms   Garbage     Avg Garbage      G1 Garbage
                (approx.)         (approx.)   Produced    Creation Rate    Collections
influx4j        192 million       4267        zero        zero             zero
influxdb-java   18 million        406         334.64 GB   6.17 GB/sec      766

Zero garbage means the JVM interrupts your performance-critical code less often.¹ The extreme efficiency of the Point-to-protocol buffer serialization pipeline means you burn 10x less CPU producing the same number of points compared to the official driver.

¹ Note: While influx4j generates zero garbage, your application and its associated libraries likely generate garbage that will still require collection.

Usage

🏭 Creating a PointFactory

Towards the goal of zero-garbage, influx4j employs a pooling scheme for Point instances, such that Point objects are recycled within the system. This pool is contained within a factory for producing Points: PointFactory.

The first thing your application will need to do is to configure and create a PointFactory. The only configuration options are the initial size of the pool and the maximum size of the pool.

Your application can create multiple PointFactory instances, or a single shared one; it's up to you. All methods on the PointFactory are thread-safe, so no additional synchronization is required.

A PointFactory with the default configuration, an initial size of 128 Point objects and a maximum size of 512 Point objects, can be constructed like so:

PointFactory pointFactory = PointFactory.builder().build();

And here is a PointFactory created with custom configuration:

PointFactory pointFactory =
   PointFactory.builder()
               .initialSize(1000)
               .maximumSize(8000)
               .build();

The maximumSize should be set somewhat larger than the maximum number of points your application generates per second, assuming the default connection "auto-flush" interval of one second.
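As a rough worked example of this rule of thumb, the sketch below computes a candidate maximumSize. It assumes the pool must cover about one flush period's worth of points, so it scales the peak per-second rate by the auto-flush period, and it applies a headroom factor for "somewhat larger". Both that scaling and the 25% headroom in the example are illustrative assumptions, not library defaults, and `suggestMaximumSize` is a hypothetical helper.

```java
final class PoolSizing {

    /**
     * Hypothetical helper: suggests a PointFactory maximumSize from the peak
     * insertion rate, the auto-flush period, and a headroom factor.
     * Assumption: roughly one flush period's worth of Points is outstanding
     * at any time, so the pool needs about rate * period capacity.
     */
    static int suggestMaximumSize(long peakPointsPerSecond, long autoFlushPeriodMs, double headroom) {
        if (peakPointsPerSecond <= 0 || autoFlushPeriodMs <= 0 || headroom < 1.0) {
            throw new IllegalArgumentException("rate and period must be positive, headroom >= 1.0");
        }
        // Points produced during one flush period at the peak rate.
        double pointsPerPeriod = peakPointsPerSecond * (autoFlushPeriodMs / 1000.0);
        // "Somewhat larger": apply the headroom factor, then round up.
        return (int) Math.ceil(pointsPerPeriod * headroom);
    }

    public static void main(String[] args) {
        // 6,000 points/s, default 1,000 ms auto-flush period, 25% headroom.
        System.out.println(suggestMaximumSize(6_000, 1_000, 1.25)); // 7500
        // Same rate with a 500 ms auto-flush period.
        System.out.println(suggestMaximumSize(6_000, 500, 1.25));   // 3750
    }
}
```

The result would then be passed to PointFactory.builder().maximumSize(...); measure your real peak rate rather than relying on the example numbers.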

The total memory consumed by the pool is determined by its "high water mark" of usage, so keep this in mind when setting the maximumSize. You can force the pool to empty by calling the flush() method, but doing so turns the pool's contents into garbage.

PointFactory Behaviors

  • Your application will never "block" when creating a Point. If the internal pool is empty, a new Point object will be allocated.
  • The internal pool will never exceed the configured maximum size. If the pool is full when a Point is returned, that Point will be discarded for garbage collection. Therefore, in order to avoid garbage generation, the maximum size should be set based on your application's insertion rate and the configured auto-flush rate (see below).
  • The internal pool never shrinks. As noted above, you can completely empty the pool by calling the flush() method on the PointFactory instance, but it is not recommended.
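The behaviors above can be illustrated with a small bounded pool. This is a sketch of the general technique, not influx4j's implementation: `BoundedPool` and its methods are hypothetical names, and it relies on the non-blocking `poll`/`offer` methods of `ArrayBlockingQueue` to stay thread-safe without ever making the caller wait.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.function.Supplier;

/** Hypothetical bounded, non-blocking object pool mirroring the PointFactory behaviors above. */
final class BoundedPool<T> {
    private final ArrayBlockingQueue<T> pool;
    private final Supplier<T> factory;

    BoundedPool(int initialSize, int maximumSize, Supplier<T> factory) {
        if (maximumSize < 1 || initialSize < 0 || initialSize > maximumSize) {
            throw new IllegalArgumentException("require 0 <= initialSize <= maximumSize, maximumSize >= 1");
        }
        this.pool = new ArrayBlockingQueue<>(maximumSize);
        this.factory = factory;
        for (int i = 0; i < initialSize; i++) {
            pool.offer(factory.get()); // pre-populate up to initialSize
        }
    }

    /** Never blocks: reuses a pooled object, or allocates a new one if the pool is empty. */
    T borrow() {
        T item = pool.poll();
        return (item != null) ? item : factory.get();
    }

    /**
     * Never exceeds maximumSize: if the pool is full, offer() returns false and the
     * object is dropped (it becomes garbage). Nothing is evicted, so the pool never shrinks.
     */
    void recycle(T item) {
        pool.offer(item);
    }

    /** Empties the pool; everything it held becomes garbage (analogous to PointFactory.flush()). */
    void flush() {
        pool.clear();
    }

    int pooledCount() {
        return pool.size();
    }
}
```

In this sketch, memory held by the pool tracks its high-water mark: once maximumSize objects have been recycled, they stay resident until flush() is called.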

Points obtained from the PointFactory can simply be thrown away without damaging the pool. For example, if your code may throw an exception after creating a Point but before persisting it, you need not worry about recycling the Point via try-finally logic. Just don't make a habit of casually throwing Points away; after all, reducing garbage is one of the library's goals.


💠 Creating a Point

Once you have a PointFactory instance, you are ready to create Point instances to persist. The Point class implements a builder-like pattern.

Example of creating a Point for a measurement named "consumerPoll123":

PointFactory pointFactory = ...

Point point = pointFactory
   .createPoint("consumerPoll123")
   .tag("fruit", "apple")
   .field("yummy", true)
   .field("score", 9.5d)
   .timestamp();

The timestamp can also be specified explicitly:

Point point = pointFactory
   .createPoint("consumerPoll123")
   .tag("fruit", "banana")
   .field("yummy", false)
   .field("score", 5.0d)
   .timestamp(submissionTS, TimeUnit.MILLISECONDS);

Note that while a TimeUnit may be specified on the Point, the ultimate precision of the persisted timestamp will be determined by the precision specified in the connection information (see below for details about connection parameters). The TimeUnit specified on the Point timestamp will automatically be converted to the precision of the connection.
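The conversion described above can be pictured with the JDK's own `TimeUnit.convert`. This sketch assumes that a coarser connection precision truncates (as `TimeUnit.convert` does) rather than rounds; whether influx4j truncates or rounds is not stated here, so treat that as an assumption. `toConnectionPrecision` is a hypothetical helper.

```java
import java.util.concurrent.TimeUnit;

final class PrecisionDemo {
    /**
     * Hypothetical helper: converts a Point's timestamp from the unit it was
     * specified in to the connection's precision. TimeUnit.convert truncates
     * when converting to a coarser unit.
     */
    static long toConnectionPrecision(long timestamp, TimeUnit pointUnit, TimeUnit connectionUnit) {
        return connectionUnit.convert(timestamp, pointUnit);
    }

    public static void main(String[] args) {
        long submissionTS = 1_500_000_123_456L; // milliseconds since the epoch
        // Through a SECOND-precision connection: the sub-second part is dropped.
        System.out.println(toConnectionPrecision(submissionTS, TimeUnit.MILLISECONDS, TimeUnit.SECONDS));     // 1500000123
        // Through a NANOSECOND-precision connection: scaled up, nothing is lost.
        System.out.println(toConnectionPrecision(submissionTS, TimeUnit.MILLISECONDS, TimeUnit.NANOSECONDS)); // 1500000123456000000
    }
}
```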

Point Accessors

Point contains field() methods for the following Java types: String, Long, Double, Boolean. Tag values, as per the InfluxDB specification, must be strings.

Point also contains read accessors, such as String stringField(String field), but it is important to note that influx4j is optimized for write performance, and these accessors carry overhead due to a linear (O(n)) scan of the relevant field type.

If the number of fields of a given type is small, the overhead will not be too great. Also, if only some fields need to be read from a Point before insertion, you can improve performance by adding those fields first, so that they are among the first to be scanned by the linear search.

This linear scan behavior is also true of the String tag(String tagName) accessor.

If the order of fields is always consistent, you can eliminate read-accessor overhead by using the accessors that accept an integer index, such as String stringField(int index). This accesses the Nth String field, not the Nth field added to the Point; i.e. the index is specific to the field type.

Lastly, note that the read accessors return objects such as Long, Double, and Boolean, because the accessed field may not exist, in which case null must be returned. As a result, the JVM must perform an auto-boxing operation, with the overhead that entails (including garbage).

Note that the Point class is not involved in querying InfluxDB, so the above caveats about read accessors apply only to Points that will be written.
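To make the costs described above concrete, here is a hypothetical sketch of per-type field storage. It is not influx4j's internal layout: `FieldStore`, `addLong`, and the `longField` overloads are illustrative names (the readers are modeled on accessors like stringField(String) and stringField(int)), and only the Long type is shown. By-name reads scan that type's names linearly, by-index reads address the Nth field of that type, and values stored as primitives must be boxed on the way out.

```java
import java.util.Arrays;

/** Hypothetical per-type field storage illustrating the read-accessor costs described above. */
final class FieldStore {
    private String[] longNames = new String[4];
    private long[] longValues = new long[4]; // primitives: no boxing on write
    private int longCount;

    void addLong(String name, long value) {
        if (longCount == longNames.length) { // grow both parallel arrays together
            longNames = Arrays.copyOf(longNames, longCount * 2);
            longValues = Arrays.copyOf(longValues, longCount * 2);
        }
        longNames[longCount] = name;
        longValues[longCount] = value;
        longCount++;
    }

    /** By-name read: O(n) scan over Long fields only; earlier-added fields are found sooner. */
    Long longField(String name) {
        for (int i = 0; i < longCount; i++) {
            if (longNames[i].equals(name)) {
                return longValues[i]; // auto-boxed on return
            }
        }
        return null; // a missing field is why the return type must be the boxed Long
    }

    /** By-index read: the Nth Long field, regardless of fields of other types. */
    Long longField(int index) {
        return (index >= 0 && index < longCount) ? longValues[index] : null;
    }
}
```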

Point Copying

It is quite common to have a set of measurements which share a common set of tags, and which are produced at the same time for insertion into InfluxDB. The Point class provides a copy() method that makes this more efficient, both in terms of execution time and code brevity.

Copying a Point:

Point point1 = pointFactory
   .createPoint("procStats")
   .tag("dataCenter", "Tall Pines")
   .tag("hostId", "web.223")
   .field("cpuUsage", hostCpu)
   .field("memTotal", hostMemTotal)
   .field("memFree", hostMemFree)
   .timestamp();

Point point2 = point1
   .copy("netStats")
   .field("inOctets", hostInOctets)
   .field("outOctets", hostOutOctets);

There are several important things to note about the copy() method:

  • A new measurement name is specified as a parameter to the copy() method.
  • All tags are copied. In this example, point2 will also contain the "dataCenter" and "hostId" tags from point1.
  • No field values are copied.
  • The timestamp of the source Point is copied (retained).
  • The copied point, point2 in the example above, is a Point like any other, and therefore additional tags and fields may be added, and the timestamp changed/updated via the standard methods.
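The copy rules listed above can be modeled in a few lines. `SimplePoint` is a hypothetical, simplified stand-in that uses ordinary collections (so it is not zero-garbage); it only demonstrates the semantics: a new measurement name, tags carried over, fields starting empty, and the timestamp retained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified model of the copy() semantics described above (not zero-garbage). */
final class SimplePoint {
    final String measurement;
    final Map<String, String> tags = new LinkedHashMap<>();
    final Map<String, Object> fields = new LinkedHashMap<>();
    Long timestampNanos; // null until a timestamp is set

    SimplePoint(String measurement) {
        this.measurement = measurement;
    }

    SimplePoint tag(String key, String value) { tags.put(key, value); return this; }

    SimplePoint field(String key, Object value) { fields.put(key, value); return this; }

    SimplePoint timestamp(long nanos) { this.timestampNanos = nanos; return this; }

    /** New measurement name; all tags and the timestamp are copied; no fields are copied. */
    SimplePoint copy(String newMeasurement) {
        SimplePoint copy = new SimplePoint(newMeasurement);
        copy.tags.putAll(this.tags); // independent tag map: later changes don't affect the source
        copy.timestampNanos = this.timestampNanos;
        return copy;
    }
}
```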

🔌 Connection

An instance of InfluxDB represents a connection to the database. Similar to the PointFactory, a Builder is used to configure and create an instance of InfluxDB.

A simple example construction via the Builder is shown here:

InfluxDB influxDB = InfluxDB.builder()
         .setConnection("127.0.0.1", 8086, InfluxDB.Protocol.HTTP)
         .setUsername("mueller")
         .setPassword("gotcha")
         .setDatabase("example")
         .build();

👉 Note that while InfluxDB.Protocol.UDP is defined, UDP is currently not supported by the driver.

Connection Parameters

The following configuration parameters are supported by the InfluxDB.Builder:

💿 setDatabase(String database)

The name of the InfluxDB database that Point instances will be inserted into.

👀 setUsername(String username)

The username used to authenticate to the InfluxDB server.

🔑 setPassword(String password)

The password used to authenticate to the InfluxDB server.

⏱️ setRetentionPolicy(String retentionPolicy)

The name of the retention policy to use.

➿ setConsistency(Consistency consistency)

The consistency setting of the connection. One of:

  • InfluxDB.Consistency.ALL
  • InfluxDB.Consistency.ANY
  • InfluxDB.Consistency.ONE
  • InfluxDB.Consistency.QUORUM

🕒 setPrecision(Precision precision)

The precision of timestamps persisted through the connection. One of:

  • InfluxDB.Precision.NANOSECOND
  • InfluxDB.Precision.MICROSECOND
  • InfluxDB.Precision.MILLISECOND
  • InfluxDB.Precision.SECOND
  • InfluxDB.Precision.MINUTE
  • InfluxDB.Precision.HOUR

🚽 setAutoFlushPeriod(long periodMs)

The auto-flush period of the connection. Points passed to the write(Point point) method are not written immediately; they are queued and written asynchronously. The auto-flush period defines how often queued points are written (flushed) to the connection. The default value is one second (1000 ms), and the minimum value is 100 ms.

setThreadFactory(ThreadFactory threadFactory)

An optional ThreadFactory used to create the auto-flush background thread.
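The queue-then-flush behavior described for setAutoFlushPeriod and setThreadFactory can be sketched with JDK concurrency utilities. Everything here is an illustrative assumption: `AutoFlushQueue`, its `Consumer<List<String>>` sink standing in for the network write, and the use of a `ScheduledExecutorService`; influx4j's own background thread may work differently. Only the 100 ms minimum and 1,000 ms default come from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: write() only enqueues; a background thread flushes periodically. */
final class AutoFlushQueue implements AutoCloseable {
    static final long DEFAULT_PERIOD_MS = 1_000;
    static final long MINIMUM_PERIOD_MS = 100;

    private final ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
    private final Consumer<List<String>> sink; // stands in for the actual network write
    private final ScheduledExecutorService scheduler;

    AutoFlushQueue(long periodMs, ThreadFactory threadFactory, Consumer<List<String>> sink) {
        if (periodMs < MINIMUM_PERIOD_MS) {
            throw new IllegalArgumentException("auto-flush period must be >= " + MINIMUM_PERIOD_MS + " ms");
        }
        this.sink = sink;
        // The optional ThreadFactory decides how the background flush thread is created.
        this.scheduler = (threadFactory != null)
                ? Executors.newSingleThreadScheduledExecutor(threadFactory)
                : Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::flush, periodMs, periodMs, TimeUnit.MILLISECONDS);
    }

    /** Non-blocking: the caller returns immediately; the data goes out on the next flush. */
    void write(String line) {
        queue.offer(line);
    }

    /** Drains everything queued so far and hands it to the sink as one batch. */
    synchronized void flush() {
        List<String> batch = new ArrayList<>();
        for (String line; (line = queue.poll()) != null; ) {
            batch.add(line);
        }
        if (!batch.isEmpty()) {
            sink.accept(batch);
        }
    }

    @Override
    public void close() {
        scheduler.shutdown(); // cancels future periodic flushes
        flush();              // write anything still queued
    }
}
```

Batching per period is what lets one network write carry many points; the trade-off is that a point may sit in the queue for up to one period before it reaches InfluxDB.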


✏️ Writing a Point

Writing a Point is simple; there is only one method: write(Point point).

Point point = pointFactory.createPoint("survey")
         .tag("fruit", "apple")
         .field("yummy", true)
         .timestamp();

influxDB.write(point);
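Each written Point is ultimately serialized to InfluxDB's line protocol. The sketch below appends one record into a reusable StringBuilder, illustrating serialization into a buffer that is cleared and reused across points rather than building a new String per point. It is not influx4j's serializer and does not attempt to be allocation-free. It assumes the InfluxDB 1.x line protocol rules: commas and spaces escaped in measurement names; commas, equals signs, and spaces escaped in tag keys, tag values, and field keys; string field values double-quoted with `"` and `\` escaped; integer values suffixed with `i`. `LineProtocol` and its methods are hypothetical.

```java
/**
 * Hypothetical sketch: appends one InfluxDB 1.x line-protocol record into a
 * caller-owned StringBuilder that can be cleared with setLength(0) and reused.
 */
final class LineProtocol {

    // Escapes commas and spaces, and optionally '=' (tag keys/values and field keys).
    static void appendEscaped(StringBuilder sb, String s, boolean escapeEquals) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ',' || c == ' ' || (escapeEquals && c == '=')) {
                sb.append('\\');
            }
            sb.append(c);
        }
    }

    static void measurement(StringBuilder sb, String name) {
        appendEscaped(sb, name, false); // measurement names escape commas and spaces only
    }

    static void tag(StringBuilder sb, String key, String value) {
        sb.append(',');
        appendEscaped(sb, key, true);
        sb.append('=');
        appendEscaped(sb, value, true);
    }

    // A space separates the tag set from the first field; commas separate later fields.
    private static void fieldKey(StringBuilder sb, boolean first, String key) {
        sb.append(first ? ' ' : ',');
        appendEscaped(sb, key, true);
        sb.append('=');
    }

    static void field(StringBuilder sb, boolean first, String key, long value) {
        fieldKey(sb, first, key);
        sb.append(value).append('i'); // integer fields carry an 'i' suffix
    }

    static void field(StringBuilder sb, boolean first, String key, double value) {
        fieldKey(sb, first, key);
        sb.append(value); // formatting a double may allocate internally on some JDKs
    }

    static void field(StringBuilder sb, boolean first, String key, boolean value) {
        fieldKey(sb, first, key);
        sb.append(value);
    }

    static void field(StringBuilder sb, boolean first, String key, String value) {
        fieldKey(sb, first, key);
        sb.append('"');
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\'); // string values escape double quotes and backslashes
            }
            sb.append(c);
        }
        sb.append('"');
    }

    // The timestamp is assumed to be already converted to the connection's precision.
    static void timestamp(StringBuilder sb, long timestamp) {
        sb.append(' ').append(timestamp).append('\n');
    }
}
```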

See the InsertionTest for example usage, until I have time to write full docs.
