fredemmott / jp

A simple multi-language job pool system

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Overview

jp (Job Pool) is a producer/consumer job pooling system; I’m calling it a job pool rather than a job queue, because it *does not enforce strict ordering*.

A few of the basic principles behind this project:

  • A producer/consumer system

  • Easily usable from multiple languages

  • If a consumer starts working on a message in the pool, then crashes, eventually, without human interaction, the message will eventually be given to another consumer - jp uses a lock-work-purge system, with a timeout on the lock

  • All messages must be persistent

  • It’s not required that messages are consumed in /exactly/ the order they are added to the pool

  • It must be easy to setup an HA installation

  • It must be horizontally scalable

Supported languages

jp uses Thrift for communication with consumers and producers, and should work with any of the languages Thrift supports. At the time of writing, this includes:

  • C++

  • Java

  • Python

  • PHP

  • Ruby

  • Erlang

  • Perl

  • Haskell

  • C#

  • Cocoa

  • Smalltalk

  • Ocaml

Consumers and producers can be written in different languages.

Additionally, as an alternative to using the Thrift interface directly, simplified client libraries are currently provided for:

  • C++

  • Ruby

Requirements

  • Ruby 1.9

  • Thrift (with Ruby support)

  • mongo rubygem

  • bson rubygem

  • rev rubygem

  • A mongodb server/cluster to connect to

  • Ant (simple java client build only)

You might additionally want the bson_ext rubygem, which normally increases performance - however, it appears to be incompatible with Ruby 1.9.0 on Lenny.

We strongly recommend using Ruby 1.9.1 or later - this is also available for Lenny via backports.

Additional requirements for the C++ simple library and examples

  • libevent (and development headers - normally libevent-dev or similar)

Additional requirements for the Java simple library and examples

  • ant

Message lifecycle

  • You can prevent the locked -> unlocked transition from occuring by using a consumer-side timeout

  • Without a consumer-side timeout, there is a possibility that multiple consumers will be processing the same message at the same time

Restrictions

  • Messages must be idempotent (i.e. it doesn’t matter how many times they are consumed, as long as it’s at least once - may be twice, may be hundreds). jp will run them only once in the normal case, but there are other possibilities.

  • Unless you only run one consumer per pool, or you implement your own locking, it must be safe for multiple consumers to process the same message at the same time.

  • Message delivery order can’t matter. In some circumstances, you might be able to work around this with a ‘version’ field serialized into the message (if later messages do not depend upon previous messages having been acted on).

Serialization

jp simply passes around BLOBs as messages; it’s entirely up to your application what serialization format you use - here’s a few suggestions that are fairly portable between different languages, though you can use any other method you like:

  • Thrift via memory buffer, and your choice of protocol - the Ruby library provides a Thrift::Serialize class using the memory buffer transport and the binary protocol

  • UTF-8 if it’s simple character data

  • UTF-8 JSON

  • XML

  • UTF-8 YAML

  • Google’s Protocol Buffers

If you’re willing to be locked into using the same language for producers and consumers, you can, of course use platform-specific functionality such as:

  • serialize/unserialize in PHP

  • Marshall in Ruby

  • Data::Dumper in Perl

  • writeObject/readObject (from Serializable) in Java

It is, however, generally preferable to have more flexibility for the future by using a serialization method from the first list, or similar.

Reasons for out-of-order messages

There are (at least) three ways messages can be consumed out of order:

  • A consumer fails to complete processing within a per-pool time limit, so the message is re-queued, and may then be executed after other items that were queued while it was being processed

  • Two items end up being inserted on different mongodb servers, in the same second (then, within that second, the ordering is decided by a byte-by-byte comparison of the mongodb-generated ‘_id’ field)

  • Two consumers are running against the same queue, and one runs somewhat faster than another; depending on what your consumers do, this may lead to out-of-order execution

The first of these is unavoidable if multiple concurrent consumers are allowed per pool.

Getting started

Follow the instructions on the mongodb website, or, for a quick start, download one of the tarballs from www.mongodb.org/downloads, then:

tar xfv /path/to/mongodb-OS-arch-version.tgz
cd mongodb-OS-arch-version
mkdir /var/tmp/mongodata
bin/mongod --dbpath /var/tmp/mongodata

There are also packages for various distributions linked from the bottom of that page; use them in preference to the version included in your distribution. For example, the ‘mongodb’ version in Ubuntu at the time of writing doesn’t support the find_and_modify operation (too old), however mongodb-stable from the repositories listed on the mongodb site does.

Build/generate the Thrift bindings

To build all the generated components, run:

make

Run jp

./jp examples/jp-config.rb

This starts jp using the configuration file in the examples directory. This:

  • connects to a mongodb server on localhost

  • sets up ‘text’, ‘json’, and ‘thrift’ pools for the examples

Examples

Look at the contents of examples/ for an idea of how to use this - to build them, and their dependencies, run ‘make examples’ from the top-most directory.

Producer and consumer examples using the Thrift API are available for:

  • C++

  • Java

  • Ruby

Producer and consumer examples using the simplified libraries are available for:

  • C++

  • Ruby

  • Java

About

A simple multi-language job pool system

License:Other


Languages

Language:Ruby 37.5%Language:Java 34.1%Language:PHP 19.3%Language:C++ 9.0%