scarf-sh / amazon-kinesis-client-haskell

This package provides an interface to the Amazon Kinesis Client Library (KCL) MultiLangDaemon for Haskell


Amazon Kinesis Client Library for Haskell

This package provides an interface to the Amazon Kinesis Client Library (KCL) MultiLangDaemon for Haskell.

Developers can use the KCL to build distributed applications that process streaming data reliably at scale. The KCL takes care of many of the complex tasks associated with distributed computing, such as load balancing across multiple instances, responding to instance failures, checkpointing processed records, and reacting to changes in stream volume.

This package wraps and manages the interaction with the MultiLangDaemon, which is provided as part of the Amazon KCL for Java, so that developers can focus on implementing their record processing logic.

A record processor in Haskell typically looks something like the following:

{-# LANGUAGE RecordWildCards #-}
module Main where

import           Control.Exception.Safe (handleAny)
import qualified Data.Text as T
import           Network.AWS.Kinesis.Client

main :: IO ()
main =
  handleAny (kclPutStrLn . T.pack . show) $
    runKCL initialise processRecords shutdown
  where
    initialise InitialisationInput{..} =
      -- Initialise the record processor
      undefined
    processRecords ProcessRecordsInput{..} =
      -- Process a batch of records from _priRecords, and optionally checkpoint by calling
      -- _priCheckpointer
      undefined
    shutdown ShutdownInput{..} =
      -- This is called when the KCL is shutting down; if desired, the record processor
      -- can checkpoint here by calling _siCheckpointer
      undefined
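
The comments in the skeleton mention checkpointing from processRecords. As a rough illustration of one possible policy (handle the whole batch, then checkpoint at its last sequence number), here is a self-contained sketch. The Record type, its fields, processBatch and lastSequenceNumber are hypothetical names invented for this example; they are not this package's API, whose real record and checkpointer types come from Network.AWS.Kinesis.Client.

```haskell
-- Hypothetical stand-in type for illustration only; the real record and
-- checkpointer types are defined by Network.AWS.Kinesis.Client.
data Record = Record
  { recordSequenceNumber :: String  -- position of the record within the shard
  , recordData           :: String  -- payload, simplified to a String here
  } deriving (Show, Eq)

-- | Handle every record in a batch, then report which sequence number to
-- checkpoint: the last one in the batch, or Nothing if the batch was empty.
processBatch :: (Record -> IO ()) -> [Record] -> IO (Maybe String)
processBatch handleRecord records = do
  mapM_ handleRecord records
  pure (lastSequenceNumber records)

-- | Sequence number of the final record, if any.
lastSequenceNumber :: [Record] -> Maybe String
lastSequenceNumber [] = Nothing
lastSequenceNumber rs = Just (recordSequenceNumber (last rs))
```

In a real processor, the returned sequence number would be handed to whatever checkpointing function the library provides. Checkpointing only after the whole batch succeeds means a failure partway through causes the batch to be replayed rather than skipped.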

For more information about Amazon Kinesis and the client libraries, see the official documentation as well as the Amazon Kinesis forums.

Getting started

Set up your AWS credentials

TODO

Building and running the sample projects

In addition to source code for the Amazon KCL for Haskell itself, this repository contains a sample application, which can serve as a starting point for your KCL application.

The sample application consists of two projects:

  • A data processor (sample-app/Main.hs)
    A new instance of this program is invoked by the MultiLangDaemon for each shard in the stream. It consumes the data from the shard. If you no longer need to work with the stream after running the data processor, remember to delete both the Amazon DynamoDB checkpoint table and the Kinesis stream in your AWS account.

The following defaults are used in the sample application:

  • Stream name: my-test-stream
  • Number of shards: 1

Running the data producer

To run the data producer, run the sample-app.


Running the data processor

Because the Amazon KCL for Haskell requires the MultiLangDaemon, which is provided by the Amazon KCL for Java, a bootstrap program has been provided. This program downloads all required dependencies prior to invoking the MultiLangDaemon, which executes the processor as a subprocess.

To run the processor, install the sample app, then run the bootstrap project with the following configuration:

> stack install kcl-sample-app
> stack exec kcl-bootstrap -- --properties ./sample-app/kcl.properties --execute

Notes

  • You must have Java installed.
  • If you omit the --execute argument, the bootstrap program outputs a command that can be used to start the KCL directly.
  • The MultiLangDaemon reads its configuration from the kcl.properties file, which contains a few important settings:
    • executableName = kcl-sample-app
      The name of the processor executable.
    • streamName = my-test-stream
      The name of the Kinesis stream from which to read data. This must match the stream name used by your producer.
    • More options are described in the properties file.
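
To make the settings above concrete, the following sketch parses key = value lines in the style of kcl.properties and looks up the two settings mentioned. It is a simplification that ignores the escape sequences and line continuations of the full Java properties format, and the names trim, parseProperties and sampleProperties are made up for this example.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- | Trim leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- | Parse simple "key = value" lines, skipping blank lines and '#' comments.
-- Simplification: no escape sequences and no line continuations.
parseProperties :: String -> [(String, String)]
parseProperties = concatMap parseLine . lines
  where
    parseLine raw =
      case trim raw of
        ""      -> []
        ('#':_) -> []
        line    ->
          case break (== '=') line of
            (key, '=':value) -> [(trim key, trim value)]
            _                -> []  -- no '=' on this line; ignore it

-- | The two settings described above, as they might appear in the file.
sampleProperties :: String
sampleProperties = unlines
  [ "# Name of the processor executable"
  , "executableName = kcl-sample-app"
  , ""
  , "# Kinesis stream to read from"
  , "streamName = my-test-stream"
  ]
```

For example, lookup "streamName" (parseProperties sampleProperties) evaluates to Just "my-test-stream".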

Cleaning up

This sample application creates a few resources in the default region of your AWS account:

  • A Kinesis stream named my-test-stream, which stores the data generated by your producer
  • A DynamoDB table named HaskellKinesisSample, which tracks the state of your processor

Each of these resources will continue to incur AWS service costs until they are deleted. After you are finished testing the sample application, you can delete these resources through the AWS Management Console.

What you should know about the MultiLangDaemon

The Amazon KCL for Haskell uses the Amazon KCL for Java internally. We have implemented a Java-based daemon, called the MultiLangDaemon, which handles all of the heavy lifting. Our approach has the daemon spawn the user-defined record processor program as a sub-process. The MultiLangDaemon communicates with this sub-process over standard input/output using a simple protocol, and therefore the record processor program can be written in any language.
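
To give a feel for what a line-oriented exchange over standard input/output looks like, here is a minimal sketch of a sub-process that reads one message per line and acknowledges each with a status line. The message shapes (a one-line JSON object with an "action" field, answered by a "status" message naming that action) are an assumption based on the publicly documented MultiLangDaemon protocol and should be checked against the KCL documentation. The names actionOf, statusFor, respond and serve are invented for this example, and this package already handles the protocol for you.

```haskell
import Control.Monad (unless)
import Data.List (stripPrefix, tails)
import System.IO (BufferMode (LineBuffering), hSetBuffering, isEOF, stdout)

-- | Crude extraction of the "action" field from a one-line JSON message.
-- Assumption: the field is written exactly as "action":"..." with no
-- whitespace around the colon. A real implementation would use a JSON parser.
actionOf :: String -> Maybe String
actionOf line =
  case [rest | Just rest <- map (stripPrefix key) (tails line)] of
    (rest:_) -> Just (takeWhile (/= '"') rest)
    []       -> Nothing
  where
    key = "\"action\":\""

-- | Status line acknowledging that the given action has been handled.
statusFor :: String -> String
statusFor action =
  "{\"action\":\"status\",\"responseFor\":\"" ++ action ++ "\"}"

-- | Reply to one incoming line, if it carries a recognisable action.
respond :: String -> Maybe String
respond = fmap statusFor . actionOf

-- | Read messages from stdin until end of input, acknowledging each one.
-- A real record processor would do its work before sending the status.
serve :: IO ()
serve = do
  hSetBuffering stdout LineBuffering  -- flush each reply as soon as it is written
  loop
  where
    loop = do
      eof <- isEOF
      unless eof $ do
        line <- getLine
        mapM_ putStrLn (respond line)
        loop
```

Because standard output carries the replies in this sketch, a processor built this way could not also use it for ordinary logging.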

At runtime, there is always a one-to-one correspondence between a record processor, a child process, and an Amazon Kinesis shard. The MultiLangDaemon ensures this without any developer intervention.

In this release, we have abstracted these implementation details and exposed an interface that enables you to focus on writing record processing logic in Haskell. This approach enables the Amazon KCL to be language-agnostic, while providing identical features and a similar parallel processing model across all languages.


License:MIT License

