mmolimar / khermes

A distributed fake data generator based on Akka.

Stratio Khermes.

Overview.

When you have a complex system architecture with a huge variety of services, the first question that arises is: "What happens when I start to generate tons of events, and how does the system behave when it starts to process them?". To answer this "a priori simple" question, we are developing a highly configurable and scalable tool called Khermes.

"Khermes is a distributed fake data generator used in distributed environments that produces a high volume of events in a scalable way".

It has the following features:

  • Configurable templates through Play Twirl. Make your own template and send it to one or more nodes.
  • Random event generation through the Khermes helper: based on Faker, it can generate generic names, dates, numbers, etc.
  • Scalable generation through an Akka cluster. Start as many nodes as you need to generate data.
  • A simple but powerful shell to take control of your cluster: you can start and stop generation on a node in seconds.

Architecture.

The main idea behind Khermes is to run nodes that produce messages. The number of nodes should grow or shrink depending on the needs of the user, which is why we chose an Akka cluster. The architecture can be summarized in these points:

  • Each Akka cluster node can receive messages to perform operations on data generation, such as start and stop. To start a node, three basic things are needed:
    • A Khermes configuration. This configuration sets, for example, where the templates will be compiled, the i18n of the data, etc.
    • A Kafka configuration. This configuration sets the Kafka parameters. See the official Kafka documentation for more specific information.
    • A Twirl template. A template that defines how to generate a CSV, a JSON or any other structure that you need.
  • All configurations can be reused because they are persisted in Zookeeper. For this reason it is mandatory to have a running instance of Zookeeper in your system.
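
As a rough illustration of the start precondition described above, the following plain-Scala sketch models a node that only begins generating once all three pieces are available. It is not the actual Khermes actor code: `NodeSketch`, `Settings`, `Start` and `Stop` are hypothetical names, and a real node would be an Akka actor that reads its configuration from Zookeeper.

```scala
object NodeSketch {
  // Hypothetical commands a node could receive.
  sealed trait Command
  case object Start extends Command
  case object Stop extends Command

  // The three things a node needs before it can start; None means "not set yet".
  final case class Settings(
      khermesConfig: Option[String] = None,
      kafkaConfig: Option[String] = None,
      template: Option[String] = None) {

    // Names of the pieces that have not been set.
    def missing: List[String] = List(
      "khermes configuration" -> khermesConfig,
      "kafka configuration" -> kafkaConfig,
      "template" -> template
    ).collect { case (name, None) => name }
  }

  final case class Node(id: String, generating: Boolean = false) {
    // Returns the new node state, or an error listing what is missing.
    def receive(command: Command, settings: Settings): Either[String, Node] =
      command match {
        case Start if settings.missing.nonEmpty =>
          Left("cannot start, missing: " + settings.missing.mkString(", "))
        case Start => Right(copy(generating = true))
        case Stop  => Right(copy(generating = false))
      }
  }
}
```

Modelling the node state as an immutable value keeps the sketch easy to test; in an actor the same decision would live inside the message handler.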

Installation and Execution.

Right now the only way to execute Khermes is to build a jar file. To build it, run:

$ mvn clean package

This command generates a fat jar with all dependencies in target/hermes-<version>.jar. To run it, execute:

$ java -jar target/hermes-<version>.jar [-Dparameters.to.overwrite]

Getting started.

The first thing you should do is specify a configuration. Khermes is configured with Typesafe Config. The options that you can configure are shown below:

hermes {
  templates-path = "/opt/hermes/templates"
  client = false
}
akka {
  loglevel = "error"
  actor {
    provider = "akka.cluster.ClusterActorRefProvider"
    debug {
      receive = on
      lifecycle = on
    }
  }
  remote {
    log-remote-lifecycle-events = off
    netty.tcp {
      hostname = localhost
      port = 2553
    }
  }
  cluster {
    roles = [backend]
    seed-nodes = [${?VALUE}]
    auto-down-unreachable-after = 10s
  }
}

As you can see, you can set configurations for Khermes or for the Akka cluster. We do not explain here how an Akka cluster is configured because its official documentation covers it in detail. For Khermes, you can set the following parameters:

  • templates-path: when you send a template to a node, you send a Twirl template. The template is translated to native Scala code that must be compiled the first time it runs. For this reason you need to set a temporary path where all the .scala and .class files are stored.
  • client: if you need to start a node and you also need a shell, set this parameter to true. When you run it you will see a Khermes shell, something like:
╦╔═┬ ┬┌─┐┬─┐┌┬┐┌─┐┌─┐
╠╩╗├─┤├┤ ├┬┘│││├┤ └─┐
╩ ╩┴ ┴└─┘┴└─┴ ┴└─┘└─┘ Powered by Stratio (www.stratio.com)

> System Name   : khermes
> Start time    : Fri Mar 10 12:31:52 CET 2017
> Number of CPUs: 8
> Total memory  : 251658240
> Free memory   : 225155304
    
khermes>

If you execute help in your command line you can see the list of available commands in our shell:

khermes> help
Khermes commands:
  set khermes             Sets your Khermes configuration
  set kafka              Sets your Kafka configuration
  set template           Sets your template
  set avro               Sets your Avro configuration
  show config            Show all set configurations
  ls                     Lists the nodes with their current status
  start <node-id>        Starts event generation in node with id <node-id>
  stop <node-id>         Stops event generation in node with id <node-id>
  clear                  Cleans the screen.
  help                   Shows this help.
  exit                   Exit of Khermes Cli.

Steps to run a policy:

  • Step 1) Save a Khermes configuration that will be persisted in Zookeeper. This is needed because otherwise the configuration would be lost the next time the user runs Khermes:
  khermes> set khermes
  Press Control + D to finish
  khermes {
     templates-path = "/tmp/khermes/templates"
     topic = "test"
     template-name = "testTemplate"
     i18n = "ES"
     timeout-rules {
       number-of-events: 1000
       duration: 2 seconds
     }
     stop-rules {
       number-of-events: 5000
     }
  }

As you can see, you should configure the following variables:

    • templates-path: every node that receives this configuration will generate and compile the template in this path.
    • topic: the Kafka topic where messages will be produced.
    • template-name: a prefix for the generated .scala and .class files. This variable may disappear in the future.
    • i18n: internationalization of the Khermes helper. It generates, for example, names in Spanish. Right now only ES and EN are available.
    • timeout-rules: optional. When set, the node generates 1000 events and then waits 2 seconds before generating the next 1000 events.
    • stop-rules: optional. When set, the node generates 5000 events and then stops data generation. The node is then free to accept more requests.

  • Step 2) Save a Kafka configuration that will also be persisted in Zookeeper.
  khermes> set kafka
  Press Control + D to finish
  kafka {
     bootstrap.servers = "localhost:9092"
     acks = "-1"
     key.serializer = "org.apache.kafka.common.serialization.StringSerializer"
     value.serializer = "org.apache.kafka.common.serialization.StringSerializer"
  }
  • Step 3) Save a Twirl template that will also be persisted in Zookeeper.
  khermes> set template
  Press Control + D to finish
  @import com.stratio.hermes.utils.Hermes
  @(khermes: Khermes)
  {
    "name" : "@(Khermes.Name.firstName)"
  }
  • Step 4) Once you have saved these configurations in Zookeeper, you can start generation in the nodes that you need:
  khermes> ls
  Node Id                                Status
  845441ec-cb0d-4363-b494-a39d56a82727 | false
  khermes> start 845441ec-cb0d-4363-b494-a39d56a82727
  khermes> ls
  Node Id                                Status
  845441ec-cb0d-4363-b494-a39d56a82727 | true

At this point the node with id 845441ec-cb0d-4363-b494-a39d56a82727 is producing messages to Kafka following the saved template. You can check it using the Kafka console consumer.
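
The timeout-rules and stop-rules from Step 1 decide how a started node paces its output. The sketch below is one plausible reading of those two settings in plain Scala, not the Khermes implementation: `RulesSketch`, `TimeoutRule`, `StopRule` and the `maxEvents` safety cap are hypothetical, and the pause is injected as a function so the loop can be exercised without really sleeping.

```scala
object RulesSketch {
  // Pause for `pauseMillis` after every `numberOfEvents` events.
  final case class TimeoutRule(numberOfEvents: Int, pauseMillis: Long) {
    require(numberOfEvents > 0, "numberOfEvents must be positive")
  }

  // Stop generating once `numberOfEvents` events have been produced.
  final case class StopRule(numberOfEvents: Int)

  // Runs the generation loop and returns how many events were produced.
  // `maxEvents` is a safety cap for this sketch only.
  def generate(
      timeout: Option[TimeoutRule],
      stop: Option[StopRule],
      maxEvents: Int
  )(emit: Int => Unit, pause: Long => Unit): Int = {
    var produced = 0
    var running = true
    while (running && produced < maxEvents) {
      produced += 1
      emit(produced)
      if (stop.exists(rule => produced >= rule.numberOfEvents)) {
        // The stop rule wins over a pending pause.
        running = false
      } else {
        timeout.foreach { rule =>
          if (produced % rule.numberOfEvents == 0) pause(rule.pauseMillis)
        }
      }
    }
    produced
  }
}
```

With the values from Step 1, `TimeoutRule(1000, 2000)` and `StopRule(5000)`, this loop emits 5000 events and pauses four times, because the stop rule fires before a fifth pause would happen.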

Random Helper.

Based on Faker, we are developing a random generator. At the moment it has the following features:

  • Name generation:
  fullname() → Paul Brown
  middleName() → George Michael
  firstName() → Steven
  lastName() → Robinson
  • Number generation:
  number(2) → 23
  number(2,Positive) → 23
  decimal(2) → 23.45
  decimal(2,Negative) → -45.89
  decimal(2,4) → 45.7568
  decimal(3,2,Positive) → 354.89
  numberInRange(1,9) → 2
  decimalInRange(1,9) → 2.6034840849740117
  • Geolocation generation:
  geolocation() → (40.493556, -3.566764, Madrid)
  geolocationWithoutCity() → (28.452717, -13.863761)
  city() → Tenerife
  country() → ES
  • Timestamp generation:
  dateTime("1970-1-12" ,"2017-1-1") → 2005-03-01T20:34:30.000+01:00
  time() → 15:30:00.000+01:00
  • Music generation:
  playedSong() → {"song": "Shape of You", "artist": "Ed Sheeran", "album": "Shape of You", "genre": "Pop"}
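
To make the number helpers above more concrete, here is a minimal sketch of how such functions could be written with `scala.util.Random`. It is not the Khermes helper itself: the class name `RandomHelperSketch` is hypothetical, the sign variants are left out, and the semantics are assumptions (exactly n digits for `number`, inclusive bounds for `numberInRange`).

```scala
import scala.util.Random

// Hypothetical stand-in for the helper; names mirror the list above, behaviour is assumed.
final class RandomHelperSketch(rnd: Random = new Random()) {

  // 10^n as an Int; limited to 9 digits so it cannot overflow.
  private def pow10(n: Int): Int = {
    require(n >= 0 && n <= 9, "this sketch supports at most 9 digits")
    List.fill(n)(10).product
  }

  // Assumption: number(n) returns a non-negative integer with exactly n digits.
  def number(digits: Int): Int = {
    require(digits >= 1, "digits must be at least 1")
    val low = if (digits == 1) 0 else pow10(digits - 1)
    low + rnd.nextInt(pow10(digits) - low)
  }

  // Assumption: both bounds are inclusive.
  def numberInRange(min: Int, max: Int): Int = {
    require(min <= max, "min must not exceed max")
    min + rnd.nextInt(max - min + 1)
  }

  // Assumption: decimal(i, f) has i integer digits and f fractional digits.
  def decimal(integerDigits: Int, fractionDigits: Int): BigDecimal = {
    val scale = pow10(fractionDigits)
    val fraction = BigDecimal(rnd.nextInt(scale)) / BigDecimal(scale)
    (BigDecimal(number(integerDigits)) + fraction).setScale(fractionDigits)
  }
}
```

Passing the `Random` instance in through the constructor lets a caller fix the seed, which makes generated data reproducible in tests.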

Docker.

  • Seed + Node
  docker run -dit --name SEED_NAME -e PARAMS="-Dhermes.client=true -Dakka.remote.hostname=SEED_NAME.DOMAIN -Dakka.remote.netty.tcp.port=2552 -Dakka.remote.netty.tcp.hostname=SEED_NAME.DOMAIN -Dakka.cluster.seed-nodes.0=akka.tcp://hermes@SEED_NAME.DOMAIN:2552" qa.stratio.com/stratio/hermes:VERSION
  • Node
  docker run -dit --name AGENT_NAME -e PARAMS="-Dhermes.client=false -Dakka.remote.hostname=AGENT_NAME.DOMAIN -Dakka.remote.netty.tcp.port=2553 -Dakka.cluster.seed-nodes.0=akka.tcp://hermes@SEED_NAME.DOMAIN:2552" qa.stratio.com/stratio/hermes:VERSION

FAQ.

  • Is Zookeeper needed to run Khermes? Yes, at the moment it is mandatory to have an instance of Zookeeper in order to run Khermes.
  • Is Apache Kafka needed to run Khermes? Yes, all generated events end up persisted in Kafka, and right now there is no other option.
  • Is there any throughput limitation? No, Khermes is designed to scale out of the box by adding more nodes to the Akka cluster.

Roadmap.

  • Awesome UI.
  • No Zookeeper dependency using Akka Distributed Data.

Licenses.

Licensed under the Apache License, Version 2.0

Tech.

Khermes uses a number of open source projects to work properly:

  • Twirl - Twirl is the Play template engine.
  • Akka - Akka is a toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM.
  • Apache Kafka - Kafka™ is used for building real-time data pipelines and streaming apps.

And of course Khermes itself is open source, with a repository on GitHub.

Development.

Want to contribute? Great! Khermes is open source and we need you to keep growing.

Contributors.
