Learning how to design automatically updating AI with Apache Kafka and Deeplearning4J.
This is the codebase that made up the bulk of development to support my talk at Strata Conference London 2018. It comes as is and should be classed as a work in progress.
Requirements
In order to run this system you require the following components:
- Kafka (I'm using the supplied Zookeeper distribution).
- MySQL
- Leiningen (for building the Clojure projects)
- Maven (for building the Java projects)
- Java 1.8 (parts of the system use Clojure 1.9, but there's no need to download it; Leiningen takes care of that for you)
Breakdown of the project
The project uses a mixture of Java (for the model creation) and Clojure (for Kafka Streams and the HTTP API). The directory structure of this project is broken down as follows:
config
- Kafka Connect configurations: one to persist the event_topic and the other to save the training data ready for model training.
crontab
- Scheduled jobs for model creation.
db
- A schema for the MySQL database: one table holds information on training and model accuracy, and another holds the slope/intercept of a simple linear regression model.
messages
- Simple JSON messages for the cronjob to send to the event stream.
projects
- The bulk of the code is here in four projects: model builds (dlj4.mlp), Kafka Streaming applications (kafka.stream.events and kafka.stream.prediction) and a very basic HTTP API (prediction.http.api).
scripts
- Shell scripts to create the required Kafka topics, environment variables and event trigger for the cron job.
slides
- Slides from the talk will be added once the talk has taken place on 24th May.
General order of build
Before you start, change the username/password values to match a user on your MySQL database.
- Create a directory to store persisted events.
- Create a directory to save training data.
- Create a directory for the generated models.
- Start Zookeeper and Kafka
- Run create-topics.sh to create the required topics.
- Start Kafka Connect and add the two Connect configurations.
- Run lein uberjar on the Kafka Streaming apps and the HTTP API.
- Run mvn package to create the model builders.
- Run the streaming applications:
PROFILE=local java -jar path/to/jar/kafka-stream-events.jar
and
PROFILE=local java -jar path/to/jar/kafka-stream-prediction.jar
Event Messages
There are two types of events: training events and build events.
{"type":"command", "payload":"build_mlp"}
and
{"type":"training", "payload":"3,4,5,6"}
Training events are based on a four-column row of CSV data. If you want to accommodate other shapes then you will need to modify the model builds and the streaming applications. Right now there's nothing dynamic, but I'd like to work on that in the future.
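To make the four-column shape concrete, here is a minimal sketch of how a training payload could be split into three feature columns and one class column. The class name TrainingRow and its fields are hypothetical and are not taken from the project code. Parsing every column as a number is also an assumption; the real streaming apps may treat the class column differently.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of the project code. */
public final class TrainingRow {
    public final double[] features; // first three columns
    public final double label;      // fourth column: the class to learn

    private TrainingRow(double[] features, double label) {
        this.features = features;
        this.label = label;
    }

    /** Splits a four-column CSV payload such as "3,4,5,6". */
    public static TrainingRow parse(String payload) {
        String[] cols = payload.trim().split(",");
        if (cols.length != 4) {
            throw new IllegalArgumentException(
                "expected 4 columns but got " + cols.length + ": " + payload);
        }
        double[] features = new double[3];
        for (int i = 0; i < 3; i++) {
            features[i] = Double.parseDouble(cols[i].trim());
        }
        return new TrainingRow(features, Double.parseDouble(cols[3].trim()));
    }

    public static void main(String[] args) {
        TrainingRow row = parse("3,4,5,6");
        // prints [3.0, 4.0, 5.0] -> 6.0
        System.out.println(Arrays.toString(row.features) + " -> " + row.label);
    }
}
```

Rejecting rows that do not have exactly four columns mirrors the fact that nothing in the current system is dynamic.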
Send all events to the event_topic and they will be persisted and processed by the streaming app.
Predictions
Once models are built you can make predictions through Kafka by sending a message to the prediction_request_topic and watching the results come back through the prediction_response_topic.
JSON payloads look like this:
{"model":"mlp", "payload":"3,4,5"}
Note that the fourth column is missing compared to the training data: this is the class you are predicting. The Kafka Streaming app takes care of the parsing and preparation to make a prediction.
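As a rough illustration of how a prediction request relates to a training row, the sketch below drops the final (class) column from a four-column row and wraps the rest in the JSON shape shown above. The class PredictionRequest and its method names are hypothetical. The string formatting does no JSON escaping, so it assumes the model name and payload contain no quotes.

```java
/** Hypothetical helper for illustration; not part of the project code. */
public final class PredictionRequest {

    /** Drops the last (class) column: "3,4,5,6" becomes "3,4,5". */
    public static String featuresOnly(String trainingPayload) {
        String row = trainingPayload.trim();
        int lastComma = row.lastIndexOf(',');
        if (lastComma < 0) {
            throw new IllegalArgumentException("no columns to drop: " + trainingPayload);
        }
        return row.substring(0, lastComma);
    }

    /** Builds the request body; assumes the values need no JSON escaping. */
    public static String toJson(String model, String payload) {
        return String.format("{\"model\":\"%s\", \"payload\":\"%s\"}", model, payload);
    }

    public static void main(String[] args) {
        // prints {"model":"mlp", "payload":"3,4,5"}
        System.out.println(toJson("mlp", featuresOnly("3,4,5,6")));
    }
}
```

The resulting string is what would be sent to the prediction_request_topic; producing it to Kafka is left out so the sketch stays free of external dependencies.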
Crontab
There are two flavors of cron job. The first is a direct call to the executable to create the model. The alternative, which is preferable, is to send an event to the event_topic and let the stream pick up the event and process it. This means the event stream is preserved via Kafka Connect and can be replayed.
Notes
This is a work in progress to prove out my thoughts for the Strata talk. The full talk will be available via O'Reilly.
There are plenty of improvements that could be made and I'll go through those in my talk, so no emails just yet :)