Crayfish: Navigating the Labyrinth of Machine Learning Inference in Stream Processing Systems

Benchmarking Machine Learning Model Inference in Data Streaming Solutions

Project Structure

.
├── core                  # Crayfish Java core components and abstractions.
├── crayfish-java         # Crayfish adapters (e.g., Flink, Spark Structured Streaming, Kafka Streams).
├── docker                # Scripts and configuration for Docker deployment.
├── experiments-driver    # Experiments testbed and configurations.
├── input-producer        # Input Producer Component. Contains a random input generator.
├── output-consumer       # Output Consumer Component. Writes latency measurements to persistent storage.
├── rayfish               # Crayfish Ray Adapter.
├── resources             # Pre-trained models and training scripts.  
└── results-analysis      # Notebooks to analyze the results.

Supported Tools

Pre-trained Models

                     FFNN       ResNet50
Input Size           28x28      224x224x3
Output Size          10x1       1000x1
Parameters Number    28K        23M
Model Size
  ONNX               113 KB     97 MB
  Torch              115 KB     98 MB
  H5                 133 KB     98 MB
  SavedModel         508 KB     101 MB

Stream Processors

  • Apache Flink 1.15.2
  • Apache Kafka Streams 3.2.3
  • Spark Structured Streaming 3.3.2
  • Ray 2.4

Embedded Serving Frameworks

  • ONNX 1.12.1
  • DeepLearning4j 1.0.0-M2.1
  • TensorFlow Java (SavedModel) 1.13.1

External Serving Frameworks

  • TorchServe 0.7.1
  • TensorFlow Serving 2.11.1

Quick Start

Environment

  1. Unix-like environment
  2. Python 3
  3. Maven
  4. Java 8
  5. Docker installation

Experiments

We offer an option to test the tools locally before deploying to a cluster.

# train ffnn/resnet50 models in different model formats
./resources/train-models.sh

# build images with the trained models
./docker/docker-build.sh

# run all experiments locally (to run a single experiment, see the options below)
./run.sh -e a -ec s -sp a -m a -msm a -msx a -em l

# clean all the running and exited containers and created volumes if you quit before the experiment finishes
./docker/docker-stop.sh

# clean log, results, and models trained by ./resources/train-models.sh
./clean-exp-files.sh

run.sh has the following options:

Arguments:
                                       [!] NOTE: Configs in 'experiments-driver/configs/exp-configs/[-ec]/[-e]' will be run.
-e     | --experiments                 Independent variable sets to run: a=all, i=input rate, b=batch size, s=scalability, r=bursty rate.
-ec    | --experiments-control         Controlled variable sets to run: s=small, l=large, d=debug.
                                         - small: Run input rate 256 for the scalability experiment.
                                           [!] NOTE: ResNet50 is recommended for this option due to the large
                                                     model size and limited memory.
                                         - large: Run input rate 30000 for the scalability experiment.
                                         - debug: Run simple experiment configs in the debug folder.
-sp    | --stream-processors           Stream processor to test:
                                         a=all, f=Apache Flink, k=Kafka Streams, s=Spark Streaming, r=Ray.
-m     | --models                      Served models: a=all, f=ffnn, r=resnet50.
-msm   | --embedded-model-servers      Embedded model serving alternative:
                                         x=none, a=all (w/o noop and nd4j), n=nd4j, d=dl4j, o=onnx, t=tf-savedmodel, k=noop.
                                         [!] NOTE: noop will execute input rate and batch size experiments.
-msx   | --external-model-servers      External model serving alternative: x=none, a=all, t=tf-serving, s=torchserve.
-em    | --execution-mode              Execution mode: l=local, c=cluster.
-d     | --default-configs             Print default configs.
-h     | --help                        Help.
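
For example, based on the options above, a single local experiment that measures input rate on Apache Flink, serving the FFNN model through embedded ONNX with no external serving and using the debug configs, might look like:

./run.sh -e i -ec d -sp f -m f -msm o -msx x -em l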

Run on Cluster

To run the experiments on a cluster:

  1. Build the required Docker images on the different VMs and train the models (commands are from ./docker/docker-build.sh)
    1. Kafka consumer VM
      docker compose build output-consumer --build-arg USER_ID=$(id -u) --build-arg GROUP_ID=$(id -g)
      
    2. Data processor
      ./resources/train-models.sh
      docker compose build data-processor --build-arg USER_ID=$(id -u) --build-arg GROUP_ID=$(id -g) 
      
    3. Kafka producer VM
      docker compose build input-producer --build-arg USER_ID=$(id -u) --build-arg GROUP_ID=$(id -g)
      
    4. External serving VM
      ./resources/train-models.sh
      docker compose build external-serving-agent --build-arg USER_ID=$(id -u) --build-arg GROUP_ID=$(id -g)
      docker compose build torch-serving
      docker compose build tf-serving
      docker compose build ray-serving
      
  2. Start the respective Docker containers on the different VMs manually; the order matters (commands are from ./run.sh)
    1. Kafka producer VM
      docker compose run -d --service-ports --use-aliases -u $(id -u):$(id -g) input-producer \
          /bin/bash -c "./experiments-driver/scripts/start-daemon.sh -p kp -em 'l' --port $PORT_KP >logs/$EXP_UNIQUE_ID-kp-daemon-logs.txt 2>&1" 
      
    2. Data processor
      docker compose run -d --service-ports --use-aliases -u $(id -u):$(id -g) data-processor \
          /bin/bash -c "./experiments-driver/scripts/start-daemon.sh -p dp -em 'l' --port $PORT_DP >logs/$EXP_UNIQUE_ID-dpd-daemon-logs.txt 2>&1"
      
    3. External serving VM
      docker compose run -u $(id -u):$(id -g) output-consumer \
          /bin/bash -c "./experiments-driver/scripts/start-consumer.sh \
          -e $EXPERIMENTS_OPT \
          -ec $EXPERIMENTS_CTRL_OPT \
          -sp $STREAM_PROCESSOR_OPT \
          -m $MODELS_OPT \
          -msm $EMBEDDED_MODEL_SERVERS_OPT \
          -msx $EXTERNAL_MODEL_SERVERS_OPT \
          -em $EXECUTION_MODE_OPT \
          -eid $EXP_UNIQUE_ID"
      
    4. Kafka consumer VM
      docker compose run -u $(id -u):$(id -g) output-consumer \
          /bin/bash -c "./experiments-driver/scripts/start-consumer.sh \
          -e $EXPERIMENTS_OPT \
          -ec $EXPERIMENTS_CTRL_OPT \
          -sp $STREAM_PROCESSOR_OPT \
          -m $MODELS_OPT \
          -msm $EMBEDDED_MODEL_SERVERS_OPT \
          -msx $EXTERNAL_MODEL_SERVERS_OPT \
          -em $EXECUTION_MODE_OPT \
          -eid $EXP_UNIQUE_ID"
      
  3. Update the IP addresses for each VM in experiments-driver/configs/global-configs-cluster.properties, as sketched below
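
The sketch below is purely illustrative; the actual property names are defined in experiments-driver/configs/global-configs-cluster.properties and should be used instead of the hypothetical keys shown here.

# Hypothetical keys -- consult global-configs-cluster.properties for the real ones.
kafka.producer.host=192.168.1.10
data.processor.host=192.168.1.11
external.serving.host=192.168.1.12
kafka.consumer.host=192.168.1.13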

Kafka Overhead Experiments

The experiments measuring the overhead introduced by Kafka do not use the Crayfish pipeline; they employ a standalone Flink implementation instead. To run these experiments, use the following script:

./experiments-driver/run-standalone.sh -em l -ec l

run-standalone.sh has the following options:

Arguments:
-ec    | --experiments-control         Controlled variable sets to run: s=small, l=large, d=debug.
                                         - small: Run input rate 256 for the scalability experiment.
                                           [!] NOTE: ResNet50 is recommended for this option due to the large
                                                     model size and limited memory.
                                         - large: Run input rate 30000 for the scalability experiment.
                                         - debug: Run simple experiment configs in the debug folder.
-em    | --execution-mode              Execution mode: l=local, c=cluster.

Extending Crayfish

Crayfish provides a set of interfaces that allow developers to extend the benchmarking framework with other stream processing systems, model serving tools, and pre-trained models. We showcase examples of how to do so for each type of system.

New Stream Processors

New stream processors can be added through adapters. An adapter extends the Crayfish abstract class and provides functionality for input reading, model scoring, and output writing, alongside logic to set the parallelism of the computation. The adapter must also provide abstractions for a stream builder, the generic operator type, and the sink operator. When using the adapter, the model type can be chosen from the list of models supported out of the box, or it can be a custom model defined by the Crayfish user.

The adapter should extend the Crayfish abstract class, as below. Next, implement the required methods for building your streaming application.

public class MyCrayfishAdapter extends Crayfish<MyStreamBuilder, MyOperatorType, MySinkType, MyModel> {
    // Implement required methods
    // ...

    @Override
    public MyStreamBuilder streamBuilder() {
        // Implement the logic to create and return the stream builder specific to the stream processor
    }

    @Override
    public MyOperatorType inputOp(MyStreamBuilder streamBuilder) {
        // Implement the logic to create and return a source operator
    }

    @Override
    public MyOperatorType embeddedScoringOp(MyStreamBuilder sb, MyOperatorType input) throws Exception {
        // Implement the logic for embedded ML serving
    }

    @Override
    public MyOperatorType externalScoringOp(MyStreamBuilder sb, MyOperatorType input) throws Exception {
        // Implement the logic for external ML serving
    }

    @Override
    public CrayfishUtils.Either<MyOperatorType, MySinkType> outputOp(MyStreamBuilder sb,
                                                                     MyOperatorType output) throws Exception {
        // Implement the logic for creating and returning output. Depending on the stream processor intrinsics,
        // return either the sink or another abstraction. This will be used in the start() method to start the
        // streaming application.
    }

    @Override
    public void start(MyStreamBuilder sb, Properties metaData,
                      CrayfishUtils.Either<MyOperatorType, MySinkType> out) throws Exception {
        // Implement the logic to start the processing
    }

    @Override
    public boolean hasOperatorParallelism() {
        // Return true if operator parallelism is supported
    }

    @Override
    public CrayfishUtils.Either<MyOperatorType, MySinkType> setOperatorParallelism(
            CrayfishUtils.Either<MyOperatorType, MySinkType> operator, int parallelism) throws Exception {
        // Implement the logic to set operator parallelism
        // Return null if not supported
    }

    @Override
    public void setDefaultParallelism(MyStreamBuilder sb, Properties metaData, int parallelism) throws Exception {
        // Implement the logic to set default parallelism
    }

    @Override
    public Properties addMetadata() throws Exception {
        // Implement the logic to add metadata. Some stream processing systems require extra configurations to be
        // set via Properties.
        // Return null if not required.
    }
}

Users can then use the new adapter, for instance, to serve DeepLearning4J models as follows:

Crayfish adapter = new MyCrayfishAdapter<DL4JModel>(DL4JModel.class, config);
adapter.run();

The adapters for Apache Flink, Kafka Streams, and Spark Structured Streaming can be found under crayfish-java/adapters/.

New Model Serving Tools

Crayfish can be extended to support ML serving tools besides the ones included out of the box. To do so, create a new Java class that extends CrayfishModel and implement the required methods, as below.

public class MyModel extends CrayfishModel {

    @Override
    public void loadModel(String modelName, String location) throws Exception {
        // Implement the logic to load the model
        // Use modelName and location to load the model from the specified location
    }

    @Override
    public void build() throws Exception {
        // Implement the logic to build the model
        // This method is called after loading the model
    }

    @Override
    public CrayfishPrediction apply(CrayfishInputData input) throws Exception {
        // Implement the logic to apply the model to the input data
        // Use the input data to make predictions
    }
}

Users can then use the new model, for instance in conjunction with Spark Structured Streaming, as follows:

Crayfish adapter = new SparkSSCrayfishAdapter<MyModel>(MyModel.class, config);
adapter.run();

The implementations corresponding to the supported models can be found under core/src/main/java/datatypes/models.

Note that the interface makes no distinction between embedded and external tools; an external tool's implementation must itself send requests to the external serving instance to perform the inference. To this end, Crayfish provides the helper class InferenceRequest for issuing HTTP requests for a given input. The class can also be extended to issue gRPC requests. Examples can be found under core/src/main/java/request/.
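
As an illustration only (this is not the InferenceRequest API itself, which is not reproduced here), the body of such an apply() implementation might issue a plain HTTP request along the following lines; the endpoint, JSON payload format, and response handling are all assumptions:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: POST a serialized input to an external serving endpoint
// and return the raw prediction body. The endpoint and payload are assumptions.
public class ExternalScoringSketch {
    public static String score(String endpoint, String jsonInput) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(jsonInput.getBytes(StandardCharsets.UTF_8));
        }
        StringBuilder response = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
        }
        conn.disconnect();
        return response.toString();
    }
}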

New Pre-trained ML Models

Crayfish supports two models out of the box: FFNN and ResNet50. However, users can configure the framework to serve other models of their choice. To do so, create a new directory containing the model configurations under experiments-driver/configs/model-configs. Assuming the new model is called new_model, create a file model-config.yaml inside experiments-driver/configs/model-configs/new_model and provide the following information about the model:

model.name: new_model
input.shape: 1, 1   # the input shape of new_model
input.name: input_1  # the name of the input for the model

model.path.dl4j: path/to/dl4j/new_model
model.path.onnx: path/to/onnx/new_model
model.path.tf-savedmodel: path/to/tf-savedmodel/new_model
model.path.torch_jit: path/to/torch/new_model
model.path.torchserve: torchserve:endpoint/newmodel
model.path.tf-serving: tfserving:endpoint/newmodel

These configurations will be passed to the adapter corresponding to the chosen stream processor.

License

Apache License 2.0

