Helm Chart for an Apache Pulsar Cluster

This Helm chart configures an Apache Pulsar cluster. It is designed for production use, but can also be used in local development environments with the proper settings.

Helm must be installed and initialized to use the chart. Both Helm 2 and Helm 3 are supported. Please refer to Helm's documentation to get started.
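
For example, you can check which Helm version you are running with:

helm version --short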

Don't want to run it yourself? Go to Kesque for fully managed Pulsar services.

Add to local Helm repository

To add this chart to your local Helm repository:

helm repo add kafkaesque https://helm.kafkaesque.io

To update to the latest chart:

helm repo update

Note: This command updates all your Helm charts.

To list the versions of the chart in the local Helm repository:

Helm 2

helm search -l kafkaesque/pulsar

Helm 3

helm search repo kafkaesque/pulsar

Installing Pulsar in a Cloud Provider

Before you can install the chart, you need to configure the storage class settings for your cloud provider. The handling of storage varies from cloud provider to cloud provider.

Create a new file called storage_values.yaml for the storage class settings. To use an existing storage class (including the default one) set this value:

default_storage:
  existingStorageClassName: default or <name of storage class>

For each volume of each component (Zookeeper, Bookkeeper), you can override the default_storage setting by specifying a different existingStorageClassName. This allows you to match the optimum storage type to the volume.
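
For example, a per-volume override might look like the following sketch (the exact key paths here are assumptions for illustration; consult the chart's values.yaml for the authoritative structure):

default_storage:
  existingStorageClassName: standard

bookkeeper:
  volumes:
    journal:
      # Hypothetical override: put the latency-sensitive BookKeeper journal on faster disks
      existingStorageClassName: fast-ssd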

If you have specific storage class requirements, for example fixed IOPS disks in AWS, you can have the chart configure the storage classes for you. Here are examples for the major cloud providers:

# For AWS
default_storage:
  provisioner: kubernetes.io/aws-ebs
  type: gp2
  fsType: ext4
  extraParams:
    iopsPerGB: "10"

# For GCP
default_storage:
  provisioner: kubernetes.io/gce-pd
  type: pd-ssd
  fsType: ext4
  extraParams:
    replication-type: none

# For Azure
default_storage:
  provisioner: kubernetes.io/azure-disk
  fsType: ext4
  type: managed-premium
  extraParams:
    storageaccounttype: Premium_LRS
    kind: Managed
    cachingmode: ReadOnly

See the values file for more details on these settings.

Once you have your storage settings in the values file, install the chart like this:

Helm 2

helm install kafkaesque/pulsar --name pulsar --namespace pulsar --values storage_values.yaml

Helm 3

helm install pulsar kafkaesque/pulsar --namespace pulsar --values storage_values.yaml

Note: For Helm 3 you need to create the namespace prior to running the command.
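
For example:

kubectl create namespace pulsar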

Installing Pulsar for development

This chart is designed for production use, but it can also be used in development environments. To use this chart in a development environment (e.g., minikube), you need to:

  • Disable anti-affinity rules that ensure components run on different nodes
  • Reduce resource requirements
  • Disable persistence (configuration and messages are not stored, so they are lost on restart). If you want persistence, you will have to configure storage settings that are compatible with your development environment, as described above.
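
As a rough illustration, such overrides have this shape (the key names below are assumptions, not verified against the chart; the example values file mentioned next is authoritative):

# Hypothetical dev overrides -- verify key names against the chart's values.yaml
enableAntiAffinity: no   # let all components schedule onto a single node
persistence: no          # no PersistentVolumeClaims; state is lost on restart
zookeeper:
  resources:
    requests:
      memory: 256Mi      # reduced footprint for a laptop-sized cluster
      cpu: 0.1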

For an example set of values, download the dev-values.yaml file from this repository. Use that values file or one like it to start the cluster:

Helm 2

helm install kafkaesque/pulsar --name pulsar --namespace pulsar --values dev-values.yaml

Helm 3

helm install pulsar kafkaesque/pulsar --namespace pulsar --values dev-values.yaml

Note: For Helm 3 you need to create the namespace prior to running the command.

Accessing the Pulsar cluster in the cloud

The default values will create a ClusterIP for all components. ClusterIPs are only accessible within the Kubernetes cluster. The easiest way to work with Pulsar is to log into the bastion host (assuming it is in the pulsar namespace):

kubectl exec $(kubectl get pods -l component=bastion -o jsonpath="{.items[*].metadata.name}" -n pulsar) -it -n pulsar -- /bin/bash

Once you are logged into the bastion, you can run Pulsar admin commands:

bin/pulsar-admin tenants list
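
You can also smoke-test the cluster against the default public/default namespace with the built-in client (standard pulsar-client commands; the topic and subscription names are arbitrary):

bin/pulsar-client produce public/default/test-topic --messages "hello"
bin/pulsar-client consume public/default/test-topic -s test-subscription -n 1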

For external access, you can use a load balancer. Here is an example set of values to enable a load balancer on the proxy:

proxy:
  service:
    type: LoadBalancer
    ports:
    - name: http
      port: 8080
      protocol: TCP
    - name: pulsar
      port: 6650
      protocol: TCP

If you are using a load balancer on the proxy, you can find the IP address using:

kubectl get service -n pulsar
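
Once an external IP is assigned, you can point client tools at it from outside the cluster, for example (substitute the EXTERNAL-IP reported for the proxy service):

bin/pulsar-admin --admin-url http://<EXTERNAL-IP>:8080 tenants list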

Accessing the Pulsar cluster on localhost

To port forward the proxy admin and Pulsar ports to your local machine:

kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=proxy -o jsonpath='{.items[0].metadata.name}') 8080:8080

kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=proxy -o jsonpath='{.items[0].metadata.name}') 6650:6650

Or if you would rather go directly to the broker:

kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=broker -o jsonpath='{.items[0].metadata.name}') 8080:8080

kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=broker -o jsonpath='{.items[0].metadata.name}') 6650:6650
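
With the port-forwards in place, local clients can use localhost endpoints, for example:

bin/pulsar-admin --admin-url http://localhost:8080 clusters list
bin/pulsar-client --url pulsar://localhost:6650 produce public/default/test-topic --messages "hello"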

Managing Pulsar using Pulsar Express

Pulsar Express is an open-source Web UI for managing Pulsar clusters. Thanks to [Bruno Bonnin](https://twitter.com/bruno_b) for creating this handy tool.

You can install Pulsar Express in your cluster by enabling it with this values setting:

extra:
  pulsarexpress: yes

It will be automatically configured to connect to the Pulsar cluster.

Accessing Pulsar Express on your local machine

To access the Pulsar Express UI on your local machine, forward port 3000:

kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=pulsarexpress -o jsonpath='{.items[0].metadata.name}') 3000:3000

Accessing Pulsar Express from a cloud provider

To access Pulsar Express from a cloud provider, the chart supports Kubernetes Ingress. Your Kubernetes cluster must have a running Ingress controller (e.g., Nginx, Traefik).

Set these values to configure the Ingress for Pulsar Express:

pulsarexpress:
  ingress:
    enabled: yes
    host: pulsar-ui.example.com
    annotations:
      ingress.kubernetes.io/auth-secret: ui-creds
      ingress.kubernetes.io/auth-type: basic

Pulsar Express does not have any built-in authentication capabilities, so you should use the authentication features of your Ingress to limit access. The example above (tested with Traefik) uses annotations to enable basic authentication, with the credentials stored in a Kubernetes secret.
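
For example, you could create the ui-creds secret from an htpasswd file (the htpasswd utility and the key name auth are assumptions that match common Traefik/Nginx conventions):

htpasswd -bc auth admin <password>
kubectl create secret generic ui-creds --from-file=auth --namespace pulsar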

Tiered Storage

Tiered storage (offload to blob storage) can be configured in the storageOffload section of the values.yaml file. Instructions for AWS S3 and Google Cloud Storage are provided in the file.

In addition, you can configure any S3-compatible storage. There is explicit support for Tardigrade, a provider of secure, decentralized storage. You can enable the Tardigrade S3 gateway in the extra configuration. The instructions for configuring the gateway are provided in the tardigrade section of the values.yaml file.
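
As a rough sketch, an S3-compatible offload configuration has this shape (the key names here are illustrative assumptions; the storageOffload section of values.yaml documents the real ones):

storageOffload:
  driver: aws-s3
  bucket: my-offload-bucket   # hypothetical bucket name
  region: us-east-1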

Pulsar SQL

If you enable Pulsar SQL, the cluster provides Presto access to the data stored in BookKeeper (and tiered storage, if enabled). Presto is exposed on the service named <release>-sql-svc.

The easiest way to access the Presto command line is to log into the bastion host and then connect to the Presto service port, like this:

bin/pulsar sql --server pulsar-sql-svc:8080

The value for the --server option is the service name plus the port. Once you are connected, you can enter Presto commands:

presto> SELECT * FROM system.runtime.nodes;
               node_id                |         http_uri         | node_version | coordinator | state  
--------------------------------------+--------------------------+--------------+-------------+--------
 64b7c5a1-9a72-4598-b494-b140169abc55 | http://10.244.5.164:8080 | 0.206        | true        | active 
 0a92962e-8b44-4bd2-8988-81cbde6bab5b | http://10.244.5.196:8080 | 0.206        | false       | active 
(2 rows)

Query 20200608_155725_00000_gpdae, FINISHED, 2 nodes
Splits: 17 total, 17 done (100.00%)
0:04 [2 rows, 144B] [0 rows/s, 37B/s]

To access Pulsar SQL from outside the cluster, you can enable the ingress option, which will expose the Presto port on a hostname. We have tested with the Traefik ingress, but any Kubernetes ingress should work. You can then run SQL queries using the Presto CLI and monitor Presto using the built-in UI (point your browser at the ingress hostname). It is recommended that you match the Presto CLI version to the version running as part of Pulsar SQL (currently 0.206).

The Presto CLI supports basic authentication, so if you enabled that on the ingress (using annotations), you can have secure Presto access.

presto --server https://presto.example.com --user admin --password
Password: 
presto> show catalogs;
 Catalog 
---------
 pulsar  
 system  
(2 rows)

Query 20200610_131641_00027_tzc7t, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]

Authentication

The chart can enable token-based authentication for your Pulsar cluster. For information on token-based authentication in Pulsar, see the Pulsar documentation.

For this to work, a number of values need to be stored in secrets prior to enabling token-based authentication. First, you need to generate a key-pair for signing the tokens using the Pulsar tokens command:

bin/pulsar tokens create-key-pair --output-private-key my-private.key --output-public-key my-public.key

Note: The names of the files used in this section match the default values in the chart. If you used different names, then you will have to update the corresponding values.

Then you need to store those keys as secrets.

kubectl create secret generic token-private-key \
 --from-file=my-private.key \
 --namespace pulsar
kubectl create secret generic token-public-key \
 --from-file=my-public.key \
 --namespace pulsar

Using those keys, generate tokens with subjects (roles):

bin/pulsar tokens create --private-key file:///pulsar/token-private-key/my-private.key --subject <subject>

You need to generate tokens with the following subjects:

  • admin
  • superuser
  • proxy
  • websocket (only required if using the standalone WebSocket proxy)
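
For example, you could generate all four in one pass (a sketch; the private-key path matches the command above and assumes you are running it where the key secret is mounted):

for role in admin superuser proxy websocket; do
  bin/pulsar tokens create --private-key file:///pulsar/token-private-key/my-private.key \
    --subject $role > $role.jwt
done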

Once you have created those tokens, add each as a secret:

kubectl create secret generic token-<subject> \
 --from-file=<subject>.jwt \
 --namespace pulsar

Once you have created the required secrets, you can enable token-based authentication with this setting in the values:

enableTokenAuth: yes
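
Clients then authenticate by presenting a token, for example (standard Pulsar token auth plugin; the admin URL and token path are placeholders):

bin/pulsar-admin --admin-url http://<proxy-host>:8080 \
  --auth-plugin org.apache.pulsar.client.impl.auth.AuthenticationToken \
  --auth-params file:///path/to/admin.jwt \
  tenants list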

TLS

To use TLS, you must first create a certificate and store it in the secret defined by tlsSecretName. You can create the certificate like this:

kubectl create secret tls <tlsSecretName> --key <keyFile> --cert <certFile>

The resulting secret will be of type kubernetes.io/tls. The key should not be in PKCS 8 format, even though that is the format Pulsar uses; the chart converts the key to PKCS 8 for you.
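
For testing, you could generate a self-signed certificate with OpenSSL and load it into the secret (the file names are illustrative):

# Generate a self-signed certificate and key
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout tls-pkcs8.key -out tls.crt -subj "/CN=pulsar.example.com"
# Ensure the key is NOT in PKCS 8 form (OpenSSL 3 may need the -traditional flag)
openssl rsa -in tls-pkcs8.key -out tls.key
kubectl create secret tls <tlsSecretName> --key tls.key --cert tls.crt --namespace pulsar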

You can also specify the certificate information directly in the values:

secrets:
  key: |
  certificate: |
  caCertificate: |

This is useful if you are using a self-signed certificate.

For automated handling of publicly signed certificates, you can use a tool such as cert-manager. The cert-manager documentation describes how to set it up in AWS.

Once you have created the secrets that store the certificate info (or specified it directly in the values), you can enable TLS in the values:

enableTls: yes

Chart dependency

To avoid unnecessary and frequent Prometheus chart upgrades, pull the same chart version so that kube-prometheus-stack is not upgraded needlessly:

helm pull prometheus-community/kube-prometheus-stack --version 12.12.2
helm dependency update
