Installing Kafka with Strimzi

The setup provided here is meant only for development purposes. Using a managed Kafka service is a sensible choice if you are not already running Kafka in production.

This guide shows how to install Strimzi and create a Kafka cluster suitable for use with Cloudflow.

Strimzi is a Kubernetes operator that simplifies the process of running Apache Kafka.

Prerequisites

CLI Tools

Make sure you have the following prerequisites installed before continuing:

Helm, version 3 or later
kubectl
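The prerequisites above can be checked from a terminal before going further. The following is a minimal sketch, not part of the original guide: the function names helm_major and check_prereqs are hypothetical, and the parsing assumes the Helm 3 output format of helm version --short (for example v3.12.3+g3a31588).

```shell
# Extract the major version from a string such as "v3.12.3+g3a31588".
helm_major() {
  ver=${1#v}          # drop a leading "v" if present
  echo "${ver%%.*}"   # keep everything before the first dot
}

# Report whether the tools this guide relies on are available.
check_prereqs() {
  for tool in helm kubectl; do
    command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool"; return 1; }
  done
  major=$(helm_major "$(helm version --short)")
  # A non-numeric result (for example from Helm 2 output) also lands in the error branch.
  [ "$major" -ge 3 ] 2>/dev/null || { echo "Helm 3 or later is required"; return 1; }
  echo "prerequisites look fine"
}
```

Running check_prereqs prints the first missing tool, or a confirmation when both tools are found and Helm reports major version 3 or later.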

Installing Strimzi

In this guide, we will use Helm to install Strimzi.

This guide installs Strimzi in the cloudflow namespace. Make sure that the cloudflow namespace exists before continuing. To create it, execute:

kubectl create ns cloudflow
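kubectl create fails when the namespace already exists, which is inconvenient in scripts that may run more than once. One common workaround, sketched here under the assumption that a client-side dry run is acceptable in your environment, is to render the namespace manifest and pipe it to kubectl apply. The function name ensure_namespace is hypothetical.

```shell
# Create the namespace if it is missing; leave it alone if it already exists.
ensure_namespace() {
  kubectl create namespace "$1" --dry-run=client -o yaml | kubectl apply -f -
}
```

Calling ensure_namespace cloudflow has the same effect as the command above on a fresh cluster and can be repeated safely afterwards.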

Add the Strimzi Helm repository and update the local index.

helm repo add strimzi https://strimzi.io/charts/
helm repo update

Install the latest version of Strimzi.

helm install strimzi strimzi/strimzi-kafka-operator --namespace cloudflow
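Each Strimzi release supports only a limited set of Kafka versions, so installing the latest chart may not match the Kafka version requested in the custom resource later in this guide. A hedged sketch of pinning the chart version follows; the function names are hypothetical, and the version argument is a placeholder that you would pick from the listing rather than a recommendation.

```shell
# List the chart versions available in the repository added above.
list_strimzi_versions() {
  helm search repo strimzi/strimzi-kafka-operator --versions
}

# Install a specific chart version and wait until its resources are ready.
# $1: chart version chosen from the listing (placeholder, not a recommendation)
install_strimzi() {
  helm install strimzi strimzi/strimzi-kafka-operator \
    --namespace cloudflow --version "$1" --wait
}
```

Pinning the version also makes the installation reproducible when the guide is followed again at a later date.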

After the installation completes, the Strimzi Kafka operator should be running in the cloudflow namespace.

$ kubectl get pods -n cloudflow
NAME                                       READY   STATUS    RESTARTS   AGE
strimzi-cluster-operator-9968fd8c9-fhqmj   1/1     Running   0          17s
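Instead of polling the pod list by hand, a script can block until the operator is rolled out. This sketch assumes the deployment is named strimzi-cluster-operator, which is inferred from the pod name in the output above; the function name and the default timeout are hypothetical.

```shell
# Wait until the operator deployment reports a completed rollout.
# $1: optional timeout (defaults to 120s, an arbitrary choice)
wait_for_operator() {
  kubectl rollout status deployment/strimzi-cluster-operator \
    -n cloudflow --timeout="${1:-120s}"
}
```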

Creating a Kafka cluster using Strimzi

To create a Kafka cluster using Strimzi, we have to create a custom resource of kind Kafka in the cloudflow namespace.

The Kafka cluster configuration shown here is meant for development and testing purposes only. Please consider your storage requirements and modify the custom resource accordingly for your intended usage.

You can use the following command to create a cluster:

kubectl apply -f - <<EOF
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: cloudflow-strimzi
  namespace: cloudflow
spec:
  kafka:
    config:
      auto.create.topics.enable: false
      log.message.format.version: "2.3"
      log.retention.bytes: 1073741824
      log.retention.hours: 1
      log.retention.check.interval.ms: 300000
      offsets.topic.replication.factor: 3
      transaction.state.log.min.isr: 2
      transaction.state.log.replication.factor: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    livenessProbe:
      initialDelaySeconds: 15
      timeoutSeconds: 5
    readinessProbe:
      initialDelaySeconds: 15
      timeoutSeconds: 5
    replicas: 3
    resources: {}
    storage:
      deleteClaim: false
      size: 100Gi
      type: persistent-claim
    version: 2.7.1
  zookeeper:
    livenessProbe:
      initialDelaySeconds: 15
      timeoutSeconds: 5
    readinessProbe:
      initialDelaySeconds: 15
      timeoutSeconds: 5
    replicas: 3
    resources: {}
    storage:
      deleteClaim: false
      size: 10Gi
      type: persistent-claim
EOF

When the Strimzi Kafka operator has processed the custom resource, there should be six new pods in the cloudflow namespace: three Kafka broker pods and three ZooKeeper pods.

$ kubectl get pods -n cloudflow
NAME                                       READY   STATUS    RESTARTS   AGE
cloudflow-strimzi-kafka-0                  2/2     Running   0          1m
cloudflow-strimzi-kafka-1                  2/2     Running   0          1m
cloudflow-strimzi-kafka-2                  2/2     Running   0          1m
cloudflow-strimzi-zookeeper-0              1/1     Running   0          2m
cloudflow-strimzi-zookeeper-1              1/1     Running   0          2m
cloudflow-strimzi-zookeeper-2              1/1     Running   0          2m
strimzi-cluster-operator-9968fd8c9-fhqmj   1/1     Running   0          5m
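A script can wait for the cluster to become ready rather than inspecting the pod list, and clients inside the Kubernetes cluster then need the bootstrap address. The sketch below assumes Strimzi's usual naming, where the bootstrap service is called after the cluster name with the suffix -kafka-bootstrap; the function names and the default timeout are hypothetical.

```shell
# Block until the Kafka custom resource reports the Ready condition.
# $1: optional timeout (defaults to 300s, an arbitrary choice)
wait_for_kafka() {
  kubectl wait kafka/cloudflow-strimzi --for=condition=Ready \
    --timeout="${1:-300s}" -n cloudflow
}

# Build the in-cluster bootstrap address from cluster name, namespace, and listener port.
bootstrap_address() {
  echo "$1-kafka-bootstrap.$2:$3"
}

# Address for the plain listener on port 9092 from the example above.
bootstrap_address cloudflow-strimzi cloudflow 9092
```

For the TLS listener from the example, the same function would be called with port 9093.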

If you want to change any parameters of the Kafka cluster, save the custom resource shown above to a file (for example kafka-cluster.yaml), edit it, and apply it to the cluster again with kubectl apply -f kafka-cluster.yaml. The Strimzi Kafka cluster operator makes the necessary changes when it detects an update of the custom resource.
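Before applying an edited file, it can help to preview what would change. The following sketch uses kubectl diff, which exits with status 0 when nothing differs, 1 when differences exist, and a higher status on errors. The function name preview_changes is hypothetical, and the file name is whatever you saved the custom resource as.

```shell
# Show the pending differences for a manifest and summarize the result.
# $1: path to the manifest file, for example kafka-cluster.yaml
preview_changes() {
  kubectl diff -f "$1"
  case $? in
    0) echo "no changes" ;;
    1) echo "changes pending; apply with: kubectl apply -f $1" ;;
    *) echo "kubectl diff failed"; return 2 ;;
  esac
}
```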

The most important parameters here are:

Log retention policy - see this article and this one for log retention explanation. The example uses a retention time of 1 hour (log.retention.hours), which is good enough for testing, and checks retention every 300000 ms (log.retention.check.interval.ms). It also sets log.retention.bytes, a size-based retention policy that applies per partition, to 1073741824 bytes (1 GiB). When both are set, log segments are removed as soon as either limit is exceeded.
Kafka storage size - make sure that this value (storage.size, 100Gi in the example) is greater than the maximum log size allowed by the size-based retention policy, or than the log size calculated from the retention time and the anticipated message rate.
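The storage check described above comes down to simple arithmetic. The sketch below estimates the bytes retained per broker under each policy. It assumes that partitions and their replicas are spread evenly across brokers; the function names, the 1 MiB/s ingest rate, and the count of 30 partitions are illustrative values, not figures from this guide.

```shell
# Per-broker bytes retained under a time-based policy.
# Arguments: ingest rate (bytes/s), retention (hours), replication factor, broker count
time_based_bytes() {
  echo $(( $1 * $2 * 3600 * $3 / $4 ))
}

# Per-broker bytes retained under a size-based policy (log.retention.bytes is per partition).
# Arguments: log.retention.bytes, partition count, replication factor, broker count
size_based_bytes() {
  echo $(( $1 * $2 * $3 / $4 ))
}

time_based_bytes 1048576 1 3 3       # 1 MiB/s for 1 hour -> 3774873600
size_based_bytes 1073741824 30 3 3   # 30 partitions at 1 GiB each -> 32212254720
```

Both results are well below the 100Gi (107374182400 bytes) configured in the example. Actual disk usage is somewhat higher than these estimates, because retention removes whole log segments and never deletes the active segment.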

Removing Strimzi

If you want to remove Strimzi, you can do that with the following Helm command:

helm delete strimzi -n cloudflow

You can find all Helm releases installed in the cloudflow namespace by executing the following command:

helm ls -n cloudflow
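Deleting the Helm release removes the operator, but the Kafka custom resource is not part of the release, and because the example cluster sets deleteClaim: false, its persistent volume claims outlive the cluster. A hedged sketch of the extra cleanup follows, best run before removing the operator. The function name is hypothetical, and the label selector is an assumption about how Strimzi labels the claims it creates, so verify it against your cluster.

```shell
# Delete the Kafka cluster created in this guide and list leftover volume claims.
remove_kafka_cluster() {
  # Remove the Kafka custom resource; the pods and services it owns go with it.
  kubectl delete kafka cloudflow-strimzi -n cloudflow
  # List claims that remain (label selector is an assumption; nothing is deleted here).
  kubectl get pvc -n cloudflow -l strimzi.io/cluster=cloudflow-strimzi
}
```

The claims are only listed, not deleted, so that any data you still need is not lost by accident.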