Setting up a GKE cluster


This page walks you through the main steps to get Cloudflow up and running on a Kubernetes cluster. For more detailed instructions on how to use or build your own Cloudflow installer, refer to the installer section of the GitHub repo.

To create a new cluster, first get the installer: either clone the repo to your local system and check out the latest release branch, or download the latest release file from here. Then go to the installer/ folder and follow the instructions below.

Depending on your Kubernetes cloud provider, run one of the following three commands to configure kubectl to connect to the cluster:
  • GKE: gcloud init. This command initializes gcloud on your local machine and sets up the necessary gcloud credentials. For more details, refer to the GKE documentation.

  • EKS: aws eks --region region update-kubeconfig --name cluster_name. For more details, refer to the EKS documentation.

  • AKS: az aks get-credentials --resource-group myResourceGroup --name myAKSCluster. For more details, refer to the AKS documentation.

You can run these commands once you have a valid account with the cloud provider and access to a project that your administrator has set up for you.
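
The provider-specific commands listed above can be collected into one small helper. The following is a minimal sketch and not part of the Cloudflow installer: the function name kubeconfig_command is hypothetical, and the region, resource group and cluster names are placeholders that you supply. It only prints the command, so you can review it before running it.

```shell
# Hypothetical helper: print the kubectl setup command for a cloud provider.
# Nothing is executed. Usage:
#   kubeconfig_command gke
#   kubeconfig_command eks <region> <cluster_name>
#   kubeconfig_command aks <resource_group> <cluster_name>
kubeconfig_command() {
  case "$1" in
    gke) echo "gcloud init" ;;
    eks) echo "aws eks --region $2 update-kubeconfig --name $3" ;;
    aks) echo "az aks get-credentials --resource-group $2 --name $3" ;;
    *)
      # Report anything other than the three providers covered on this page.
      echo "unknown provider: $1 (expected gke, eks or aks)" >&2
      return 1
      ;;
  esac
}
```

For example, kubeconfig_command aks myResourceGroup myAKSCluster prints the AKS command exactly as it appears in the list above.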
  • Create a Kubernetes cluster with a specific name on a cloud provider of your choosing. We provide cluster launching scripts for GKE, AKS and EKS: ./create-cluster-[gke|aks|eks].sh <cluster-name>

  • Make sure your cluster has at least one storage class that offers ReadWriteMany (RWM) capability. Most cloud providers have a preferred storage class for that purpose (e.g. AzureFile on AKS and EFS CSI on EKS). We also provide a script, ./install-nfs.sh, that installs an NFS provisioner satisfying the RWM requirement for testing purposes.

  • Go to the GitHub release page for your chosen Cloudflow version and download the bootstrap script from the list of assets. For example, this is the page for Cloudflow 2.0.8.

  • Run the bootstrap script to start the installation of Cloudflow:

$ ./bootstrap-install-script-<CLOUDFLOW_VERSION>.sh
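
The steps in the list above can be sketched as a single dry-run helper. This is a minimal sketch under stated assumptions: the function name print_install_plan and its argument order are hypothetical, and the helper only prints the script invocations named on this page instead of running them. Downloading the bootstrap script from the release page is left as a manual step.

```shell
# Hypothetical helper: print, in order, the installer commands described above.
# Nothing is executed. Arguments:
#   $1 provider (gke, aks or eks)
#   $2 cluster name
#   $3 Cloudflow version of the downloaded bootstrap script
#   $4 "yes" to include the optional NFS step (for testing only)
print_install_plan() {
  case "$1" in
    gke|aks|eks) ;;
    *) echo "unknown provider: $1 (expected gke, aks or eks)" >&2; return 1 ;;
  esac
  echo "./create-cluster-$1.sh $2"
  if [ "${4:-no}" = "yes" ]; then
    echo "./install-nfs.sh"
  fi
  echo "./bootstrap-install-script-$3.sh"
}
```

Calling print_install_plan gke my-cluster 2.0.8 yes prints three lines, where my-cluster is an example name. The last line is the bootstrap script invocation.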

The bootstrap script takes you through the process of installing Cloudflow in the cluster. During its execution, it asks a few questions about which storage classes to use. The following example shows how to respond to these questions:

Select a storage class for workloads requiring persistent volumes with access mode 'ReadWriteMany'.
Examples of these are Spark and Flink checkpointing and savepointing.

   NAME                 PROVISIONER                             SUPPORTS RWM?           DEFAULT?
1. standard             kubernetes.io/gce-pd                    Unknown                 -
2. nfs                  cluster.local/nfs-server-provisioner    Unknown                 -
> 2

Select a storage class for workloads requiring persistent volumes with access mode 'ReadWriteOnce'.
Examples of these are Kafka, Zookeeper, and Prometheus.

   NAME                 PROVISIONER                             SUPPORTS RWO?            DEFAULT?
1. standard             kubernetes.io/gce-pd                    Verified                 -
2. nfs                  cluster.local/nfs-server-provisioner    Unknown                  -
> 1

The script then takes you through the installation of the Cloudflow operator, the Strimzi Kafka operator, the Spark operator and the Flink operator. If the installation succeeds, you will see something like the following on your console:

+------------------------------------------------------------------------------------+
|                      Installation of Cloudflow has completed                       |
+------------------------------------------------------------------------------------+

NAME                                                         READY   STATUS              RESTARTS   AGE
cloudflow-flink-operator-8588dbd8f4-gsjnm                    1/1     Running             0          70s
cloudflow-operator-57f47676f7-svbvj                          0/1     ContainerCreating   0          3s
cloudflow-sparkoperator-69669fdd54-88brw                     0/1     ContainerCreating   0          35s
strimzi-cluster-operator-7ff64d4b7-c9r2n                     1/1     Running             0          51s

The complete installation takes a few minutes. All operators are installed in the namespace cloudflow. You can follow the progress of the installation using the following kubectl command:

$ kubectl get pods -n cloudflow
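
To avoid reading the READY column by eye, the pod listing can be summarized with a short filter. This is a minimal sketch: the function name count_not_ready is hypothetical, and it assumes the default kubectl get pods table layout, where the second column has the form ready/total.

```shell
# Hypothetical helper: read `kubectl get pods` table output on stdin and print
# how many pods have fewer ready containers than expected (e.g. "0/1", "1/2").
# Lines whose second column is not of the form N/M, such as the header, are skipped.
count_not_ready() {
  awk '$2 ~ /^[0-9]+\/[0-9]+$/ {
         split($2, r, "/")
         if (r[1] != r[2]) n++
       }
       END { print n + 0 }'
}

# Possible usage against a live cluster (not run by this sketch):
#   kubectl get pods -n cloudflow | count_not_ready
```

A result of 0 means every listed pod reports all of its containers as ready. kubectl also has a built-in alternative, kubectl wait --for=condition=Ready pods --all -n cloudflow, which blocks until the pods report ready or its timeout expires.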

When everything is installed, you will see output similar to the following:

$ kubectl get pods -n cloudflow
NAME                                                         READY   STATUS    RESTARTS   AGE
cloudflow-flink-operator-8588dbd8f4-gsjnm                    1/1     Running   0          3m26s
cloudflow-operator-57f47676f7-svbvj                          1/1     Running   0          2m19s
cloudflow-sparkoperator-69669fdd54-88brw                     1/1     Running   0          2m51s
cloudflow-strimzi-entity-operator-5bc9695975-584fb           1/2     Running   0          27s
cloudflow-strimzi-kafka-0                                    2/2     Running   0          79s
cloudflow-strimzi-kafka-1                                    2/2     Running   0          79s
cloudflow-strimzi-kafka-2                                    2/2     Running   0          79s
cloudflow-strimzi-zookeeper-0                                2/2     Running   0          2m16s
cloudflow-strimzi-zookeeper-1                                2/2     Running   0          2m15s
cloudflow-strimzi-zookeeper-2                                2/2     Running   0          2m15s
strimzi-cluster-operator-7ff64d4b7-c9r2n                     1/1     Running   0          3m7s

This completes the process of installing Cloudflow on a Kubernetes cluster.

What’s next