Adding Spark Support
Integration with the Spark Operator has been deprecated since 2.2.0 and will be removed in version 3.x. Spark integration has moved to the Cloudflow-contrib project. Please see the Cloudflow-contrib getting started guide for instructions on how to use Spark Native Kubernetes integration. The documentation that follows describes the deprecated feature.
See Storage requirements (for use with Spark or Flink) to set up the storage required by Spark applications.
If you plan to write applications that utilize Spark, you will need to install the Spark operator before deploying your Cloudflow application.
The only Cloudflow-compatible Spark operator is a fork maintained by Lightbend. See Backported Fix for Spark 2.4.5 for more details.
Add the Spark Helm chart repository and update the local index.
helm repo add incubator https://charts.helm.sh/incubator
helm repo update
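To confirm that the chart is now resolvable from the freshly updated index, a quick search of the local repositories should list it (Helm 3 syntax; the chart version shown will vary):

helm search repo incubator/sparkoperator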
Before installing the Helm chart, we need to prepare a spark-values.yaml file that overrides the default values in the Spark Helm chart.
cat > spark-values.yaml << EOF
rbac:
  create: true
serviceAccounts:
  sparkoperator:
    create: true
    name: cloudflow-spark-operator
  spark:
    create: true
    name: cloudflow-spark
enableWebhook: true
enableMetrics: true
controllerThreads: 10
installCrds: true
metricsPort: 10254
metricsEndpoint: "/metrics"
metricsPrefix: ""
resyncInterval: 30
webhookPort: 8080
sparkJobNamespace: ""
operatorVersion: latest
---
kind: Service
apiVersion: v1
metadata:
  name: cloudflow-webhook
  labels:
    app.kubernetes.io/name: sparkoperator
spec:
  selector:
    app.kubernetes.io/name: sparkoperator
EOF
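If you want to see exactly which chart defaults these settings override, you can print them for comparison (helm show values is the Helm 3 form; Helm 2 uses helm inspect values):

helm show values incubator/sparkoperator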
Now we can install the Spark operator using Helm, passing in the values file we just prepared. Note that we install it into the same cloudflow namespace where the Cloudflow operator runs.
helm upgrade -i spark-operator incubator/sparkoperator \
--values="spark-values.yaml" \
--namespace cloudflow
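Once Helm reports that the release is deployed, you can double-check its status at any time (release name and namespace as used in the command above):

helm status spark-operator --namespace cloudflow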
OpenShift requires a role-binding
If you are installing the Spark operator on OpenShift, you need to apply the following role-binding:

cat > spark-rolebinding.yaml << EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: system:openshift:scc:anyuid
  namespace: cloudflow
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:openshift:scc:anyuid
subjects:
- kind: ServiceAccount
  name: cloudflow-spark
  namespace: cloudflow
EOF

To apply the role-binding:

oc apply -f spark-rolebinding.yaml
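If you want to verify that the role-binding was created in the right namespace, an oc get should show it:

oc get rolebinding system:openshift:scc:anyuid -n cloudflow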
After the Spark operator installation completes, verify that its pod is up and running alongside the Cloudflow operator:
$ kubectl get pods -n cloudflow
NAME                                            READY   STATUS    RESTARTS   AGE
cloudflow-operator-6b7d7cbdfc-xb6w5             1/1     Running   0          109s
spark-operator-sparkoperator-56d6ffc8cd-cln67   1/1     Running   0          109s
Next, we need to adjust the operator's webhook configuration so that its MutatingWebhookConfiguration is owned by the operator deployment and gets cleaned up together with it. For this, we will create yet another YAML file that we will apply to the cluster with kubectl:
cat > spark-mutating-webhook.yaml << 'EOF2'
apiVersion: batch/v1
kind: Job
metadata:
  name: cloudflow-patch-spark-mutatingwebhookconfig
spec:
  template:
    spec:
      serviceAccountName: cloudflow-operator
      restartPolicy: OnFailure
      containers:
        - name: main
          image: alpine:3.12
          command:
            - /bin/ash
            - "-c"
            - |
              apk update && apk add wget
              wget -q -O /bin/kubectl https://storage.googleapis.com/kubernetes-release/release/v1.19.6/bin/linux/amd64/kubectl && chmod 755 /bin/kubectl
              NAME="spark-operator-sparkoperator"
              API_VERSION=$(kubectl get deployment -n cloudflow $NAME -o jsonpath='{.apiVersion}')
              UUID=$(kubectl get deployment -n cloudflow $NAME -o jsonpath='{.metadata.uid}')
              KIND=$(kubectl get deployment -n cloudflow $NAME -o jsonpath='{.kind}')
              HOOK_NAME="spark-operator-sparkoperator-webhook-config"
              JSON=$(cat <<EOF
              {
                "metadata": {
                  "ownerReferences": [
                    {
                      "apiVersion": "$API_VERSION",
                      "blockOwnerDeletion": true,
                      "controller": true,
                      "kind": "$KIND",
                      "name": "$NAME",
                      "uid": "$UUID"
                    }
                  ]
                }
              }
              EOF
              )
              echo $JSON
              kubectl patch MutatingWebhookConfiguration $HOOK_NAME -n cloudflow -p "$JSON"
EOF2
Now we can apply the changes to the webhook:
kubectl apply -f spark-mutating-webhook.yaml --namespace cloudflow
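The job may take a few seconds to pull its image and run. When it has completed, you can confirm that the webhook configuration now carries the owner reference set by the patch (the hook name matches the HOOK_NAME variable in the job above; MutatingWebhookConfiguration objects are cluster-scoped, so no namespace flag is needed):

kubectl get mutatingwebhookconfiguration spark-operator-sparkoperator-webhook-config \
  -o jsonpath='{.metadata.ownerReferences[0].name}'

The command should print the name of the operator deployment, spark-operator-sparkoperator.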
That completes the installation of the Spark operator. You can check the result with the following command:
$ kubectl get pods -n cloudflow
NAME                                                READY   STATUS      RESTARTS   AGE
cloudflow-operator-6b7d7cbdfc-xb6w5                 1/1     Running     0          3m29s
cloudflow-patch-spark-mutatingwebhookconfig-66xxb   0/1     Completed   0          21s
spark-operator-sparkoperator-56d6ffc8cd-cln67       1/1     Running     0          3m29s
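As a final sanity check, you can verify that the operator's custom resource definitions were installed (installCrds is set to true in the values file above; the exact CRD names can differ between operator versions, so treat the names below as an assumption):

kubectl get crds | grep sparkoperator.k8s.io

You should see entries such as sparkapplications.sparkoperator.k8s.io and scheduledsparkapplications.sparkoperator.k8s.io.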
Adding Flink Support
If you plan to write applications that utilize Flink, you will need to install the Flink operator before deploying your Cloudflow application.
Continue with Adding Flink Support.