Configuring Streamlets
Cloudflow offers a flexible configuration model that lets you configure every aspect of application deployment.
In this section, we provide a brief description of the configuration model, its scopes, and the available settings. We explain the configuration precedence, show the interchangeability of paths and keys, and finally cover how to apply the configuration.
Configuration Scopes
The Cloudflow configuration model is based on hierarchical scopes. These scopes range from broad settings for a runtime to specific settings for a streamlet instance.
We have two main top-level scopes, one for streamlets and one for runtimes, as we can see here:
scope     | key
----------|---------------------------------------
streamlet | cloudflow.streamlets.[streamlet-name]
runtime   | cloudflow.runtimes.[runtime-name]
You can learn more about it in Configuration Scopes.
Configurations Available
Using this model, you can configure the following settings:
- Runtime settings (for Akka, Spark, Flink, or any user-provided runtime)
- Kubernetes pod and container configuration
- Streamlet Configuration Parameters for a particular instance
Cloudflow uses HOCON as the syntax to specify the configuration. Using HOCON, the scopes are defined using a hierarchical structure for the configuration keys.
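For illustration, the two top-level scopes might appear together in a single HOCON file like this (the streamlet name my-streamlet and the akka runtime are examples, not required names):

```hocon
cloudflow {
  // streamlet scope: settings for one specific streamlet instance
  streamlets {
    my-streamlet {
      config-parameters {
        // values for this streamlet's declared configuration parameters
      }
    }
  }
  // runtime scope: settings shared by all streamlets of a given runtime
  runtimes {
    akka {
      config {
        // settings passed to the Akka runtime
      }
    }
  }
}
```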
Streamlet Configuration Parameters
You can learn more about it in Streamlet Configuration Parameters.
Runtime Specific Settings
For configuration settings specific to the runtime you are using, please refer to the configuration options of the specific runtime.
All settings provided under the config section are passed verbatim to the underlying runtime.
For example, consider the following configuration snippet:
cloudflow {
  streamlets {
    my-streamlet {
      config {
        spark.broadcast.compress = "true"
      }
      //...
    }
  }
}
With this configuration, the setting spark.broadcast.compress = "true" is passed to the Spark session of the specific my-streamlet instance.
It follows that the configuration options you can use in this section depend on the runtime used. Consult the documentation of your runtime of choice for more information.
Kubernetes Container Settings
You can learn more about it in Kubernetes Container Settings.
Configuration Precedence
As a general rule, a specific scope always has precedence over a more general one.
A setting defined at the runtime
scope will apply to all streamlets that use that runtime.
But if a streamlet-specific configuration redefines the same setting, the more specific configuration will apply for that particular instance.
The combined example shown below specifies that, by default, the setting akka.loglevel should be set to INFO. Specifically for my-streamlet, the log level overrides this default and is set to DEBUG.
cloudflow.streamlets.my-streamlet.config {
  akka {
    loglevel = "DEBUG"
  }
}
cloudflow.runtimes.akka.config {
  akka {
    loglevel = "INFO"
  }
}
Configuration Paths as Keys
Paths can be used as keys in HOCON, which is shown in the example below:
cloudflow.streamlets.my-streamlet {
  config-parameters {
    // config parameter values go here
  }
  config {
    // runtime settings go here
  }
  kubernetes.pods.pod.containers.container {
    // kubernetes container settings go here
  }
}
An example of only setting an Akka configuration value is shown below:
cloudflow.streamlets.my-streamlet.config {
  akka {
    loglevel = "DEBUG"
  }
}
This can be collapsed further, as shown in the example below:
cloudflow.streamlets.my-streamlet.config.akka.loglevel = "DEBUG"
Applying a Configuration
A streamlet can be configured at deployment time with kubectl cloudflow deploy, or re-configured at runtime with kubectl cloudflow configure. These commands deploy or restart streamlets as necessary.
Configuration values can be set for all streamlets of a particular runtime at once, or they can be set for a specific streamlet.
The configuration can be specified via file arguments or passed directly on the command line.
Configuring a Streamlet using Configuration Files
Let’s look at an example of passing a configuration file to the deploy command:
$ kubectl cloudflow deploy target/my-app.json --conf my-config.conf
In the above example, the my-app application is deployed with a my-config.conf configuration file.
Configuration files are merged by concatenating the files passed with --conf flags. The last --conf [file] argument can override values specified in earlier --conf [file] arguments.
In the example below, where the same configuration path is used in file1.conf and file2.conf, the configuration value in file2.conf takes precedence, overriding the value provided by file1.conf:
$ kubectl cloudflow deploy swiss-knife.json --conf file1.conf --conf file2.conf
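As a hypothetical illustration of this merge behavior, suppose both files set the same Akka runtime setting:

```hocon
// file1.conf (hypothetical contents)
cloudflow.runtimes.akka.config.akka.loglevel = "INFO"

// file2.conf (hypothetical contents)
cloudflow.runtimes.akka.config.akka.loglevel = "DEBUG"

// Because file2.conf is passed last, the effective value is "DEBUG".
```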
Configuring using Command Line Arguments
It is also possible to pass configuration values directly as command-line arguments, as [config-path]=value pairs separated by a space. The [config-path] must be an absolute path to the value, exactly as it would be defined in a config file, using configuration paths.
Let’s see some examples.
Set the Akka log level (akka.loglevel) for streamlet akka-process of the akka runtime to DEBUG:
$ kubectl cloudflow deploy target/swiss-knife.json \
  cloudflow.streamlets.akka-process.config.akka.loglevel="DEBUG"
Set memoryOverhead of the Spark driver in the spark runtime configuration to 512 (MB):
$ kubectl cloudflow deploy target/swiss-knife.json \
  cloudflow.runtimes.spark.config.spark.driver.memoryOverhead=512
Set the configuration parameter configurable-message for streamlet spark-process to "SPARK-OUTPUT:":
$ kubectl cloudflow deploy target/swiss-knife.json \
  cloudflow.runtimes.spark.config.spark.driver.memoryOverhead=512 \
  cloudflow.streamlets.spark-process.config-parameters.configurable-message='SPARK-OUTPUT:'
The arguments passed as [config-key]=[value] pairs take precedence over the files passed with the --conf flags.
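For instance, in this hypothetical scenario a command-line pair overrides the value provided by a file for the same path:

```shell
# my-config.conf (hypothetical) contains:
#   cloudflow.runtimes.akka.config.akka.loglevel = "INFO"

# The command-line pair for the same path takes precedence:
$ kubectl cloudflow deploy target/my-app.json --conf my-config.conf \
    cloudflow.runtimes.akka.config.akka.loglevel="DEBUG"

# Effective value for the akka runtime: loglevel = "DEBUG"
```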
Configuring Streamlet Logging
The streamlet logging configuration can be tweaked for all of the pods with a simple command.
First, you need to create a suitable logback.xml logging configuration file, for example:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <statusListener class="ch.qos.logback.core.status.NopStatusListener" />
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%msg%n</pattern>
    </encoder>
  </appender>
  <root level="DEBUG">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
Then pass it to the --logback-config option of the deploy and configure sub-commands:
$ kubectl cloudflow deploy cr.json --logback-config logback.xml
$ kubectl cloudflow configure app-name --logback-config logback.xml
What’s Next
Now that we have mastered the configuration options in Cloudflow, we should learn about Composing applications using blueprints and how blueprints help us assemble streamlets into end-to-end applications.