Project Structure

Cloudflow uses and extends sbt, the Scala Build Tool, to support the building, packaging, and local execution of applications. The project structure of a Cloudflow application follows the general conventions and folder structure of an sbt project with a few additional considerations.

The Essential Structure

To create a Cloudflow application, we need the folder structure and files described next:

The structure of a Cloudflow project
.
├── build.sbt                 (1)
├── project                   (2)
│   ├── build.properties      (3)
│   ├── cloudflow-plugins.sbt (4)
│   └── plugins.sbt           (5)
├── src                       (6)
│   └── main
└── target-env.sbt            (7)
1 The build.sbt file specifies the build configuration, project dependencies, and any sub-projects.
2 The project/ folder contains additional configuration for the project build, such as the plugin and sbt versions. This folder is mandatory.
3 The build.properties file contains the sbt version.
4 The cloudflow-plugins.sbt file contains the dependency on the Cloudflow plugin for sbt.
5 The plugins.sbt file declares any other sbt plugins used by the build.
6 The src folder contains the source code for the application.
7 The target-env.sbt file contains the configuration for the Docker registry to use when publishing the application's Docker images.

You might recognize this structure as a typical sbt project with a few additional components. Two files in particular require your attention:

  • The cloudflow-plugins.sbt file adds the repository and plugin version for the Cloudflow sbt plugins, and

  • the target-env.sbt file contains the configuration of the Docker registry to use for publishing the resulting Docker images (a minimal sketch follows below).
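
The exact contents of target-env.sbt depend on your environment. A minimal sketch, assuming the cloudflowDockerRegistry and cloudflowDockerRepository settings exposed by the Cloudflow plugin and a placeholder registry address, could look like this:

// target-env.sbt (sketch): where to publish the application Docker images.
// The registry host and repository name below are placeholders.
ThisBuild / cloudflowDockerRegistry := Some("docker-registry.example.com")
ThisBuild / cloudflowDockerRepository := Some("my-cloudflow-images")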

In the following sections, we explore the Cloudflow plugin in more detail. We also review the project setup required for single and multi-runtime projects.

The Cloudflow Plugin

The Cloudflow plugin is the primary enabler of Cloudflow application development. Internally, it contains several sbt plugin definitions that support building applications and enable additional functionality, such as the local sandbox, which lets you run applications on your local machine.

We add the Cloudflow plugin to our application by creating a cloudflow-plugins.sbt file in the project/ folder.

// Use CLOUDFLOW_VERSION when it is set; otherwise derive the version
// from the git describe output via sbt-dynver.
val latestVersion = {
  sys.env.get("CLOUDFLOW_VERSION").fold(
    sbtdynver.DynVer(None, "-", "v")
      .getGitDescribeOutput(new java.util.Date())
      .fold(throw new Exception("Failed to retrieve version"))(_.version("-"))
  )(identity)
}

addSbtPlugin("com.lightbend.cloudflow" % "sbt-cloudflow" % latestVersion) (1)
1 Adds the sbt-cloudflow plugin with a given version to the build definition.
The version of the sbt-cloudflow plugin defines which version of Cloudflow you are using. It must match the version installed in your Kubernetes cluster.
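
If you prefer to pin a specific Cloudflow version rather than deriving it, the cloudflow-plugins.sbt file can declare it directly. The version below is only an illustration; use the one that matches your cluster:

// Pin the plugin (and therefore Cloudflow) version explicitly.
addSbtPlugin("com.lightbend.cloudflow" % "sbt-cloudflow" % "2.0.10")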

Once the sbt-cloudflow plugin is added to the project, a number of plugins become available to your application. These plugins are used to enable different facets of an application.

Application-level Plugins

The application-level plugins provided by sbt-cloudflow are:

The CloudflowApplicationPlugin

The CloudflowApplicationPlugin is an sbt plugin for building and publishing the application. It provides a buildApp task that validates the application against its blueprint, creates and publishes all the required Docker images, and generates the application CR, a JSON document that describes the application and is used by the Cloudflow CLI to initiate the application deployment on a Kubernetes cluster.

There must be at most one CloudflowApplicationPlugin enabled in a Cloudflow application.

The CloudflowLibraryPlugin

The CloudflowLibraryPlugin is used when we want to create generic libraries that use Cloudflow concepts, such as data definitions (e.g. Codec), AVRO support, Protobuf support, etc.
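
For example, a shared data model can be declared as its own sub-project with this plugin enabled. A minimal sketch, with an illustrative project name and path:

// Sub-project holding shared data definitions (e.g. AVRO schemas) that
// streamlet sub-projects can depend on.
lazy val datamodel = (project in file("./datamodel"))
  .enablePlugins(CloudflowLibraryPlugin)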

The CloudflowLocalRunnerPlugin

The CloudflowLocalRunnerPlugin adds the runLocal task to sbt, which allows you to run a complete Cloudflow application in your local development environment.

The CloudflowLocalRunnerPlugin does not require an explicit declaration in the build file. It is an autoplugin that becomes enabled when the CloudflowApplicationPlugin is present.
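
As an illustration, the local sandbox can be pointed at a local configuration file from build.sbt. The sketch below assumes the runLocalConfigFile setting exposed by the Cloudflow plugin; the project name and file path are placeholders:

lazy val sensorData = (project in file("."))
  .enablePlugins(CloudflowApplicationPlugin, CloudflowAkkaPlugin)
  .settings(
    // Assumption: runLocalConfigFile tells `sbt runLocal` which local
    // configuration file to use when running the application locally.
    runLocalConfigFile := Some("src/main/resources/local.conf")
  )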

Runtime-specific Plugins

Cloudflow provides out-of-the-box support for Akka, Apache Spark Structured Streaming, and Apache Flink in the form of specific implementations of the Streamlet API and supporting elements, like Docker base images and runners that are able to launch each specific technology stack on Kubernetes.

To develop Streamlets for a specific runtime, we add the corresponding plugin to the project. To ensure proper isolation of the different runtimes, the project structure differs depending on whether the application uses a single runtime or mixes Streamlet implementations from two or more runtimes. We explore these differences in the next sections. Before delving into that, here is the list of runtime plugins:

CloudflowAkkaPlugin

An sbt plugin for creating Akka-based streamlets.

CloudflowSparkPlugin

An sbt plugin for creating Spark Structured Streaming based streamlets.

CloudflowFlinkPlugin

An sbt plugin for creating Flink-based streamlets.

Each runtime plugin uses its own default base image: lightbend/akka-base for Akka, lightbend/spark for Spark, and lightbend/flink for Flink. If you work against your own container registry, you may prefer to point to a copy or a customized version of these base images. To use your own base images, set cloudflowAkkaBaseImage, cloudflowSparkBaseImage, and cloudflowFlinkBaseImage in the settings of the respective runtime plugins in your application's build.sbt.

For an Akka-based streamlet, the default base image value is lightbend/akka-base:${CloudflowBasePlugin.CloudflowVersion}-cloudflow-akka-$AkkaVersion-scala-${CloudflowBasePlugin.ScalaVersion}. Depending on your Cloudflow, Akka, and Scala versions, this resolves to something like lightbend/akka-base:2.0.10-cloudflow-akka-2.6.6-scala-2.12. To override this base image in your Cloudflow application, set the value in the settings of its build.sbt, for example:

lazy val sampleApp = (project in file("."))
    .settings(
      cloudflowAkkaBaseImage := "myRepositoryUrl/myRepositoryPath:2.0.10-cloudflow-akka-2.6.6-scala-2.12"
    )

Single Runtime Applications

Applications that use a single runtime are the simplest form of a Cloudflow application project structure and build definition. As we see in the next section, we can also combine Streamlets that use different runtimes in a single application, but that requires a slightly more complex build structure.

Let’s explore how to declare a Cloudflow application in a build definition with an example:

Single Runtime Setup
import sbt._
import sbt.Keys._

lazy val sensorData = (project in file("."))
    .enablePlugins(CloudflowApplicationPlugin, CloudflowAkkaPlugin) (1)
    .settings(
      // ...
    )
1 Use enablePlugins to activate the Cloudflow plugins for this project.
See the complete example in the examples folder of the Cloudflow project.

We provide templates to get started with a Cloudflow project that uses a single runtime. See the templates folder of the Cloudflow project.

Multiple Runtime Applications

Applications that combine multiple streamlet runtimes must be organized as sbt multi-project builds. Streamlets for each runtime must be added to their own separate sub-project.

The multi-project build organization is necessary to ensure the correct isolation of the dependencies required by each runtime in its own classpath. Frameworks like Spark and Flink bring in a long list of library dependencies, some of which conflict with each other when placed in the same classloader. Using multi-project builds ensures the correct functioning of the local sandbox and results in one Docker image per sub-project, each containing only the dependencies it needs.

In addition, we recommend separating the data-oriented schemas into their own sub-project so that the streamlet sub-projects can easily import these common data definitions.

For example, if we have an application that uses Akka and Structured Streaming, we could have the following structure:

Project Structure using Multiple Runtimes
.
├── my-application        (1)
│   └── src
│       └── main
│           └── blueprint (2)
├── datamodel             (3)
│   └── src
│       └── main
│           └── avro      (4)
├── akka-ingestor         (5)
│   └── src
│       ├── main
│       │   └── scala
│       └── test
│           └── scala
├── spark-aggregation     (6)
│   └── src
│       ├── main
│       │   └── scala
│       └── test
│           └── scala
├── project               (7)
│   ├── build.properties
│   └── cloudflow-plugins.sbt
├── build.sbt             (8)
└── target-env.sbt        (9)
1 The application 'root' project that contains the blueprint definition.
2 The blueprint.conf is expected in the src/main/blueprint directory by default.
3 The datamodel/ folder contains the shared schema definitions.
4 The avro/ folder contains the schema definitions in AVRO format.
5 akka-ingestor is a sub-project that contains the Akka-based streamlets.
6 spark-aggregation is a sub-project that contains the Spark-based streamlets.
7 The project/ folder contains the cloudflow-plugins.sbt and other necessary files, as we saw previously.
8 The build.sbt file defines the root project and the sub-projects of the build.
9 The target-env.sbt file contains the Docker repository configuration, as we saw at the start of the chapter.

We can find a build.sbt definition that exemplifies the use of a multi-project setup in this Cloudflow example.
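
As a rough sketch of what such a definition can look like, with project names matching the structure above and illustrative dependency wiring (refer to the linked example for the authoritative setup):

lazy val myApplication = (project in file("./my-application"))
  .enablePlugins(CloudflowApplicationPlugin) // owns the blueprint, builds and publishes the app
  .dependsOn(akkaIngestor, sparkAggregation)

lazy val datamodel = (project in file("./datamodel"))
  .enablePlugins(CloudflowLibraryPlugin)     // shared AVRO schema definitions

lazy val akkaIngestor = (project in file("./akka-ingestor"))
  .enablePlugins(CloudflowAkkaPlugin)        // Akka-based streamlets
  .dependsOn(datamodel)

lazy val sparkAggregation = (project in file("./spark-aggregation"))
  .enablePlugins(CloudflowSparkPlugin)       // Spark-based streamlets
  .dependsOn(datamodel)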

Where to go next?

You can learn more about the Schema-First approach used in Cloudflow.