Project Structure
Cloudflow uses and extends sbt, the Scala Build Tool, to support the building, packaging, and local execution of applications.
The project structure of a Cloudflow application follows the general conventions and folder structure of an sbt
project with a few additional considerations.
The Essential Structure
To create a Cloudflow application, we need the following folder structure and files, which we describe next:
.
├── build.sbt (1)
├── project (2)
│ ├── build.properties (3)
│ ├── cloudflow-plugins.sbt (4)
│ └── plugins.sbt (5)
├── src (6)
│ └── main
└── target-env.sbt (7)
1 | The build.sbt file specifies the build configuration, project dependencies, and any sub-projects. |
2 | The project/ folder contains additional configuration for the project build, such as plugins and the sbt version. This folder is mandatory. |
3 | The build.properties file contains the sbt version. |
4 | The cloudflow-plugins.sbt file contains the dependency on the Cloudflow plugin for sbt. |
5 | The plugins.sbt file declares any additional sbt plugins used by the build. |
6 | The src folder contains the code for the application. |
7 | The target-env.sbt file contains the configuration for the Docker registry to use when publishing the application Docker containers. |
You might recognize this structure as a typical sbt project with a few additional components. Two files in particular require your attention:
- The cloudflow-plugins.sbt file adds the repository and the plugin version for the Cloudflow sbt plugins.
- The target-env.sbt file contains the configuration of the Docker registry to use for publishing the resulting Docker images.
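For reference, the target-env.sbt file typically holds nothing more than the Docker registry and repository settings. The sketch below is illustrative only: it assumes the cloudflowDockerRegistry and cloudflowDockerRepository settings exposed by the Cloudflow plugin and uses placeholder values.
// target-env.sbt (illustrative values; replace with your own registry and repository)
ThisBuild / cloudflowDockerRegistry := Some("docker.io")
ThisBuild / cloudflowDockerRepository := Some("my-organization")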
In the following sections, we explore the Cloudflow plugin in more detail. We also review the project setup required for single and multi-runtime projects.
The Cloudflow Plugin
The Cloudflow plugin is the primary enabler of any Cloudflow application development.
Internally, it contains several sbt plugin definitions that support the building of applications and enable additional functions such as the local sandbox, which lets you run applications on your local machine.
We add the Cloudflow plugin to our application by creating a cloudflow-plugins.sbt file in the project/ folder:
// Use the version from the CLOUDFLOW_VERSION environment variable when it is set;
// otherwise derive it from the git describe output via sbt-dynver.
val latestVersion = {
  sys.env.get("CLOUDFLOW_VERSION").fold(
    sbtdynver.DynVer(None, "-", "v")
      .getGitDescribeOutput(new java.util.Date())
      .fold(throw new Exception("Failed to retrieve version"))(_.version("-"))
  )(identity)
}

addSbtPlugin("com.lightbend.cloudflow" % "sbt-cloudflow" % latestVersion) (1)
1 | Adds the sbt-cloudflow plugin with a given version to the build definition. |
The version of the sbt-cloudflow plugin determines which version of Cloudflow you are using. It must match the Cloudflow version installed in your Kubernetes cluster.
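If you prefer to pin an explicit Cloudflow version instead of deriving it dynamically, cloudflow-plugins.sbt can be reduced to a single line. The version below is purely illustrative; 2.0.10 is the Cloudflow version that appears in the base image tags later in this chapter.
// cloudflow-plugins.sbt with a fixed plugin version (illustrative)
addSbtPlugin("com.lightbend.cloudflow" % "sbt-cloudflow" % "2.0.10")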
Once the sbt-cloudflow plugin is added to the project, a number of plugins become available to your application. These plugins enable different facets of the application.
Application-level Plugins
The application-level plugins provided by sbt-cloudflow are:

CloudflowApplicationPlugin
- The CloudflowApplicationPlugin is an sbt plugin for building and publishing the application. It provides a target, buildApp, that validates the application against its blueprint, creates and publishes all the required Docker images, and generates the application CR, a JSON document that describes the application and is used by the Cloudflow CLI to initiate the application deployment on a Kubernetes cluster. There must be at most one CloudflowApplicationPlugin enabled in a Cloudflow application.

CloudflowLibraryPlugin
- The CloudflowLibraryPlugin is used when we want to create generic libraries that use Cloudflow concepts, such as data definitions (e.g. Codec), AVRO support, Protobuf support, etc.

CloudflowLocalRunnerPlugin
- The CloudflowLocalRunnerPlugin adds the runLocal task to sbt, which allows you to run a complete Cloudflow application in your local development environment. The CloudflowLocalRunnerPlugin does not require an explicit declaration in the build file. It is an auto-plugin that becomes enabled when the CloudflowApplicationPlugin is present.
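With the CloudflowApplicationPlugin enabled, the buildApp and runLocal tasks can be invoked directly from sbt, for example:
sbt buildApp   # validate the blueprint, publish the Docker images, and generate the application CR
sbt runLocal   # run the complete application in the local sandbox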
Runtime-specific Plugins
Cloudflow provides out-of-the-box support for Akka, Apache Spark Structured Streaming, and Apache Flink in the form of specific implementations of the Streamlet API and supporting elements, like Docker base images and runners that are able to launch each specific technology stack on Kubernetes.
To develop Streamlets for a specific runtime, we add the corresponding plugin to the project.
To ensure proper isolation of the different runtimes, the project structure differs depending on whether the application uses a single runtime or mixes Streamlet implementations from two or more runtimes.
We explore these differences in the next section.
Before delving into that, here we list the runtime plugins:
CloudflowAkkaPlugin
- sbt plugin for creating Akka-based streamlets.
CloudflowSparkPlugin
- sbt plugin for creating Spark Structured Streaming-based streamlets.
CloudflowFlinkPlugin
- sbt plugin for creating Flink-based streamlets.
Each runtime plugin uses its own default base image. The base image can be overridden per project; for example, in an Akka-based streamlet the value is set with the cloudflowAkkaBaseImage setting:
lazy val sampleApp = (project in file("."))
  .settings(
    cloudflowAkkaBaseImage := "myRepositoryUrl/myRepositoryPath:2.0.10-cloudflow-akka-2.6.6-scala-2.12",
    // other project settings ...
  )
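Assuming the other runtime plugins follow the same naming pattern (an assumption, not something shown in this chapter), the equivalent overrides for Spark and Flink would look like:
cloudflowSparkBaseImage := "myRepositoryUrl/myRepositoryPath:my-spark-base-image-tag",  // assumed setting name
cloudflowFlinkBaseImage := "myRepositoryUrl/myRepositoryPath:my-flink-base-image-tag",  // assumed setting name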
Single Runtime Applications
Applications that use a single runtime are the simplest form of a Cloudflow application project structure and build definition.
As we see in the next section, we can also combine Streamlets that use different runtimes in a single application, but that requires a slightly more complex build structure.
Let’s explore how to declare a Cloudflow application in a build definition with an example:
import sbt._
import sbt.Keys._

lazy val sensorData = (project in file("."))
  .enablePlugins(CloudflowApplicationPlugin, CloudflowAkkaPlugin) (1)
  .settings(
    // library dependencies and other project settings ...
  )
1 | Use enablePlugins to activate the Cloudflow plugins for this project. |
See the complete example in the examples folder of the Cloudflow project.
We provide templates to get started with a Cloudflow project that uses a single runtime. See the templates folder of the Cloudflow project.
Multiple Runtime Applications
Applications that combine multiple streamlet runtimes must be organized as sbt multi-project builds.
Streamlets for each runtime must be added to their own separate sub-project.
The multi-project build organization is necessary to ensure the correct isolation of the dependencies required by each runtime in their own classpath. Frameworks like Spark and Flink pack a long list of library dependencies, and some of them conflict with each other when placed in the same classloader. Using multi-project builds ensures the correct functioning of the local sandbox and drives the generation of one Docker image per sub-project with the correct dependencies.
In addition, we recommend separating the data-oriented schemas into their own sub-project so that the streamlet sub-projects can easily import these common data definitions.
For example, if we have an application that uses Akka and Structured Streaming, we could have the following structure:
.
├── my-application (1)
│ └── src
│ └── main
│ └── blueprint (2)
├── datamodel (3)
│ └── src
│ └── main
│ └── avro (4)
├── akka-ingestor (5)
│ └── src
│ ├── main
│ │ └── scala
│ └── test
│ └── scala
├── spark-aggregation (6)
│ └── src
│ ├── main
│ │ └── scala
│ └── test
│ └── scala
├── project (7)
│ ├── build.properties
│ └── cloudflow-plugins.sbt
├── build.sbt (8)
└── target-env.sbt (9)
1 | The application 'root' project that contains the blueprint definition. |
2 | The blueprint.conf is expected in a src/main/blueprint directory by default. |
3 | The datamodel/ folder contains the shared schema definitions. |
4 | The avro/ folder contains the schema definitions in AVRO format. |
5 | akka-ingestor is a sub-project that contains the Akka-based streamlets. |
6 | spark-aggregation is a sub-project that contains the Spark-based streamlets. |
7 | The project/ folder contains the cloudflow-plugins.sbt and other necessary files, as we saw previously. |
8 | The build.sbt file contains the multi-project build definition. |
9 | The target-env.sbt contains the Docker repository configuration, as we saw at the start of the chapter. |
We can find a build.sbt definition that exemplifies the use of a multi-project setup in this Cloudflow example.
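For orientation, a minimal multi-project build.sbt following the folder layout above could look like the sketch below. The sub-project names are illustrative, and the actual example in the Cloudflow repository is more elaborate:
// Root project: owns the blueprint and builds/publishes the whole application.
lazy val myApplication = (project in file("my-application"))
  .enablePlugins(CloudflowApplicationPlugin)
  .aggregate(datamodel, akkaIngestor, sparkAggregation)

// Shared AVRO data definitions, importable by all streamlet sub-projects.
lazy val datamodel = (project in file("datamodel"))
  .enablePlugins(CloudflowLibraryPlugin)

// Akka-based streamlets, isolated in their own sub-project and classpath.
lazy val akkaIngestor = (project in file("akka-ingestor"))
  .enablePlugins(CloudflowAkkaPlugin)
  .dependsOn(datamodel)

// Spark-based streamlets, isolated in their own sub-project and classpath.
lazy val sparkAggregation = (project in file("spark-aggregation"))
  .enablePlugins(CloudflowSparkPlugin)
  .dependsOn(datamodel)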
Where to go next?
You can learn more about the Schema-First approach used in Cloudflow.