Cloudflow uses and extends
sbt, the Scala Build Tool, to support the building, packaging, and local execution of applications.
The project structure of a Cloudflow application follows the general conventions and folder structure of an
sbt project with a few additional considerations.
To create a Cloudflow application, we need the following folder structure and files, which we describe next:
.
├── build.sbt (1)
├── project (2)
│   ├── build.properties (3)
│   ├── cloudflow-plugins.sbt (4)
│   └── plugins.sbt (5)
├── src (6)
│   └── main
└── target-env.sbt (7)
You might recognize this structure as the typical
sbt project with a few additional components.
Two files in particular require your attention:
cloudflow-plugins.sbt adds the repository and the version of the Cloudflow plugin to use.
target-env.sbt contains the configuration of the Docker registry to use for publishing the resulting Docker images; a minimal sketch follows below.
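As a rough sketch, and assuming the cloudflowDockerRegistry and cloudflowDockerRepository settings available in Cloudflow 2.x, a target-env.sbt file could look like this; the registry host and repository name are placeholders:

// target-env.sbt (placeholder values; point these at your own registry and repository)
ThisBuild / cloudflowDockerRegistry := Some("docker.example.com")
ThisBuild / cloudflowDockerRepository := Some("cloudflow-apps")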
In the following sections, we explore the Cloudflow plugin in more detail. We also review the project setup required for single and multi-runtime projects.
The Cloudflow plugin is the primary enabler of any Cloudflow application development.
Internally, it contains several
sbt plugin definitions that support the building of applications and enable additional functions, such as the local sandbox, which lets you run applications on your local machine.
We add the Cloudflow plugin to our application by creating a
cloudflow-plugins.sbt file in the project folder with the following content:
// Resolver for the cloudflow-sbt plugin (1)
// resolvers += Resolver.url("cloudflow", url("https://lightbend.bintray.com/cloudflow"))(Resolver.ivyStylePatterns)
addSbtPlugin("com.lightbend.cloudflow" % "sbt-cloudflow" % "2.0.7") // (2)
(1) Adds the bintray repository hosting the Cloudflow plugin.
(2) The version of the sbt-cloudflow plugin.

Once the sbt-cloudflow plugin is added to the project, a number of plugins become available to your application.
These plugins are used to enable different facets of an application.
The application-level plugins provided by sbt-cloudflow are the following:

CloudflowApplicationPlugin is an sbt plugin for building and publishing the application. It provides a buildApp task that validates the application against its blueprint, creates and publishes all the required Docker images, and generates the application CR, a JSON document that describes the application and is used by the Cloudflow CLI to initiate the deployment of the application on a Kubernetes cluster.
There must be at most one CloudflowApplicationPlugin enabled in a Cloudflow application.
CloudflowLibraryPlugin is used when we want to create generic libraries that use Cloudflow concepts, such as data definitions (e.g. Codec), Avro support, Protobuf support, etc.
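As an illustration, a shared library sub-project holding such data definitions could enable the plugin as follows; the sub-project name datamodel is just an example:

// A shared library sub-project with common data definitions (illustrative name).
lazy val datamodel = (project in file("datamodel"))
  .enablePlugins(CloudflowLibraryPlugin)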
CloudflowLocalRunnerPlugin provides the local sandbox, an extension to sbt that allows you to run a complete Cloudflow application in your local development environment.
CloudflowLocalRunnerPlugin does not require an explicit declaration in the build file. It is an autoplugin that becomes enabled when the CloudflowApplicationPlugin is enabled.
Cloudflow provides out-of-the-box support for Akka, Apache Spark Structured Streaming, and Apache Flink in the form of specific implementations of the Streamlet API and supporting elements, like Docker base images and runners that are able to launch each specific technology stack on Kubernetes.
To implement Streamlets for a specific runtime, we add the corresponding plugin to the project.
To ensure proper isolation of the different runtimes, we require a different structural approach when using only one runtime or when mixing
Streamlet implementations that use two or more runtimes.
We explore these differences in the next section.
Before delving into that, here we list the runtime plugins:
CloudflowAkkaPlugin: an sbt plugin for creating Akka-based streamlets.
CloudflowSparkPlugin: an sbt plugin for creating Spark Structured Streaming based streamlets.
CloudflowFlinkPlugin: an sbt plugin for creating Flink-based streamlets.
Each runtime plugin uses its own default base image.
Applications that use a single runtime have the simplest project structure and build definition.
As we see in the next section, we can also combine
Streamlets that use different runtimes in a single application, but that requires a slightly more complex build structure.
Let’s explore how to declare a Cloudflow application in a build definition with an example:
import sbt._
import sbt.Keys._

lazy val sensorData = (project in file("."))
  .enablePlugins(CloudflowApplicationPlugin, CloudflowAkkaPlugin)
  .settings(
    // project-specific settings elided
  )
See the complete example in the examples folder of the Cloudflow project.
We provide templates to get started with a Cloudflow project that uses a single runtime. See the templates folder of the Cloudflow project.
Applications that combine multiple streamlet runtimes must be organized using sbt multi-project builds.
Streamlets for each runtime must be added to their own separate sub-project.
The multi-project build organization is necessary to ensure the correct isolation of the dependencies required by each runtime in their own classpath. Frameworks like Spark and Flink pack a long list of library dependencies, and some of them conflict with each other when placed in the same classloader. Using multi-project builds ensures the correct functioning of the local sandbox and informs the generation of a Docker image per sub-project with the correct dependencies.
In addition, we recommend separating the data-oriented schemas into their own sub-project so that the streamlet sub-projects can easily import these common data definitions.
For example, if we have an application that uses Akka and Structured Streaming, we could have the following structure:
.
├── my-application (1)
│   └── src
│       └── main
│           └── blueprint (2)
├── datamodel (3)
│   └── src
│       └── main
│           └── avro (4)
├── akka-ingestor (5)
│   └── src
│       ├── main
│       │   └── scala
│       └── test
│           └── scala
├── spark-aggregation (6)
│   └── src
│       ├── main
│       │   └── scala
│       └── test
│           └── scala
├── project (7)
│   ├── build.properties
│   └── cloudflow-plugins.sbt
├── build.sbt (8)
└── target-env.sbt (9)
(1) The application 'root' project that contains the blueprint.
(2) The blueprint definition.
(3) The datamodel sub-project with the shared data definitions.
(4) The Avro schema definitions.
(5) The sub-project containing the Akka-based streamlets.
(6) The sub-project containing the Spark-based streamlets.
(7) The sbt project folder with the Cloudflow plugin configuration.
(8) The build definition of the application.
(9) The Docker registry configuration.
We can find a
build.sbt definition that exemplifies the use of a multi-project setup in this Cloudflow example.
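As a rough sketch of such a multi-project build definition, and assuming the runtime plugins are named CloudflowAkkaPlugin and CloudflowSparkPlugin as in Cloudflow 2.x, a build.sbt matching the structure above could look like the following; the project names are illustrative only:

import sbt._
import sbt.Keys._

// Root project: holds the blueprint and produces the application CR.
lazy val myApplication = (project in file("my-application"))
  .enablePlugins(CloudflowApplicationPlugin)
  .settings(name := "my-application")

// Shared data definitions, reused by the streamlet sub-projects.
lazy val datamodel = (project in file("datamodel"))
  .enablePlugins(CloudflowLibraryPlugin)

// Akka-based streamlets, isolated in their own sub-project and classpath.
lazy val akkaIngestor = (project in file("akka-ingestor"))
  .enablePlugins(CloudflowAkkaPlugin)
  .dependsOn(datamodel)

// Spark-based streamlets, kept in a separate sub-project to avoid dependency clashes.
lazy val sparkAggregation = (project in file("spark-aggregation"))
  .enablePlugins(CloudflowSparkPlugin)
  .dependsOn(datamodel)

Each streamlet sub-project gets its own classpath and produces its own Docker image, which is exactly the isolation described in the note above.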
You can learn more about the Schema-First approach used in Cloudflow.