Migration Guide 1.3.x to 2.0.x
The 2.0 release contains several new features that require changes to the project structure, blueprint definition, and build process.
In this guide, we describe the changes that you need to apply to a pre-2.0 Cloudflow application to make it compatible with the new model.
Be aware that the server-side components need to be updated to the Cloudflow 2.0 release to run a 2.0-compliant application. |
Project Structure
Cloudflow 2.0 introduces a strict separation of the classpath of the different supported runtimes (Akka, Spark, Flink, or any user-provided one). This separation prevents conflicts between different versions of dependencies used by the runtimes.
To achieve this isolation, an application that wants to use two or more runtimes should separate the streamlets for each runtime in their own sub-project, making use of the multi-project support in sbt.
In the following example, we show the relevant definitions in the build.sbt
file.
build.sbt
[Before v2.0]lazy val application = (project in file("."))
.enablePlugins(CloudflowSparkApplicationPlugin,
CloudflowAkkaStreamsApplicationPlugin,
CloudflowFlinkApplicationPlugin,
ScalafmtPlugin)
...
build.sbt
[with v2.0]lazy val root = (project in file("."))
.enablePlugins(ScalafmtPlugin)
.settings(commonSettings)
.aggregate(
app,
datamodel,
flink,
akka,
spark
)
lazy val app = (project in file("./app"))
.settings(
name:= "my-application"
)
.enablePlugins(CloudflowApplicationPlugin)
.settings(commonSettings)
lazy val datamodel = (project in file("datamodel"))
.enablePlugins(CloudflowLibraryPlugin)
.settings(commonSettings)
lazy val akka = (project in file("./akka"))
.enablePlugins(CloudflowAkkaPlugin)
.settings(commonSettings)
.dependsOn(datamodel)
lazy val spark = (project in file("./spark"))
.enablePlugins(CloudflowSparkPlugin)
.settings(commonSettings)
.dependsOn(datamodel)
lazy val flink = (project in file("./flink"))
.enablePlugins(CloudflowFlinkPlugin)
.settings(commonSettings)
.dependsOn(datamodel)
The new structure might seem heavier, but it offers a headache-free build in terms of version conflicts and other classpath-driven problems.
We use .aggregate on the root project to transitively build all the projects without mixing their dependencies.
|
At build time, this project structure will result in a Docker image for every sub-project that contains streamlets. The Akka, Spark, and Flink streamlets will be packaged in their own Docker image for deployment in a cluster.
Single Runtime Applications
It is possible to use a simplified project structure for applications that use a single runtime.
Before Cloudflow 2.0, we offered -Application
plugins for this purpose.
Those plugins are deprecated in 2.0.
Instead, to create a Cloudflow application that uses a single runtime, activate the CloudflowApplicationPlugin
together with the runtime of choice:
-
Akka:
CloudflowAkkaPlugin
-
Spark:
CloudflowSparkPlugin
-
Flink:
CloudflowFlinkPlugin
Let’s explore the sbt declaration with a before/after example:
build.sbt
[Before v2.0]lazy val myApplication = (project in file("."))
.enablePlugins(CloudflowSparkApplicationPlugin)
build.sbt
[with v2.0]lazy val myApplication = (project in file("."))
.enablePlugins(CloudflowApplicationPlugin, CloudflowSparkPlugin)
As you would expect, a single-runtime application generates a single Docker image.
For more details on the project definition, visit the Project Structure docs.
Blueprint Definition
Cloudflow 2.0 introduces topics as first-class components in the blueprint definition. In earlier versions, Cloudflow fully managed the topics used to connect different streamlets. While this was very convenient, it also limited the connectivity possibilities.
In the 2.0 version, the Cloudflow developer is in full control of the topics used. They can still be managed by Cloudflow, but they can also be existing topics in arbitrary Kafka brokers, including cloud-hosted.
In practical terms, the connections
section of the blueprint
becomes now topics
.
The topics used by the application are defines as configuration entries in the topics
section.
Let’s see a before/after example.
build.sbt
[Before v2.0]blueprint {
streamlets {
ingress = sensors.SparkRandomGenDataIngress
process = sensors.MovingAverageSparklet
egress = sensors.SparkConsoleEgress
}
connections {
ingress.out = [process.in]
process.out = [egress.in]
}
}
build.sbt
[with v2.0]blueprint {
streamlets {
ingress = sensors.SparkRandomGenDataIngress
process = sensors.MovingAverageSparklet
egress = sensors.SparkConsoleEgress
}
topics {
data {
producers = [ingress.out]
consumers = [process.in]
}
moving-averages {
producers = [process.out]
consumers = [egress.in]
}
}
}
The migration strategy consists of:
-
Identify the topics in the application.
-
Name the topics as configuration entries.
-
streamlet outlets become producers for the topic.
-
streamlet inlets become consumers to the topic.
Taking the previous example, let’s convert the following connection definition into a topic definition:
connections {
ingress.out = [process.in]
...
}
This connection needs a topic.
We call it data
as this topic makes the data from the ingress
available to the process
.
The ingress.out
outlet becomes a producer for the topic.
the process.in
inlet becomes a consumer to the topic.
When we put this together, we have:
topics {
data {
producers = [ingress.out]
consumers = [process.in]
}
...
}
You should migrate all streamlet connections to this new topic definition applying the same logic.
Restrictions
Before Cloudflow 2.0 it was not possible to have a streamlet inlet connected to multiple outlets.
This was not allowed:
connections {
ingress.out = [process.in]
another-ingress.out = [process.in]
...
}
The same restriction applies to the new topic definition. We cannot have the same streamlet inlet as a consumer of two different topics.
This is not allowed:
topics {
data1 {
producers = [ingress.out]
consumers = [process.in] (1)
}
data2 {
producers = [another-ingress.out]
consumers = [process.in] (1)
}
...
}
1 | Having the same streamlet inlet consume from two different topics is not allowed. |
The verifyBlueprint
stage built into the Cloudflow build system will report an error if such usage is found.
Learning More
To learn more about the new blueprint structure and the new options to configure the topics, refer to Composing applications using blueprints.
Build Process
As we mentioned in Project Structure, with Cloudflow 2.0 we implemented a new way of managing projects that use multiple runtimes and potentially results in the generation of several Docker images.
In previous versions of Cloudflow, the deployment unit was a Docker image. It contained all the artifacts and an application descriptor that informed the Cloudflow operator how the application must be deployed.
With Cloudflow 2.0, and its multiple Docker image support, the new deployable unit is a JSON descriptor generated by the build process.
This application descriptor is used by the Cloudflow kubectl
plugin to generate a custom resource definition (CR) that guides the application deployment on a Cloudflow-enabled Kubernetes cluster.
In practical terms, we have deprecated the use of:
sbt buildAndPublish
Instead, you should use:
sbt buildApp
The output of the buildApp
task will report the location of the JSON file that contains the application definition.
It also shows the command that you need to execute to deploy the application.
buildApp
execution$ sbt buildApp
...
[success] Cloudflow application CR generated in /cloudflow/examples/call-record-aggregator/target/call-record-aggregator.json
[success] Use the following command to deploy the Cloudflow application:
[success] kubectl cloudflow deploy /cloudflow/examples/call-record-aggregator/target/call-record-aggregator.json
[success] Total time: 129 s (02:09), completed Jun 24, 2020 12:58:46 PM
This new model might have an impact in the CI/CD strategy for your project. The new JSON-file-based deployment is better aligned to the gitops methodology.
Sandbox
To support the new multi-project model of Cloudflow 2.0, we have updated the sandbox to support the local execution of these multi-project setups into separate JVMs.
In this way, we provide the same classpath isolation guarantee when you run your applications using runLocal
.
The new runLocal
implementation reports the output of streamlets grouped by the sub-projects where they belong.
As an additional improvement, the execution output also shows an ASCII-art representation of the application graph that connects streamlets and topics.
See Local Testing to learn more about the sandbox provided by Cloudflow.
Getting Help
The Cloudflow community is there to help you out. Find links to our different communication channels on the Cloudflow Homepage