Develop the Wind Turbine example
In this section we will develop a simple Hello World-style application that demonstrates the major features of Cloudflow. It does not use many of the advanced features that Cloudflow offers; the main idea is to give you a feel for what it takes to build a complete application and deploy it to a GKE cluster.
Application overview
The Hello World application is a simple pipeline that processes events from a wind turbine farm. Each turbine has a unique ID and emits a time-stamped stream of events that contain measurements such as wind speed, rotor speed, and power produced.
To make sure that the data flow is meaningful and correct, the streamlets in our application accept events, convert them into domain objects, validate them, and log them. The following illustrates the flow:
As shown in the image, streaming data passes through the following transformations (a code sketch of two of these streamlets follows the list):
- Ingestion by an Ingress streamlet: An Ingress has zero inlets and one or more outlets. An ingress could be a server handling requests, e.g., over HTTP. In this example, it ingests the wind turbine events.
- Conversion by a Processor streamlet into a domain object: A Processor has one inlet and one outlet. Processors represent common data transformations such as map and filter, or any combination of them. In this example, the Processor converts the event data into metrics that can be validated further.
- Validation by a FanOut streamlet: A FanOut (also known as a Splitter) has one inlet and more than one outlet. It splits the input into multiple data streams based on some criterion. The example Splitter separates valid and invalid records.
- Logging by an Egress streamlet: An Egress represents data leaving the application. This could be data persisted to a database, notifications sent to Slack, files written to HDFS, and so on. In this example, the streamlets log the incoming data so the output can be checked.
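To give a feel for what these building blocks look like in code, here is a rough sketch of the Processor and the FanOut streamlets from this pipeline, written against the Scala API for Akka streamlets. Treat it as a sketch rather than the finished example: the domain classes (`SensorData`, `Metric`, `InvalidMetric`), the validation rule, and helper names such as `FlowWithOffsetContext` and `SplitterLogic` reflect one version of the Cloudflow API; the real streamlets are developed step by step later in this guide.

```scala
import cloudflow.akkastream._
import cloudflow.akkastream.scaladsl._
import cloudflow.akkastream.util.scaladsl._
import cloudflow.streamlets._
import cloudflow.streamlets.avro._

// SensorData, Metric, and InvalidMetric are assumed to be the case
// classes that Cloudflow generates from the Avro schemas (see the
// next section); they are not defined here.

// A Processor: one inlet, one outlet. Fans each raw event out into
// one Metric per measurement.
class SensorDataToMetrics extends AkkaStreamlet {
  val in    = AvroInlet[SensorData]("in")
  val out   = AvroOutlet[Metric]("out").withPartitioner(RoundRobinPartitioner)
  val shape = StreamletShape(in, out)

  override def createLogic = new RunnableGraphStreamletLogic() {
    def flow =
      FlowWithOffsetContext[SensorData].mapConcat { data =>
        List(
          Metric(data.deviceId, data.timestamp, "power", data.measurements.power),
          Metric(data.deviceId, data.timestamp, "rotorSpeed", data.measurements.rotorSpeed),
          Metric(data.deviceId, data.timestamp, "windSpeed", data.measurements.windSpeed)
        )
      }
    def runnableGraph =
      sourceWithOffsetContext(in).via(flow).to(sinkWithOffsetContext(out))
  }
}

// A FanOut (Splitter): one inlet, two outlets. Metrics that fail
// validation are wrapped in an InvalidMetric and routed to `invalid`;
// the rest flow on through `valid`.
class MetricsValidation extends AkkaStreamlet {
  val in      = AvroInlet[Metric]("in")
  val invalid = AvroOutlet[InvalidMetric]("invalid").withPartitioner(m => m.metric.deviceId.toString)
  val valid   = AvroOutlet[Metric]("valid").withPartitioner(RoundRobinPartitioner)
  val shape   = StreamletShape(in).withOutlets(invalid, valid)

  override def createLogic = new SplitterLogic(in, invalid, valid) {
    def flow =
      flowWithOffsetContext().map { metric =>
        if (metric.value < 0) Left(InvalidMetric(metric, "Measurements must be non-negative"))
        else Right(metric)
      }
  }
}
```

Note how each streamlet declares its shape (inlets and outlets) separately from its logic; this is what lets the blueprint, described below, wire streamlets together without the streamlets knowing about each other.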
Development overview
In Cloudflow, streamlet inlets and outlets are statically typed. The types represent the objects that the specific inlet or outlet can handle. The first step in application development is to encode these objects as Avro schemas. Cloudflow then generates the appropriate class for each schema.
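For example, the wind turbine event described in the overview could be captured by a schema along these lines (the `sensordata` namespace and the exact field names are illustrative; the real schemas are written out in a later step):

```json
{
  "namespace": "sensordata",
  "type": "record",
  "name": "SensorData",
  "fields": [
    { "name": "deviceId", "type": { "type": "string", "logicalType": "uuid" } },
    { "name": "timestamp", "type": { "type": "long", "logicalType": "timestamp-millis" } },
    { "name": "measurements", "type": "sensordata.Measurements" }
  ]
}
```

From a schema like this, the build generates a Scala case class (here, `sensordata.SensorData`) that streamlet inlets and outlets use as their element type.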
One of the important features of the Cloudflow architecture is the complete separation of the components from how they are connected in the pipeline. The streamlets described above are the individual building blocks of the pipeline; you connect them using a declarative language that forms the blueprint of the pipeline. Streamlets can be shared across blueprints, making them individually reusable components. And just like streamlets, blueprints form an independent component of a Cloudflow application.
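To make this concrete, a blueprint for the pipeline above could look roughly like the following, conventionally kept in `src/main/blueprint/blueprint.conf`. The streamlet class names are illustrative, and the exact wiring syntax depends on the Cloudflow version; this sketch uses the connections style:

```hocon
blueprint {
  streamlets {
    sensor-data    = sensordata.SensorDataHttpIngress
    metrics        = sensordata.SensorDataToMetrics
    validation     = sensordata.MetricsValidation
    valid-logger   = sensordata.ValidMetricLogger
    invalid-logger = sensordata.InvalidMetricLogger
  }
  connections {
    sensor-data.out    = [metrics.in]
    metrics.out        = [validation.in]
    validation.valid   = [valid-logger.in]
    validation.invalid = [invalid-logger.in]
  }
}
```

Because the wiring lives in this one file, any streamlet can be swapped out or reused in another blueprint without touching the others.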
This example uses Akka streamlets implemented in Scala. We’ll follow these steps: