Cloudflow Core Concepts

Cloudflow lets you quickly build and deploy distributed stream processing applications by breaking them into smaller stream processing units called Streamlets. Each Streamlet is an independent component that implements a self-contained stage of the application logic, and Streamlets communicate with each other in a streaming fashion to accomplish an end-to-end goal. Streamlets are composed into larger systems using blueprints, which specify how Streamlets are connected to form a topology.

In this document, we give you an overview of the main building blocks of a Cloudflow application, starting with the Streamlet concept.

Streamlets

Streamlets are the core building blocks of a Cloudflow application. Each Streamlet represents an independent stream processing component that implements a self-contained stage of the application logic.

The lightweight Streamlet API exposes the raw power of the underlying runtime and its libraries while providing a higher-level abstraction for composing streamlets and expressing data schemas. You write your code in the native API you already know: Spark Structured Streaming, Flink, or Akka Streams.

Streamlets declare inlets and outlets to define the data they consume or produce. Inlets and outlets are schema-driven, ensuring that data flows are always consistent and that connections between Streamlets are compatible. The data sent between Streamlets is safely persisted in the underlying pub-sub system, allowing for independent lifecycle management of the different components.
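To make the schema-driven idea concrete, here is a minimal conceptual sketch in Python. It is not the Cloudflow API: the `Port` and `Streamlet` classes, the `compatible` function, and the streamlet and schema names are all hypothetical, and a schema is reduced to a plain string identifier. The sketch only illustrates that an outlet can feed an inlet when both declare the same schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Port:
    """A named inlet or outlet that carries records of one schema."""
    name: str
    schema: str  # hypothetical schema identifier, e.g. a record type name


@dataclass(frozen=True)
class Streamlet:
    """Conceptual stand-in for a streamlet: a name plus its declared ports."""
    name: str
    inlets: Tuple[Port, ...] = ()
    outlets: Tuple[Port, ...] = ()


def compatible(outlet: Port, inlet: Port) -> bool:
    """An outlet may feed an inlet only if both declare the same schema."""
    return outlet.schema == inlet.schema


# Hypothetical streamlets, for illustration only.
ingress = Streamlet("sensor-ingress", outlets=(Port("out", "SensorData"),))
metrics = Streamlet(
    "metrics",
    inlets=(Port("in", "SensorData"),),
    outlets=(Port("out", "Metric"),),
)
logger = Streamlet("logger", inlets=(Port("in", "Metric"),))

# SensorData -> SensorData is a compatible connection.
ok = compatible(ingress.outlets[0], metrics.inlets[0])
# SensorData -> Metric is rejected.
bad = compatible(ingress.outlets[0], logger.inlets[0])
```

In this model a mismatch is detected before any data flows, which is the property the schema-driven inlets and outlets are meant to provide.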

Streamlet Shapes

The combination of inlets and outlets gives the Streamlet its shape. Some commonly used streamlet shapes are the following:

Ingress

An Ingress is a streamlet with zero inlets and one or more outlets. An ingress could be a server handling requests, for example over HTTP.

Fig. 1 - Ingress

Processor

A Processor has one inlet and one outlet. Processors represent common data transformations like map and filter, or any combination of them.
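As a rough illustration of a map and filter combination, the following Python sketch models a Processor as a generator with one input stream and one output stream. It is a conceptual model rather than Cloudflow code; the function name and the temperature example are assumptions chosen for illustration.

```python
from typing import Iterable, Iterator


def celsius_to_fahrenheit(readings: Iterable[float]) -> Iterator[float]:
    """One inlet (readings), one outlet (the yielded values)."""
    for celsius in readings:
        # filter: drop physically impossible readings
        if celsius > -273.15:
            # map: convert each remaining reading
            yield celsius * 9 / 5 + 32


converted = list(celsius_to_fahrenheit([0.0, 100.0, -300.0]))
```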

Fig. 2 - Processor

FanOut

A FanOut has one inlet and more than one outlet. The FanOut splits the input into multiple data streams according to some criteria. A typical example is validation, where the FanOut splits the input into valid and invalid data streams.
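The validation case can be sketched in a few lines of Python. This is a conceptual model, not the Cloudflow API: `fan_out`, `has_positive_reading`, and the record layout are hypothetical, and the two returned lists stand in for the two outlets.

```python
from typing import Callable, Iterable, List, Tuple


def fan_out(
    records: Iterable[dict],
    is_valid: Callable[[dict], bool],
) -> Tuple[List[dict], List[dict]]:
    """Split one input stream into a valid and an invalid stream."""
    valid: List[dict] = []
    invalid: List[dict] = []
    for record in records:
        (valid if is_valid(record) else invalid).append(record)
    return valid, invalid


def has_positive_reading(record: dict) -> bool:
    """Hypothetical validation rule: a numeric, positive 'value' field."""
    value = record.get("value")
    return isinstance(value, (int, float)) and value > 0


readings = [
    {"id": "a", "value": 3.5},
    {"id": "b", "value": -1},
    {"id": "c"},  # missing value
]
valid, invalid = fan_out(readings, has_positive_reading)
```

Routing invalid records to their own stream, instead of dropping them, lets a downstream streamlet log or repair them independently.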

Fig. 3 - FanOut

Egress

An Egress represents data leaving the Cloudflow application. For instance, this could be data persisted to a database, notifications sent to Slack, or files written to HDFS.

Fig. 4 - Egress
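Because a shape is determined by the number of inlets and outlets, the shapes above can be told apart with a small function. The Python sketch below is illustrative only and is not part of Cloudflow. Note one assumption: the text does not give port counts for an Egress, so the sketch treats it as one or more inlets and zero outlets.

```python
def shape(inlets: int, outlets: int) -> str:
    """Name the streamlet shape implied by its inlet and outlet counts."""
    if inlets == 0 and outlets >= 1:
        return "Ingress"
    if inlets == 1 and outlets == 1:
        return "Processor"
    if inlets == 1 and outlets > 1:
        return "FanOut"
    if inlets >= 1 and outlets == 0:
        # Assumption: an Egress only consumes data and has no outlets.
        return "Egress"
    return "Other"  # shapes not covered by the examples above


shapes = [shape(0, 1), shape(1, 1), shape(1, 2), shape(1, 0)]
```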

Blueprint

A Blueprint connects streamlets together, turning a set of individual streamlets into an application. A blueprint is written in a file using a declarative language and is part of the project.

Fig. 5 - Blueprint
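The following Python sketch models what a blueprint expresses: a set of streamlets and the connections between their outlets and inlets. It does not use the actual blueprint file syntax or any Cloudflow tooling. The dictionary layout, the `verify` function, and all streamlet, port, and schema names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical streamlets: name -> declared ports, each port mapped to a schema.
STREAMLETS: Dict[str, Dict[str, Dict[str, str]]] = {
    "ingress": {"inlets": {}, "outlets": {"out": "SensorData"}},
    "validation": {
        "inlets": {"in": "SensorData"},
        "outlets": {"valid": "SensorData", "invalid": "InvalidRecord"},
    },
    "egress": {"inlets": {"in": "SensorData"}, "outlets": {}},
}

# Connections as (outlet, inlet) pairs written "streamlet.port".
CONNECTIONS: List[Tuple[str, str]] = [
    ("ingress.out", "validation.in"),
    ("validation.valid", "egress.in"),
]


def verify(streamlets, connections) -> List[str]:
    """Return a list of problems; an empty list means the topology is consistent."""
    problems: List[str] = []
    for source, target in connections:
        src_name, src_port = source.split(".")
        dst_name, dst_port = target.split(".")
        out_schema = streamlets.get(src_name, {}).get("outlets", {}).get(src_port)
        in_schema = streamlets.get(dst_name, {}).get("inlets", {}).get(dst_port)
        if out_schema is None:
            problems.append(f"unknown outlet {source}")
        if in_schema is None:
            problems.append(f"unknown inlet {target}")
        if out_schema and in_schema and out_schema != in_schema:
            problems.append(
                f"schema mismatch: {source} ({out_schema}) -> {target} ({in_schema})"
            )
    return problems


good = verify(STREAMLETS, CONNECTIONS)
broken = verify(STREAMLETS, [("validation.invalid", "egress.in")])
```

Keeping the connections as data, separate from the streamlet code, is what allows the same streamlets to be rewired into a different topology without changing them.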

Application

A Cloudflow application is a blueprint that defines a collection of Streamlets that can be deployed as a unit to a Cloudflow-enabled cluster.

A deployed application is the runtime realization of the blueprint. The application is formed from the Streamlets included and the connections specified in the blueprint, which are materialized as data flows between the Streamlets.

Fig. 6 - Application
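As a loose analogy for connections being materialized as data flows, the Python sketch below pushes records through a linear chain of stages, with an in-memory queue standing in for the data flow between each pair of stages. It is a toy model under stated assumptions, not how Cloudflow deploys or runs an application: the `run`, `parse`, and `double` functions are hypothetical, and a real deployment runs streamlets as separate processes on a cluster.

```python
from collections import deque
from typing import Callable, Iterable, List


def run(source_records: Iterable, stages: List[Callable]) -> list:
    """Push records through stages connected by in-memory queues."""
    topic = deque(source_records)
    for stage in stages:
        next_topic = deque()
        while topic:
            # Each stage returns zero or more output records per input record.
            next_topic.extend(stage(topic.popleft()))
        topic = next_topic
    return list(topic)


def parse(line: str) -> List[int]:
    """Emit the parsed integer, or nothing if the line is not a number."""
    try:
        return [int(line)]
    except ValueError:
        return []


def double(number: int) -> List[int]:
    return [number * 2]


result = run(["1", "x", "3"], [parse, double])
```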