Introducing Cloudflow

Cloudflow enables you to quickly develop distributed streaming applications to deploy on Kubernetes. The rapid advance and adoption of the Kubernetes ecosystem is delivering on the devops promise of giving control and responsibility to dev teams over the lifecycle of their applications. However, use of Kubernetes also increases the complexity of delivering end-to-end applications. Cloudflow alleviates that pain for distributed streaming applications, which are often far more complex than typical front-end or microservice deployments.

Cloudflow integrates with popular streaming engines like Akka, Apache Spark, and Apache Flink. Using Cloudflow, you can easily break down your streaming application into small composable components and wire them together with schema-based contracts. With its powerful abstractions, Cloudflow enables you to define, build and deploy the most complex streaming applications.

Streaming application requirements

Stream processing is a discipline and a set of techniques for extracting information from unbounded data. Streaming applications apply stream processing to provide actionable insights from data as it arrives into the system. The growing popularity of streaming applications is driven by:

  • the increasing availability of data from many sources.

  • the need of enterprises to speed up their reaction time to that data.

We characterize streaming applications as a connected graph of stream-processing components, where each component specializes on a particular task, using the 'right tool for the job' premise. The figure below, An Abstract Streaming Application, generically illustrates an application that processes data events:

  • The first circle on the left represents an initial stage for capturing or accepting data. This could be an HTTP endpoint to accept data from remote clients, a connection to a Kafka topic, or input from an internal system in an enterprise.

  • The next circle to the right represents a processing phase that applies some logic to the data, such as: business rules, statistical data analysis, or a machine learning model that implements the business aspect of the application. This processing component may add additional information to the event and send it as valid data to an external system or flag the data as invalid and report it.

  • The final two circles on the right shows the two different data output paths, valid and invalid.

    abstract streaming app
    Fig. 1 - An Abstract Streaming Application

Each component in the illustration presents different application requirements, scalability concerns, and often—​different Kubernetes deployment strategies. Such specialized needs make even a simple streaming application non-trivial to develop and deploy. For large enterprises with complex use cases, creating streaming data applications that can extract actionable business value quickly is challenging.

The Cloudflow approach

Cloudflow offers a complete solution for the creation, deployment, and management of streaming applications on Kubernetes. For development, it offers a toolbox that simplifies and accelerates application creation. For deployment, it comes with a set of extensions that makes Cloudflow streaming applications native to Kubernetes.

A common challenge when building streaming applications is wiring all of the components together and testing them end-to-end before  going into production. Often different teams are responsible for different parts of the application, making coordination difficult. Repeatedly spinning up and deleting Kubernetes pods can be both difficult and costly. Cloudflow addresses this by allowing you to validate the connections between components and to run your application locally during development to avoid surprises during deployment.

Everything in Cloudflow is done in the context of an application, which represents a self-contained distributed system of data processing services connected together by data streams. A Cloudflow application includes:

  • Streamlets, which contain the stream processing logic

  • Blueprints, which define how streamlets are composed and configured.

At deployment time, Cloudflow uses the blueprint to deploy all its streamlets so that data can start streaming as described. Cloudflow tools make it easy to create and manage multiple applications, each one forming a separate self-contained system, as illustrated below.

apps 1
Fig. 1 - Multiple Cloudflow Applications

Feature set

Cloudflow includes tools for developing and deploying the most complex streaming applications:

  • The Cloudflow application development toolkit includes:

    • An API definition for Streamlet, the core abstraction in Cloudflow.

    • An extensible set of runtime implementations for Streamlet(s). Cloudflow provides support for popular streaming runtimes, like Spark’s Structured Streaming, Apache Flink, and Akka Streams.

    • A Streamlet composition model driven by a blueprint definition.

    • A Sandbox execution mode that accelerates the development and testing of your applications.

    • A set of sbt plugins for packaging Cloudflow applications into a deployable container.

    • A CLI, in the form of a kubectl plugin, that facilitates manual and scripted management of the application.

  • For deploying to Kubernetes clusters, the Cloudflow operator takes care of orchestrating the deployment of the different parts of a streaming pipeline as an end-to-end application.

Cloudflow dramatically accelerates your application development efforts, reducing the time required to create, package, and deploy an application from weeks—​to hours. Lightbend also offers support with a Lightbend Subscription.