Paper Review Series : Apache Flink
Apache Flink is an open source system for stream and batch processing. Traditionally, stream data and batch data processing are deemed very different applications and are approached by different models, APIs and systems. Apache Flink, however, takes that batch processing is a special case of stream processing and stream processing model can be the unifying framework for both problems. This paper illustrated a unified architecture of stream and batch data processing Apache Flink is built upon. It showed how streaming, batch, iterative, and interactive analytics can be represented as fault-tolerant streaming dataflows. It then discussed how to build a stream analytics system with a flexible windowing mechanism and a batch processor on top of these dataflows. 1, The iterative computations on input data stream along the DAG is based on buffer exchange. The computations are in-memory and thus fast. The use of buffer stream makes back pressure propagated to producer. It’s, to my understanding, very similar to producer consumer paradigm where back producer cannot produce if consumer has not finish consuming. 2, Flink insert into stream data “barriers” regularly. These barriers move with input data stream in the DAG but are not processed. They mark the checkpoints for operators snapshotting their states. The snapshotting can be asynchronous and incremental thus does not stop processing. This snapshotting mechanism is independent of processing logic and thus is decoupled from control messages. It’s also unrelated to external storage usage. 3, Unlike Spark, Flink thinks of stream as the unifying paradigm and batch processing is a special case of stream processing over a bounded data set. Batch processing can be fulfilled by Flink streaming model by inserting all input data into a window. On top of stream model, batch processing is also optimised, such as, simpler syntax, blocking operators, optimised queries, dedicated API. Since data is static, snapshotting for fault-tolerance can be turned off as well.
