Thursday, April 14, 2016

Kappa Architecture implementation using Apache Flink, Kafka and Cassandra



Refer to the code here - https://github.com/tuhingupta/kappa-streaming
What is Kappa Architecture?
Kappa Architecture is a software architecture pattern. Rather than using a relational database or a key-value store like Cassandra as its canonical data store, a Kappa Architecture system uses an append-only immutable log. From the log, data is streamed through a computational system and fed into auxiliary stores for serving.


[Kappa Architecture diagram. Source: O'Reilly]
It follows the Command Query Responsibility Segregation (CQRS) pattern: writes are appended to the log, while reads are served from views derived from it.

Client Writes

The client sends a stream of data, which could be a single record or hundreds of thousands of records, as a byte stream of JSON, CSV, or any other data format.
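
As a minimal sketch of such a write, the snippet below posts a small batch of newline-delimited JSON records as one byte stream. The endpoint URL and the record shape are placeholders for illustration, not anything defined in the kappa-streaming repo.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Minimal sketch of a client write: a batch of newline-delimited JSON records
// sent as one byte stream. The endpoint URL is a placeholder.
public class ClientWrite {
    public static void main(String[] args) throws Exception {
        String batch = "{\"id\":1,\"event\":\"login\"}\n"
                     + "{\"id\":2,\"event\":\"purchase\"}\n";
        byte[] payload = batch.getBytes(StandardCharsets.UTF_8);

        HttpURLConnection conn =
                (HttpURLConnection) new URL("http://localhost:8080/ingest").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(payload);            // the server only sees bytes
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}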

Server

The server is a non-blocking, reactive server that accepts bytes of data, parses/processes the bytes into records, and puts them into an immutable append-only log system like Kafka.
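
Leaving the networking layer aside, the parse step could look like the sketch below: raw bytes come in, individual records come out and are handed to a publisher (the Kafka producer shown in the next section). The newline-delimited format and class names are assumptions for illustration.

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.Consumer;

// Sketch of the server's parse step: the reactive layer hands us raw bytes,
// we split them into individual records and pass each one to a publisher.
// For brevity, this assumes a record never spans two chunks.
public class ByteStreamParser {
    private final Consumer<String> publisher;

    public ByteStreamParser(Consumer<String> publisher) {
        this.publisher = publisher;
    }

    public void onBytes(byte[] chunk) {
        String text = new String(chunk, StandardCharsets.UTF_8);
        Arrays.stream(text.split("\n"))
              .filter(r -> !r.trim().isEmpty())
              .forEach(publisher);          // one record per Kafka message
    }

    public static void main(String[] args) {
        ByteStreamParser parser = new ByteStreamParser(System.out::println);
        parser.onBytes("{\"id\":1}\n{\"id\":2}\n".getBytes(StandardCharsets.UTF_8));
    }
}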

Kafka

Kafka is an immutable, append-only, log-based messaging system to which the server writes the records it received as a byte stream. It acts as the log from which different processes can read data and generate client-specific views.
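
A bare-bones version of that write path, using the standard Kafka producer API, might look as follows; the broker address and topic name are placeholders.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch of the server's write to the log: each parsed record becomes one
// message on a Kafka topic. Broker address and topic name are placeholders.
public class LogWriter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("incoming-records", "{\"id\":1,\"event\":\"login\"}"));
            producer.send(new ProducerRecord<>("incoming-records", "{\"id\":2,\"event\":\"purchase\"}"));
        } // close() flushes pending sends
    }
}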

Processors

Stream-processing frameworks such as Flink, Storm, or Samza can be used to process these logs and generate client-specific views. These processors continuously consume the data available on Kafka topics and emit updated results.
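
The sketch below shows what such a processor could look like with Flink: it consumes the Kafka topic and upserts each record into a Cassandra table that serves as a materialized view. The topic, keyspace, table, and key derivation are illustrative assumptions, not the actual job from the kappa-streaming repo.

import java.util.Properties;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Sketch of a processor: a Flink job that reads the Kafka log and keeps a
// Cassandra table up to date as a materialized view.
public class ViewBuilderJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "view-builder");

        DataStream<String> records = env.addSource(
                new FlinkKafkaConsumer09<String>("incoming-records", new SimpleStringSchema(), props));

        records.addSink(new CassandraViewSink());
        env.execute("kappa-view-builder");
    }

    // Writes each record into Cassandra; a newer event for the same key overwrites the view row.
    static class CassandraViewSink extends RichSinkFunction<String> {
        private transient Cluster cluster;
        private transient Session session;

        @Override
        public void open(Configuration parameters) {
            cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            session = cluster.connect("views");
        }

        @Override
        public void invoke(String record) {
            // Assumes a table: CREATE TABLE views.events (id text PRIMARY KEY, payload text);
            // a real job would parse a proper record id instead of hashing the payload.
            session.execute("INSERT INTO events (id, payload) VALUES (?, ?)",
                            String.valueOf(record.hashCode()), record);
        }

        @Override
        public void close() {
            if (session != null) session.close();
            if (cluster != null) cluster.close();
        }
    }
}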

Client Reads

The client now reads from the materialized views that were created by processing the immutable append-only log.
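
A read therefore never touches the log directly; it queries the view store. The sketch below reads back the placeholder Cassandra table used in the processor sketch above.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

// Sketch of a client read: queries go only to the materialized view in
// Cassandra, never to the Kafka log itself.
public class ClientRead {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("views")) {
            for (Row row : session.execute("SELECT id, payload FROM events LIMIT 10")) {
                System.out.println(row.getString("id") + " -> " + row.getString("payload"));
            }
        }
    }
}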

Sample Use cases:

Log processing