Today everyone is talking about Big Data, Fast Data, Smart Data and real-time decisioning. But what is it all about? Why, and especially how, should you do it?
My definition of Fast Data is basically: Big Data, but arriving fast. Smart Data means: Big Data is worth nothing if you do not extract the relevant information from it (immediately).
This example shows one way to implement a real-time lambda use case (more about the Lambda Architecture in the link at the end).
The basic idea is: query = function(all data), where all data = precomputed + increment.
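This idea can be sketched in a few lines. The stores and the member id below are hypothetical in-memory stand-ins, not part of the actual stack; they only illustrate how a query combines the batch view with the speed-layer view.

```python
# Batch view: monthly balance precomputed up to the cut-off time X.
precomputed = {"member-1": 150}
# Speed view: points collected since X.
increment = {"member-1": 60}

def monthly_balance(member_id: str) -> int:
    """query = function(all data) = precomputed + increment"""
    return precomputed.get(member_id, 0) + increment.get(member_id, 0)

print(monthly_balance("member-1"))  # 210
```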
The chosen technologies are:
- Apache Kafka (0.11.0.0)
- Apache Spark (2.2.0) and Spark Streaming (2.2.0) with the Kafka integration (0-10)
- Apache Cassandra (3.11)
Besides those, Spring Boot (1.5.7) is used for the service implementation.
Why these technologies? Let's say they are currently on the rise. Of course there are a lot of alternatives; just google them.
Let's assume we are a loyalty programme provider and we want to provide real-time marketing. Let's clarify what real-time means in this case: try to be as fast as possible to achieve the highest possible relevancy of the decision for the member.
Campaign rule: a customer collects loyalty points, and if their current monthly balance is higher than 199, they will receive a treatment.
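As a plain predicate, the campaign rule is tiny. This is only a sketch; the function name and threshold constant are assumptions for illustration, not part of a real rule engine.

```python
BALANCE_THRESHOLD = 199  # campaign rule: treat members above this balance

def qualifies_for_treatment(monthly_balance: int) -> bool:
    """Return True if the member's monthly balance triggers a treatment."""
    return monthly_balance > BALANCE_THRESHOLD

print(qualifies_for_treatment(200))  # True
print(qualifies_for_treatment(150))  # False
```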
The Lambda Architecture consists of three parts: the Batch, Speed, and Serving Layer.
The Batch Layer is triggered regularly, e.g. every 24h.
The following timeline should give you an idea of what is meant by lambda:
- Point 0: the initial start of your application
- Point 1: your first batch run has finished and created precomputed values. There is now a cut-off time X.
- Point 2: everything introduced into your system after X becomes an increment
- Point 3: there has been no further batch run yet, so increments keep accumulating
- Point 4: another batch run happens, so the old increment is now part of the precomputed view and you have a new cut-off time X (any increment stored before X is no longer considered)
- Point 5: basically the same as Point 3, just further along in time
At any given time your total information is: precomputed + increment.
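The timeline above can be sketched with the cut-off time X splitting the event stream. The event structure and timestamps below are illustrative assumptions, not taken from the actual jobs.

```python
from datetime import datetime

# A small stream of purchase events for one member.
events = [
    {"member": "m1", "points": 50,  "ts": datetime(2017, 10, 1)},
    {"member": "m1", "points": 100, "ts": datetime(2017, 10, 5)},
    {"member": "m1", "points": 60,  "ts": datetime(2017, 10, 9)},
]

cutoff_x = datetime(2017, 10, 7)  # time of the last batch run

# Batch layer: precompute everything before X.
precomputed = sum(e["points"] for e in events if e["ts"] < cutoff_x)
# Speed layer: only events after X count as the increment.
increment = sum(e["points"] for e in events if e["ts"] >= cutoff_x)

# Serving layer: total information = precomputed + increment.
print(precomputed, increment, precomputed + increment)  # 150 60 210
```

After the next batch run, X moves forward: the 60-point event would join the precomputed view and the increment would reset to zero.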
The flow in this example is:
1. REST resource `Purchase`
2. sends to Kafka topic `purchases`
3. processes the purchase event (save and update member account)
4. Apache Spark job to aggregate the purchases into a monthly balance (Batch Layer - Precomputed View)
5. Apache Spark Streaming job for Kafka topic `purchases`
6. does the delta calculation for the monthly balance (Speed Layer - Incremental View)
7. evaluates the rule conditions (Serving Layer)
(and of course there is some more around this to have a fully working use case)
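Steps 5-7 can be sketched without a real Kafka/Spark cluster: a stream of purchase events updates the incremental view, and the serving layer combines it with the batch view to evaluate the rule. All names here (`process_purchase`, the two view dicts) are assumptions made for this sketch.

```python
from collections import defaultdict

precomputed_view = {"m1": 150}       # batch layer output (up to cut-off X)
incremental_view = defaultdict(int)  # speed layer state

def process_purchase(member: str, points: int) -> bool:
    """Consume one purchase event, update the delta, evaluate the rule."""
    incremental_view[member] += points  # step 6: delta calculation
    balance = precomputed_view.get(member, 0) + incremental_view[member]
    return balance > 199                # step 7: rule evaluation

print(process_purchase("m1", 30))  # False (150 + 30 = 180)
print(process_purchase("m1", 40))  # True  (150 + 70 = 220)
```

In the real setup the incremental state would live in Cassandra and be updated by the Spark Streaming job, while the batch job periodically rewrites the precomputed view.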
http://jameskinley.tumblr.com/post/37398560534/the-lambda-architecture-principles-for