Stream Grids: 2. What is a Stream Workflow
A Stream Workflow is a structured flow of data that collects, processes, and analyzes high-volume data to generate real-time insights. These workflows use the Apache Spark Streaming API to process streaming data in micro-batches, enabling scalable, high-throughput, fault-tolerant processing of live data streams. Support for Apache Storm will be added in future releases.
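To illustrate what micro-batch processing looks like at the Spark Streaming level, here is a minimal, self-contained sketch. It is a hypothetical standalone example, not the Stream Grids operator API; the application name, host, port, and 5-second batch interval are all assumptions chosen for illustration.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MicroBatchExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MicroBatchExample").setMaster("local[2]")
    // Each micro-batch covers 5 seconds of incoming data.
    val ssc = new StreamingContext(conf, Seconds(5))

    // Ingest a live text stream (here a local socket; host/port are assumed).
    val lines = ssc.socketTextStream("localhost", 9999)

    // Per-batch transformation: split lines into words and count them.
    val counts = lines.flatMap(_.split("\\s+"))
                      .map(word => (word, 1L))
                      .reduceByKey(_ + _)

    counts.print()          // Emit the result of each micro-batch to stdout.
    ssc.start()             // Start receiving and processing data.
    ssc.awaitTermination()  // Block until the streaming job is stopped.
  }
}
```

Each 5-second batch is processed as a small Spark job, which is what gives Spark Streaming its scalability and fault tolerance while keeping latency low.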
A Stream Workflow is made up of Sources, Transformations, Persistent Stores, and Emitters; an end-to-end sketch combining these stages follows the list below.
• Sources: Data access in Stream Grids is realized by Sources, built-in drag-and-drop operators that consume data from various data sources such as message queues, transactional databases, log files, and sensors for IoT data. Examples: Kafka, RabbitMQ, Twitter, etc.
• Transformations: Transformations are built-in operators that process the streaming data by applying various transformation operations. Support for analytical operations will be added in future releases. Examples: Window, Sort, Join, Group, Enrich/Lookup, Deduplicate, Aggregation, etc.
• Persistent Stores: Persistent Stores define the destination stage of a workflow, which could be a NoSQL store, a relational database, or a distributed file system. Examples: HDFS, HBase, Cassandra, Elasticsearch, Solr, etc.
• Emitters: Like Persistent Stores, Emitters act as the destination stage of a workflow, except that they support further downstream operations on the streaming data. Examples: message queues such as Kafka, RabbitMQ, etc.
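The sketch below shows how these stages could line up in a single Spark Streaming job: a Kafka Source, a windowed-count Transformation, and an HDFS path as the Persistent Store. It is a conceptual illustration only, not the Stream Grids drag-and-drop workflow itself; the broker address, topic name, consumer group, checkpoint directory, and output path are all assumed values, and it requires the spark-streaming-kafka-0-10 dependency.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object PipelineSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PipelineSketch").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(10))
    // Checkpoint directory for fault tolerance (assumed path).
    ssc.checkpoint("hdfs:///tmp/pipeline-checkpoint")

    // Source stage: consume records from a Kafka topic (assumed broker/topic).
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "pipeline-sketch",
      "auto.offset.reset"  -> "latest"
    )
    val source = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Transformation stage: count events per key over a 60-second window
    // that slides every 10 seconds.
    val counts = source
      .map(record => (Option(record.key).getOrElse("none"), 1L))
      .reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(10))

    // Persistent Store stage: write each micro-batch to HDFS (assumed path).
    // An Emitter stage would instead publish these results to a downstream
    // queue, e.g. another Kafka topic, for further processing.
    counts.saveAsTextFiles("hdfs:///data/event-counts")

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The only difference between ending the pipeline at a Persistent Store and ending it at an Emitter is the final sink: a store writes the results to rest (HDFS, HBase, Cassandra), while an emitter hands them to another streaming system so downstream workflows can keep processing them.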