\section{Introduction}\label{sec:introduction}
We have entered the big-data era: the world is awash with data, and
more data is being produced every second of every day. Data analytics
solutions must contend with data being \emph{big} both in the static-data sense of
an ocean of many bytes and in the streaming sense of a firehose of
many bytes-per-second. In fact, driven by the realization that static
data is merely a snapshot of parts of a data stream, the data technology
industry is focusing increasingly on data-in-motion. Analyzing
the stream instead of the ocean yields more timely insights and saves
storage resources~\cite{andrade2014fundamentals}.

Stream processing languages facilitate the development of stream
processing applications. Streaming languages simplify common coding tasks
and make code more readable and maintainable, and their
compilers catch programming mistakes and apply optimizing code
transformations. The landscape of streaming languages is diverse and
lacks broadly accepted standards. Stephens~\cite{stephens_1997} and
Johnston et al.~\cite{johnston_hanna_millar_2004} published surveys on
stream processing languages in 1997 and 2004. Much has happened since
then, from database-inspired streaming languages to the rise of big
data and beyond. Our survey continues where prior surveys left off,
focusing on streaming languages in the big-data era.

A \emph{stream} is a sequence of data items, and the length of a
stream is conceptually infinite, in the sense that waiting for it to
end is ill-defined~\cite{muthukrishnan2005data}. A streaming application is a computer program that
consumes and produces streams. A stream processing language is a
domain-specific language designed for expressing streaming
applications. The goal of a stream processing language is to strike a
balance between the three requirements of \emph{performance}, \emph{generality}, and
\emph{productivity}. Performance is about answering high-throughput input
streams with low-latency output streams. Generality is about making it
possible to handle a variety of processing needs and data formats.
Productivity is about enabling developers to write good code
quickly.

Traditionally, programming languages have been characterized by their
paradigm, such as imperative, functional, declarative, or
object-oriented. However, for streaming languages, the paradigm
is not the most important characteristic; most streaming languages are
more or less declarative. More important characteristics include the
data model (e.g., relational, XML, RDF), execution model (e.g.,
synchronous, big-data), and target domain and users (e.g., event
detection, reasoning, end-users). Section~\ref{sec:languages} surveys languages
based on these characteristics. Section~\ref{sec:principles} generalizes from individual languages to
extract recurring concepts and principles. Section~\ref{sec:whatsnext}
does the inverse: instead of looking at what most
streaming languages have in common, it explores what most streaming
languages lack. Finally,
Section~\ref{sec:conclusion} concludes the survey.