-
Notifications
You must be signed in to change notification settings - Fork 0
/
principles.tex
140 lines (133 loc) · 8.23 KB
/
principles.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
\section{Principles}\label{sec:principles}
The previous section described concrete stream processing languages
belonging to several families. This section takes a cross-cutting view
and explores concepts that many of these languages have in common by
identifying the language design principles behind the concepts.
The views and opinions expressed herein are those of the authors and
are not meant as the final word. Explicitly articulating principles
demystifies the art of language design. We categorize language design
principles according to the three requirements from
Section~\ref{sec:introduction}, namely performance, generality, and
productivity.
The \textbf{performance} requirement is addressed by streaming
language design principles $\mathbf{P_1}$--$\mathbf{P_4}$:
\begin{itemize}
\item[$\mathbf{P_1}$] \emph{Windowing principle.} Windows turn streaming
data into static data suitable for optimized static computation.
For instance, in CQL, windows produce relations suitable for
classic relational algebra~\cite{arasu_babu_widom_2006},
optimizable via classic relational rewrite rules (see
Figure~\ref{fig:cqlops}).
\item[$\mathbf{P_2}$] \emph{Partitioning principle.} Key-based partitions
enable independent computation over disjoint state, thus
simplifying data parallelism. %~\cite{schneider_et_al_2015}.
For instance, MatchRegex performs complex event processing separately by
partition~\cite{hirzel_2012} (see Line~3 of Figure~\ref{fig:cep}).
Principles $P_1$ and $P_2$ also simplify advanced state management, e.g.,
in key-value stores for operator migration~\cite{gedik_et_al_2014}.
\item[$\mathbf{P_3}$] \emph{Stream graph principle.} Streaming
applications are graphs of operators that communicate almost
exclusively via streams, making them easy to place on different
cores or machines. This principle is central to the big-data
languages in Section~\ref{sec:big} such as
SPL~\cite{hirzel_schneider_gedik_2017} (see Figure~\ref{fig:spl}).
\item[$\mathbf{P_4}$] \emph{Restriction principle.} The schedules and
communication rates in a streaming application are restricted for
both performance and safety. For instance, Lustre can be compiled
to a simple imperative control loop without communication
buffers~\cite{lustre_1987} (see Section~\ref{sec:sdf}).
\end{itemize}
The \textbf{generality} requirement is addressed by streaming language
design principles $\mathbf{P_5}$--$\mathbf{P_8}$:
\begin{itemize}
\item[$\mathbf{P_5}$] \emph{Orthogonality principle.} Basic language
features are irredundant and work the same independently of how
they are composed. For instance, in CQL, relational-algebra
operators are orthogonal to windows~\cite{arasu_babu_widom_2006}
(see Section~\ref{sec:sql}).
\item[$\mathbf{P_6}$] \emph{No-built-ins principle.} The core language
remains slim and regular by enabling extensions in the library. For
instance, in SPL, relational operators are not built into the
language, but are user-defined in the library
instead~\cite{hirzel_schneider_gedik_2017} (see \mbox{Lines 3--8}
of Figure~\ref{fig:spl}).
\item[$\mathbf{P_7}$] \emph{Auto-update principle.} The syntax of
conventional non-streaming computation is overloaded to also
support reactive computation. For instance, ActiveSheets uses
conventional spreadsheet formulas, updating their output when
input cells change~\cite{vaziri_et_al_2014} (see
Figure~\ref{fig:activesheets}). The Lambda or Kappa
architectures~\cite{kreps_2014} take this to the extreme by
combining batch and streaming outside of the language.
\item[$\mathbf{P_8}$] \emph{General-feature principle.} Similar
special-case features are replaced by a single more-general
feature. For instance, operator parameters in
SPL~\cite{hirzel_schneider_gedik_2017} accept general
uninterpreted expressions, including predicates for the special
case of CEP~\cite{hirzel_2012} (see \mbox{Lines 4--7} of
Figure~\ref{fig:cep}).
\end{itemize}
The \textbf{productivity} requirement is addressed by streaming
language design principles $\mathbf{P_9}$--$\mathbf{P_{12}}$:
\begin{itemize}
\item[$\mathbf{P_9}$] \emph{Familiarity principle.} The syntax of
non-streaming features in streaming languages is the same as in
non-streaming languages. This makes the streaming language easier
to learn. For instance, CQL~\cite{arasu_widom_2004} adopts the
select-from-where syntax of SQL (see Figure~\ref{fig:cql}).
\item[$\mathbf{P_{10}}$] \emph{Conciseness principle.} The most concise
syntax is reserved for the most common tasks. This increases
productivity since there is less code to write and read. For
instance, regular expressions represent ``followed-by'' concisely
via juxtaposition \mbox{$e_1\,e_2$} (see Line~8 of
Figure~\ref{fig:cep}).
\item[$\mathbf{P_{11}}$] \emph{Regularity principle.} Data literals,
patterns that match them, and/or declarations all use similar
syntax. For instance, RSP-QL uses pattern syntax resembling
concrete RDF triples (see Line~10 of Figure~\ref{fig:rspql}).
\item[$\mathbf{P_{12}}$] \emph{Backward reference principle.} Code
direction is consistent with both scope and control dominance, for
readability. For example, Lustre declares variables before their
use (see Figure~\ref{fig:lustre}).
\end{itemize}
\begin{table}
\centerline{\small\begin{tabular}{@{}l@{ }c@{ }c@{ }c@{ }c@{\hspace*{2mm}}c@{ }c@{ }c@{ }c@{\hspace*{2mm}}c@{ }c@{ }c@{ }c@{}}
\toprule
\emph{Language} & \multicolumn{4}{@{}c}{\emph{Performance}} & \multicolumn{4}{@{}c}{\emph{Generality}} & \multicolumn{4}{@{}c}{\emph{Productivity}}\\
\midrule
CQL & P$_1$ & P$_2$ & P$_3$ & & $\mathbf{P_5}$ & & & P$_8$ & $\mathbf{P_9}$ & & & \\
Lustre & & & & P$_4$ & $\mathbf{P_5}$ & P$_6$ & P$_7$ & P$_8$ & $\mathbf{P_9}$ & P$_{10}$ & P$_{11}$ & P$_{12}$\\
SPL & P$_1$ & P$_2$ & P$_3$ & & $\mathbf{P_5}$ & P$_6$ & & P$_8$ & $\mathbf{P_9}$ & & P$_{11}$ & P$_{12}$\\
MatchRegex & & P$_2$ & & & $\mathbf{P_5}$ & P$_6$ & & P$_8$ & $\mathbf{P_9}$ & P$_{10}$ & & P$_{12}$\\
YFilter & & & & P$_4$ & $\mathbf{P_5}$ & P$_6$ & & & $\mathbf{P_9}$ & P$_{10}$ & & \\
RSP-QL & P$_1$ & & P$_3$ & & $\mathbf{P_5}$ & P$_6$ & & P$_8$ & $\mathbf{P_9}$ & P$_{10}$ & P$_{11}$ & \\
ActiveSheets & P$_1$ & P$_2$ & & P$_4$ & $\mathbf{P_5}$ & P$_6$ & P$_7$ & P$_8$ & $\mathbf{P_9}$ & P$_{10}$ & & \\
\bottomrule
\end{tabular}
}
\caption{\label{tab:principles}Which of the languages that served as
examples in Section~\ref{sec:languages} satisfy which of the
language design principles in Section~\ref{sec:principles}.}
\end{table}
\subsection{Principles Summary}
Good language design is driven by principles, but it is also an
exercise in prioritizing among these principles. For instance, CQL
satisfies P$_9$ (familiarity principle) by adopting SQL's syntax and
CQL violates P$_{12}$ (backward reference principle) by adopting SQL's
scoping rules. Table~\ref{tab:principles} summarizes principles by
language. Only two of the twelve principles (P$_5$ and P$_9$, shown
in bold) are uniformly covered, both related to the ease of use of the
language (separation of concerns and syntax familiarity). Although
some of the languages exhibit fewer principles,
Table~\ref{tab:principles} does not provide a comparative metric for
quantifying the coverage of each principle; such a metric would be
hard to agree upon. Satisfying more principles does not automatically
imply satisfying the associated requirement better. While
we formulated the principles from the perspective of streaming
languages, we do not claim to have invented them: many are well-known
from the design of other programming languages. For instance, the
orthogonality principle was a stated aim of the Algol~68 language
specification~\cite{vanwijngaarden_et_al_1975}. Now that we have seen
concepts that are \emph{present} in most streaming languages, the next
section will explore what is commonly \emph{missing or
underdeveloped}.