\subsection{Adoption}\label{sec:adoption} % Martin
Stream processing languages have an adoption problem. As
Section~\ref{sec:languages} illustrates, there are several families of
streaming languages comprising several members each. But no one
streaming language has been broadly adopted. The language family
receiving the most attention from large technology companies is
big-data streaming, including offerings by
Google~\cite{akidau_et_al_2013}, Microsoft~\cite{ali_et_al_2009},
IBM~\cite{hirzel_schneider_gedik_2017}, and
Twitter~\cite{toshniwal_et_al_2014}. However, these offerings all differ from one another.
Furthermore, in the pursuit of interoperability and expediency, most
big-data streaming languages are not stand-alone but embedded in a
host language. While being embedded gives a short-term boost to
language development, the entanglement with a host language makes it
hard to offer stable and clear semantics. And, if the history of
databases is any guide, such stable and clear semantics are useful for
agreeing on and consistently implementing a standard. Part of the
reason that the relational model for databases displaced its disparate
predecessors is its strong mathematical foundation. One of the
most-used languages mentioned in this survey is
Scade~\cite{scade_2017}, but it is designed for embedded systems and
not big-data streaming. Getting broad adoption for a big-data
streaming language remains an open challenge.

\textbf{Why is this important?}
%
Solving the adoption problem for stream processing languages would
yield many benefits. It would encourage students to build marketable
skills and give employers a sustainable hiring pipeline.
It would draw attention to streaming innovation, benefiting
researchers, and to streaming products, benefiting vendors.
If most systems adopted more-or-less
the same language, they would become easier to benchmark against each
other. Other popular programming languages, such as SQL, Java, and
JavaScript, flourished when companies competed against each other
to provide better implementations of the language. On the downside,
focusing on a single language would reduce the diversity of the
ecosystem, transforming innovation and competition from being broad
to being deep. But overall, if the problem of streaming language
adoption were solved, we would expect streaming systems to become more
robust and faster.

\textbf{How can we measure the challenge?}
%
The streaming language adoption challenge can be broken down
into the following measures $\mathbf{C_7}$--$\mathbf{C_9}$:
\vspace*{-2mm}
\begin{itemize}
\item[$\mathbf{C_7}$] \emph{Widely used implementation of one
language.} One language in the family has at least one
implementation that is widely used in practice, for instance,
Scade for SDF~\cite{scade_2017}.
\item[$\mathbf{C_8}$] \emph{Standard proposal or standard.} There
are serious efforts towards an official standard, for instance,
Jain et al.\ for StreamSQL~\cite{jain_et_al_2008} or
\textsc{Match-Recognize} for CEP~\cite{zemke_et_al_2007}.
\item[$\mathbf{C_9}$] \emph{Multiple implementations of the same
language.} One language in the family has multiple more-or-less
compatible implementations, for instance,
Lustre~\cite{lustre_1987} and Scade~\cite{scade_2017} for SDF.
\end{itemize}
Language adoption is driven not just by the technical merits of the
language itself but also by external factors, such as industry support
or implementations that are open-source with open governance.

\textbf{Why is this difficult?}
%
Adoption is hard for any programming language, but
particularly so for a streaming language. While streaming in general
is not new~\cite{stephens_1997}, big-data streaming is a relatively
recent phenomenon. And big-data streaming, in turn, is driven by
several ongoing industry trends, including the internet of things,
cloud computing, and artificial intelligence (AI). Since all three of these
trends are themselves actively shifting, they provide an unstable
ecosystem in which streaming languages must evolve. Furthermore,
innovation often takes place in a setting where data is assumed to be
at rest, as opposed to streaming, where data is in motion. For
instance, most AI algorithms work over a fixed training data set, so
additional research is necessary to make them work well online. When
it comes to streaming languages, there is not even a consensus on
which features are the most important to include. For instance, both the
veracity and the variety challenges discussed previously have given
rise to many feature ideas that have yet to make it into the
mainstream. Since people come to streaming research from different
perspectives, they are sometimes unaware of each other's work,
inhibiting adoption. This survey aims to mitigate that problem.