Dear Editors and Reviewers for SIGMOD Record,
Thank you for deciding to accept our survey paper on "Stream Processing Languages in the Big Data Era" with major changes! We made a big push to turn it around quickly so that it could still be considered for the June issue. Below is an explanation of all the changes that have been made based on your comments.
Sincerely,
Martin (on behalf of the authors)
-----------------------------------------------------------
Section 1. Introduction
==> No issues raised in the reviews.
-----------------------------------------------------------
Introductory paragraph to Section 2. Stream Processing Languages
Review 2. It seems that w-questions are used to guide the categorization of Section 2. Not sure if each subsection is structured in this light though, seems that the w-questions are somehow lost.
==> While the w-questions did guide our writing, we were not pedantic about them. So we changed the introductory paragraph to Section 2 to elide them.
-----------------------------------------------------------
Section 2.1 Relational Streaming
Review 1 item 1. In this section, it may be useful to mention the tempo-relational query model variation from the early 90s [79], as a modification of CQL semantics where application time (and manipulations thereof) are a first-class citizen in the data and query model. An example system that extends this model across real-time and offline logs is Microsoft StreamInsight, as described in [80].
[79] Christian S. Jensen and Richard T. Snodgrass, "Temporal Specialization and Generalization," IEEE Transactions on Knowledge and Data Engineering 6(6), December 1994, pp. 954-974.
[80] Badrish Chandramouli, Jonathan Goldstein, and Songyun Duan. Temporal Analytics on Big Data for Web Advertising. In ICDE, 2012.
==> The discussion of the temporal relations model has been added with the two related references.
Review 2. Section 2.1 is missing important references:
- the SECRET model from Dindar et al, which was a comprehensive follow-up to Jain et al [50]: Modeling the Execution Semantics of Stream Processing Engines with SECRET, VLDB Journal 22(4), 2013.
- StreamSQL from StreamBase, which, to my knowledge, was also based on [50] (see the TIBCO StreamSQL documentation).
==> A discussion of these references has been added.
Review 2. Minor typos: time-sliding window -> time-based sliding window
==> Fixed.
-----------------------------------------------------------
Section 2.2 Synchronous Dataflow
==> No issues raised in the reviews.
-----------------------------------------------------------
Section 2.3 Big-Data Streaming
Review 1 item 2. A mention or description of Naiad [81] is missing in the paper - it uses a model called differential dataflow and a language based on extended LINQ [82]. It would be good to reference this work, perhaps in the context of Big Data streaming languages.
[81] D. G. Murray et al. Naiad: A Timely Dataflow System. In SOSP 2013.
[82] F. McSherry, D. G. Murray, R. Isaacs, and M. Isard. Differential dataflow. In CIDR, 2013.
==> A discussion of Naiad has been added.
Review 1 item 3. I think Sec. 2.3 is the most important part of the paper and needs to be described in greater detail, perhaps even split into two subsections. The example of IBM SPL (maybe SPADE needs a mention as well?) is okay to start with as an early example. However, the large majority of streaming pipelines today use one of five languages:
(1) Spark Streaming and SparkSQL: Databricks [83]
(2) Apache Flink: data artisans [84]
(3) Heron/Storm: streaml.io [85]
(4) Trill-LINQ: Microsoft [86]
(5) Apache Beam: Google Cloud Dataflow [87, 88]
(6) Samza: LinkedIn (samza.apache.org)
Thus, it might make sense for the paper to use at least one, if not two, of these languages to show how the running example query would be authored, along with more details on the language style and semantics.
[83] https://spark.apache.org/streaming/
[84] Paris Carbone, Stephan Ewen, Seif Haridi, Asterios Katsifodimos, Volker Markl, Kostas Tzoumas. Apache Flink™: Stream and Batch Processing in a Single Engine. In IEEE Data Engg. Bulletin, Dec. 2015.
[85] Maosong Fu, Sailesh Mittal, Vikas Kedigehalli, Karthik Ramasamy, Michael Barry, Andrew Jorgensen, Christopher Kellogg, Neng Lu, Bill Graham, Jingwei Wu. Streaming@Twitter. In IEEE Data Engg. Bulletin, Dec. 2015.
[86] Badrish Chandramouli, Jonathan Goldstein, Mike Barnett, James F. Terwilliger. Trill: Engineering a Library for Diverse Analytics. In IEEE Data Engg. Bulletin, Dec. 2015.
[87] https://beam.apache.org/
[88] http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
==> Section 2.3 has been extended to cite and discuss MillWheel, Spark Streaming, Storm, Trill, Heron, Beam, and Flink. In some cases we picked earlier and more prominent papers to cite than the ones referenced by Review 1. While we managed to harvest some space elsewhere in the paper for this expansion of Section 2.3, it was not enough to add another example.
Review 1 item 4. Following up on above comment, the authors are encouraged to take a close look at the recent IEEE Data Engg. Bulletin to get more ideas for languages and references to include in the survey:
http://sites.computer.org/debull/A15dec/issue1.htm
==> That is a great collection and we cite the Flink paper from it.
Review 2. What about different programming models? Most mentioned systems follow DAG-based dataflow models with user-defined logic, like Storm's spouts and bolts. Some support iterations, some do not. What about Spark Streaming which supports built-in streaming operators along with a SQL layer added on top? Similarly KSQL for Kafka? Also, the last paragraph starts with a sentence that references Borealis, but missing its details. Aurora/Borealis took a boxes and arrows approach different from STREAM, Telegraph, etc. How does that fit here?
==> The discussion of Aurora/Borealis has been expanded. Furthermore, the discussion of programming models for various other big-data streaming systems has been expanded, cf. our responses to the comments from Review 1.
Meta-review point 3. Since the title of the paper has emphasis on big data, and considering the wide-adoption and impact, it would be good to expand Section 2.3.
==> Section 2.3 has been expanded based on the reviewer comments.
-----------------------------------------------------------
Section 2.4 Complex Event Processing
Review 1 item 5. CEP is well covered but could benefit from citing other language models:
1. Augmented Finite Automaton (AFA) model [25] used in Trill
2. Flink CEP model [89]
3. Esper [90]
[89] https://flink.apache.org/news/2016/04/06/cep-monitoring.html
[90] http://www.espertech.com/esper/
==> We have added a short paragraph on the support of CEP within these recent streaming platforms.
-----------------------------------------------------------
Section 2.5 XML Streaming
Section 2.6 RDF Streaming
Section 2.7 Stream Reasoning
Review 1 item 6. Secs. 2.5 to 2.8 feel slightly out-of-place given their lack of deep adoption (for better or worse) in the big data ecosystem of today. If space is a constraint given the other suggestions in the reviews, these sections would be candidates for abbreviation.
==> Sections 2.5 to 2.8 have been compacted.
-----------------------------------------------------------
Section 3. Principles
Review 1 item 7. The principles section is nicely written and covers a lot of ground. A few suggestions on topics that seem to be missed out:
(a) Lambda vs. kappa architectures and unification of real-time and offline query languages
==> Added a kappa-architecture pointer and comment to principle P7.
(b) State management: Offloading cold state to resilient key-value stores (e.g., BigTable in Google Cloud Dataflow)
==> Added a key-value store pointer and comment to principle P2.
(c) Resiliency: Active-active and checkpoint-replay models, delivery semantics (exactly once, at least once, at most once)
==> Added a Spark Streaming pointer and comment to challenge measure C1.
(d) Application time vs. processing time
==> Added a Beam pointer and comment to challenge measure C2.
(e) Micro-batching, early output generation time, and the latency-throughput-completeness tradeoff
==> Added a Spark Streaming pointer and comment to challenge measure C2.
It will take some thinking to figure out how to place these topics in the context of the existing discussion, but doing so will broaden the principles significantly to take common practical concerns in the modern big data streaming analytics space into account. It may make sense to fold some of these topics into the Challenges section as well.
==> Agreed, we've added your topic suggestions, some to the principles and some to the challenges. Due to space constraints, we had to keep the additions concise.
Review 2. I am not 100% convinced why the proposed classification, requirements, principles, etc. are the right ones. The paper lacks discussion that justifies how these choices were made. For example:
- Where do the three requirements come from? Could there be a fourth one that might be missing?
- Similar issues with the 12 principles. If many of these are indeed "well-known" from the PL community, please provide references and a discussion.
It seems to me that language issues can be analyzed from multiple perspectives including syntax, semantics, programming model, etc. I am not sure if the survey sufficiently analyzes all relevant aspects.
==> We added the following statement to the introductory paragraph of the principles section: "The views and opinions expressed herein are those of the authors and are not meant as the final word." Of course our discussion is likely incomplete, both because of the space limitations and because it reflects our personal experiences. In fact, Review 1 suggested additional topics, and we have revised the paper to incorporate them. We have also added a reference to orthogonality as an example of a well-known programming languages design principle.
Meta-review point 1. Elaborate further on the specific categories you use in the paper as requirements, principles, challenges, etc. to justify them better. For example, where do the three requirements, twelve language design principles, or the three challenges with the nine measures come from? Can there be more? If yes, could you please add the additional categories? If no, why? Refer to comments from both reviews to expand your categories.
==> Addressed, see comments above.
Review 2. Table 1 and 2: What are the key take aways from these tables? There is no discussion. I was left wondering "So?".
==> We thank the reviewer for this comment. We have improved the readability of the two tables by fixing the vertical alignment and by highlighting in bold the most frequent principles (per requirement) across all languages and the most frequently occurring challenges across the language families. This lets the reader see which gaps require more effort from our community. We also added discussions in Section 3.1 (a new subsection) and Section 4.4.
Meta-review point 2. Include a summary/take-aways subsection referring to Tables 1 and 2. Right now these tables are introduced without much discussion.
Meta-review point 4. Broaden the related work with the recommendations from both of the reviewers. For example, it would be good to discuss which language can be used as a starting point or a good example for different principles and challenges.
==> With the above improvements to the two tables, the reader can now see which language (or language family, respectively) to choose if they are looking for more (or less, respectively) coverage of the principles/challenges.
Review 3. Section 3, last paragraph, "many are well-known from the design of other programming languages": Any specific citations?
==> To give a concrete example, we have added a specific citation to the Algol 68 language specification, which was famously guided by the orthogonality principle.
-----------------------------------------------------------
Introductory paragraph to Section 4.0. What's Next?
Meta-review point 1. Elaborate further on the specific categories you use in the paper as [...] challenges, etc. to justify them better. For example, where do the [...] three challenges with the nine measures come from? Can there be more? If yes, could you please add the additional categories? If no, why? Refer to comments from both reviews to expand your categories.
==> An explanation has been added for why we focus on these challenges.
-----------------------------------------------------------
Section 4.1 Veracity
==> No issues raised in the reviews.
-----------------------------------------------------------
Section 4.2 Data Variety
==> No issues raised in the reviews. A paragraph containing more examples was removed and parts of the section were reworded to save space.
-----------------------------------------------------------
Section 4.3 Adoption
==> No issues raised in the reviews.
-----------------------------------------------------------
Section 4.4 Challenges Summary
Review 2. Table 1 and 2: What are the key take aways from these tables? There is no discussion. I was left wondering "So?".
Meta-review point 2. Include a summary/take-aways subsection referring to Tables 1 and 2. Right now these tables are introduced without much discussion.
Meta-review point 4. Broaden the related work with the recommendations from both of the reviewers. For example, it would be good to discuss which language can be used as a starting point or a good example for different principles and challenges.
==> These comments pertain to both Section 3 and Section 4.4. We have addressed them in both places, see the corresponding response above.
-----------------------------------------------------------
Section 5. Conclusions
==> No issues raised in the reviews.