\documentclass[times,11pt]{article}
\usepackage[hmargin=1in,vmargin=1in]{geometry}
\usepackage[usenames]{color} %used for font color
\usepackage{amssymb} %maths
\usepackage{amsmath} %maths
\usepackage{graphicx}
\usepackage{hyperref}
\usepackage{xspace}
\newcommand{\GeV}{\ensuremath{\mathrm{GeV}}}
\newcommand{\TeV}{\ensuremath{\mathrm{TeV}}}
\newcommand{\GeVc}{\ensuremath{\mathrm{GeV}}}
\newcommand{\TeVc}{\ensuremath{\mathrm{TeV}}}
\newcommand{\GeVcc}{\ensuremath{\mathrm{GeV}}}
\newcommand{\TeVcc}{\ensuremath{\mathrm{TeV}}}
\newcommand{\pt} {\ensuremath{p_{\mathrm{T}}}\xspace}
\newcommand{\kt} {\ensuremath{k_{\mathrm{T}}}\xspace}
\newcommand{\antikt} {anti-\kt}
\newcommand{\ttbar} {\ensuremath{\mathrm{t}\overline{\mathrm{t}}}}
\newcommand{\bbbar} {\ensuremath{\mathrm{b}\overline{\mathrm{b}}}}
\newcommand{\qqbar} {\ensuremath{\mathrm{q}\overline{\mathrm{q}}}}
\newcommand{\fbinv} {\ensuremath{\mathrm{fb}^{-1}}}
\newcommand{\instlumiA} {\ensuremath{\times 10^{33} \,\mathrm{cm}^{-2} \mathrm{s}^{-1}}}
\newcommand{\instlumiB} {\ensuremath{\times 10^{34} \,\mathrm{cm}^{-2} \mathrm{s}^{-1}}}
\begin{document}
\begin{abstract}
\hrule
\begin{center}
{\bf \Large Clustering Jets at the Exascale}\\
Steven Ko, Salvatore Rappoccio, Lukasz Ziarek
\end{center}
\hrule
\bigskip
{\bf Overview:}
In the past, particle physics has relied upon improvements in the
processing speed of single computing cores to improve data acquisition
rates. Such improvements will be critical to the proposed ``High
Luminosity Large Hadron Collider'' (HL-LHC) and other colliders, where
the computational challenges move from the petascale to the
exascale. Unfortunately, since the mid-2000s, this improvement
in single-core processing speed has hit a limit, and
improvements in processing time must now come from concurrent
processing. However, the current reconstruction techniques are not
enabled for concurrent processing. In order to continue
improving the processing
capabilities of particle physics experiments in the HL-LHC era, it will
be necessary to explore cutting-edge techniques in parallel processing.
There are several general classes of problems in particle physics
event reconstruction that could be modified in order to
achieve concurrent processing. One such opportunity that has not yet
been explored is in ``jet clustering,'' a nearest-neighbor type of
algorithm used to cluster hadronically-fragmented jets into a single
object.
\bigskip
{\bf Intellectual Merit:}
This proposal focuses on parallelizing the existing jet clustering
algorithms in use at the LHC experiments. The proposed improvements
will be to use this as a test case for deployment of cutting-edge
parallelization techniques such as lightweight concurrency extraction,
speculative computing, and smarter distribution. Some recent
experience shows that the nearest-neighbor type of algorithm used by
the jet clustering is amenable to such improvements.
\bigskip
{\bf Broader Impacts:}
The benefits of this proposal are twofold: first, there will be an
immediate improvement of the jet clustering algorithms themselves that
will lead to higher data acquisition rates at the LHC. Second, the
computing techniques developed could be used in other applications,
both inside particle physics and elsewhere. Since nearest-neighbor
algorithms are ubiquitous in scientific computing, it is expected
that the techniques developed to parallelize this particular problem
will be applicable to a wide variety of others in academia and
industry.
In addition, these core developments can train students in the newest
computing techniques, giving them cutting-edge experience that is
highly relevant in academia and private industry.
\end{abstract}
\newpage
\pagenumbering{arabic}
\hrule
\begin{center}
{\bf \Large Clustering Jets at the Exascale}\\
Steven Ko, Salvatore Rappoccio, Lukasz Ziarek
\end{center}
\hrule
\bigskip
\section{Introduction}
The research proposed here will focus on deploying advanced techniques
in parallelization to improve algorithms involved in particle physics
research at the Large Hadron Collider (LHC) and beyond. As a
specific test case, this research focuses on immediate improvements to
one particular algorithm, jet
clustering, although the principles
developed could be deployed in other areas of particle physics
reconstruction and elsewhere.
With the discovery of the Higgs boson by the LHC experiments ATLAS
and CMS~\cite{higgs_cms,higgs_atlas}, the
standard model (SM) of particle
physics is now complete. This model unifies the electromagnetic force
(carried by the {\em photon}) with the weak force, responsible for
radioactive decay (carried by the {\em $W$ and $Z$ bosons}).
At long last,
physicists now understand that via interactions with the Higgs field,
the $W$ and $Z$ bosons acquire a mass, but the photon does not.
This is referred to as ``electroweak symmetry breaking''.
A new phase of particle physics has therefore begun.
The questions have shifted from the cause of electroweak symmetry
breaking to the detailed study of the Higgs boson and its
interactions. To understand the larger picture of the fundamental forces in
nature, it will be imperative that high-luminosity colliders be built
to study the Higgs in detail, requiring exascale computing tools (in
speed and throughput) to reach the goals.
One of the major technical challenges that lie ahead in
exascale-level computing for these high-luminosity colliders is the
continuation of the scaling of computational power year by year, known
colloquially as ``Moore's Law''.
To set the scale: with the LHC collision flux (``luminosity'') reaching
$7\instlumiA$, the time for the CMS experiment to reconstruct a single
collision event was approximately 20 seconds. However, as the
luminosity increases, the computational time currently scales
quadratically. The luminosity at the HL-LHC is expected to exceed
$1\instlumiB$, which would correspond (naively) to enormous processing
times, on the order of many minutes to several hours per event.
Clearly, the available computing power must scale correspondingly to
compensate for this dramatic increase in CPU time with increasing
luminosity.
Unfortunately, the expected end of the historic scaling of single-core
processing capability~\cite{GAMEOVER} can adversely affect the
long-term processing capability for particle physics
experiments. Without improvements in single-core computational speed,
the only avenues left are algorithmic improvements, and
parallelization. Algorithm development is being undertaken at the LHC
experiments to
reduce the computational time of the software for data acquisition and
reconstruction, using both single-core and multi-core
approaches. However, long-term improvements will require fundamental
improvements in processing capabilities using parallelization.
The exploratory research proposed here will apply new and innovative
computer science techniques to particle physics.
One algorithm that is particularly amenable to improvement using these
techniques is the clustering of
final-state particles produced in collisions into groups called
``jets''. Jets are produced via quantum chromodynamic (QCD)
interactions of quarks and gluons that hadronize.
These ``jet clustering'' algorithms group the particles into jets
based on nearest-neighbor clustering (NN).
This algorithm can become computationally expensive as the number of
particles produced in a collision grows, with run time scaling as
$O(N^2)$ or $O(N\log{N})$ depending on the implementation. Techniques
that leverage and combine speculative computing,
lightweight concurrency, and smarter distribution can strongly impact
these types of NN algorithms~\cite{knn_gpu_1,knn_gpu_2,knn_gpu_3,knn-mapreduce-0,knn-mapreduce-1}.
Thus, this research will serve as a real-world test case
for the deployment of these algorithms, with both immediate and
long-term benefits.
%At CMS, this computation is done very often by individuals rather than
%centrally, and so is accounted for
%in the 40\% of CPU usage from ``mixed user analysis applications'' as
%described above.
%It is likely, therefore, that improvements observed
%in jet clustering will primarily benefit this portion of the CPU
%usage.
%We now discuss the prospects for
%utilizing parallelization in jet clustering in detail.
\section{Prior Work}
\subsection{Results from Prior Support}
All three PIs are junior faculty members at the University at Buffalo.
PI Rappoccio and Co-PI Ziarek joined the faculty in their
respective departments in August 2012.
\subsubsection{PI Rappoccio}
PI Rappoccio does not have NSF support yet as he has recently started his
faculty career.
\subsubsection{Co-PI Ziarek}
Co-PI Ziarek does not have NSF support yet as he has recently started his
faculty career.
\subsubsection{Co-PI Ko}
Co-PI Ko was awarded ``CI-ADDO-NEW: PhoneLab: A Programmable Participatory
Smartphone Testbed'' on 06/01/12. The award number is CNS-1205656 and the amount
is \$1,358,510.00. The duration is three years. The results include the
following.
\paragraph{Results Related to Intellectual Merit:} the PhoneLab team has
distributed 300 smartphones to faculty, students, and staff of UB, who are
using the smartphones as their primary phones. The team has developed the
testbed infrastructure with which they can monitor the usage of each phone. The
team has also developed a front-end website that experimenters and participants
can use for various testbed-related functions. The team opened the PhoneLab
testbed for public experimentation on 10/31/13. There are two public
experiments running on the testbed and seven experiments awaiting approval.
\paragraph{Results Related to Broader Impact:} the PhoneLab team is in the
process of recruiting four undergraduate students as part of its outreach
program. Two undergraduate students have been working with the team for one
month. The team is actively interviewing additional undergraduate students to
hire. In addition, the team is in talks with the Buffalo Academy of Science
Charter School to explore the possibility of establishing a program in which
the team teaches high school students smartphone programming skills.
\paragraph{Publications:} the PhoneLab team has published one paper so far,
``PhoneLab: A Large Programmable Smartphone Testbed'' in the First International
Workshop on Sensing and Big Data Mining, 2013.
\paragraph{Evidence of Research Products:} the PhoneLab testbed is currently in
use; there are roughly 300 participants and several experiments are either
running or waiting for approval. Experimenters can submit their experiments
through the website: \url{http://www.phone-lab.org}.
\subsection{Overview of Activities}
The investigators of this proposal have a
widely-varied and applicable skill set to accomplish the goals of
extending LHC computing to the exascale.
%\bigskip
%\noindent
%{\bf Particle Physics Computing}
%\bigskip
Salvatore Rappoccio (tenure-track Assistant Professor) joined the
Faculty at the University at Buffalo, SUNY (UB) in 2012. He has 15
years of experience programming in a
high-energy physics environment, as well as in numerical software
design for the private sector.
Since 2007, Rappoccio has been a member of the Compact Muon Solenoid
(CMS) experiment at CERN.
From 2008-2010, he was the co-leader of the Analysis Tools group of
the CMS Software Project. He was responsible for deployment of jet
reconstruction algorithms as well as other tools for data analysis of
LHC collision data during the startup phase of the LHC. His primary
responsibilities included managing the deployment of high-performance
software tools (including visualization) in a team consisting of
$\sim$15 people. He has been instrumental in improving the single-core
computational speed for jet clustering at CMS since 2008.
His research interests include utilizing jet clustering algorithms in
new and innovative ways to search for signals of physics beyond the
standard model of particle physics. His pioneering efforts resulted in
the first measurements and searches for new phenomena with advanced jet clustering
algorithms at CMS, outlined in
Refs.~\cite{EXO-11-006,EXO-11-095,SMP-12-019}. These
techniques have become hugely popular in the LHC experiments and in
the theoretical community. Prof. Rappoccio is a leader in the jet
reconstruction community, contributing to and editing seminal reports
on the subject in Refs.~\cite{boost2010,boost2011,boost2012}.
In addition to this academic work, he has also been involved in
numerical software design for MIT Lincoln Laboratory (the details
of which are classified).
Currently, Rappoccio is mentoring two students (one graduate student,
Jaba Chelidze, and one undergraduate student, Jonathan Goodrum) in
achieving parallelization of the jet clustering algorithms. The latter
is working under the ``Collegiate Science and Technology Entry
Program'' (CSTEP)~\cite{cstep}. As described by the program, CSTEP
exists ``to support talented underrepresented students pursuing
science, technology, engineering and mathematics''.
Lukasz Ziarek has 9 years of experience in language, compiler,
and runtime design targeted at improving multicore performance. He
has worked on 5 compilers and 3 Java VMs. He is an expert at
speculative and transactional computation focusing on the extraction
of parallelism and lightweight concurrency.
Steven Ko has 10 years of experience in distributed systems. His
recent focus has been large-scale data processing in the cloud using
MapReduce and other technologies built on top of it. He also has 5
years of experience in large-scale storage and data management in
data centers.
\section{Proposed Program of Research}
The
``jet clustering'' technique is employed by
many different particle physics experiments and theorists worldwide, and is
implemented in a common software framework called
{\tt fastjet}~\cite{fastjet_manual}.
The mathematical problem is analogous to the
``K-nearest neighbors algorithm''~\cite{knn_ieee} (kNN).
The specific jet-clustering algorithms that have become enormously
popular in the particle physics community are based on sequential
clustering, similar to nearest-neighbor
clustering~\cite{fastjet_timing}. In these sequential-clustering
algorithms, a list of particle four-momenta is taken as input.
These algorithms combine the input four-vectors pairwise
until certain criteria are satisfied, defining the resultant
combination as a jet.
For the jet algorithms considered in this proposal, for each pair of particles $i$ and $j$,
a ``distance'' metric between
the two particles ($d_{ij}$), and the so-called ``beam distance''
for each particle ($d_{iB}$), are computed:
\begin{eqnarray}
\label{eq:dij}
d_{ij} &=& \mathrm{min}({\pt}_i^{2n},{\pt}_j^{2n}) \Delta R_{ij}^2 / R^2 \\
\label{eq:diB}
d_{iB} &=& {\pt}_i^{2n},
\end{eqnarray}
where ${\pt}_i$ and ${\pt}_j$ are the transverse momenta of particles
$i$ and $j$, respectively, ``min'' refers
to the lesser of the two $\pt$ values,
the integer $n$ depends on the specific jet algorithm, $\Delta R_{ij} = \sqrt{(\Delta y_{ij})^2 + (\Delta\phi_{ij})^2 }$
is the distance between $i$ and $j$ in
rapidity ($y = \frac{1}{2} \ln\left[(E + p_{z})/(E - p_{z})\right]$) and azimuth ($\phi$),
and $R$ is the ``size'' parameter of order unity~\cite{ktalg}, with all angles expressed in radians.
The particle pair $(i,j)$ with smallest $d_{ij}$ is combined into a
single object. All distances are recalculated using the new object, and the procedure is
repeated until, for a given object $i$, all the $d_{ij}$ are greater
than $d_{iB}$. Object $i$ is then
classified as a jet and not considered further in the algorithm. The process is
repeated until all input particles are clustered into jets.
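For concreteness, a minimal (and deliberately naive) C++ sketch of this
sequential procedure is shown below. It follows the definitions of
$d_{ij}$ and $d_{iB}$ in Eqs.~(\ref{eq:dij}) and~(\ref{eq:diB})
directly, recombining four-vectors by simple component-wise addition;
it is illustrative only and is not the optimized implementation found
in {\tt fastjet}.
\begin{verbatim}
// Naive sequential clustering sketch following the d_ij and d_iB
// definitions above (illustrative only; not the fastjet implementation).
#include <cmath>
#include <vector>

struct Particle { double px, py, pz, E; };

double pt2(const Particle& p) { return p.px*p.px + p.py*p.py; }
double rap(const Particle& p) { return 0.5*std::log((p.E + p.pz)/(p.E - p.pz)); }
double phi(const Particle& p) { return std::atan2(p.py, p.px); }

// d_ij = min(pt_i^{2n}, pt_j^{2n}) * DeltaR_ij^2 / R^2
double dij(const Particle& a, const Particle& b, double R, int n) {
  const double PI = std::acos(-1.0);
  double dy = rap(a) - rap(b);
  double dp = std::fabs(phi(a) - phi(b));
  if (dp > PI) dp = 2.0*PI - dp;
  return std::min(std::pow(pt2(a), n), std::pow(pt2(b), n))
         * (dy*dy + dp*dp) / (R*R);
}

// d_iB = pt_i^{2n}
double diB(const Particle& a, int n) { return std::pow(pt2(a), n); }

// Cluster all inputs into jets; n = 1 (KT), 0 (CA), -1 (AK).
std::vector<Particle> cluster(std::vector<Particle> parts, double R, int n) {
  std::vector<Particle> jets;
  while (!parts.empty()) {
    // Find the smallest distance among all d_ij and d_iB.
    std::size_t bi = 0, bj = 0;
    bool beam = true;
    double best = diB(parts[0], n);
    for (std::size_t i = 0; i < parts.size(); ++i) {
      if (diB(parts[i], n) < best) {
        best = diB(parts[i], n); bi = i; beam = true;
      }
      for (std::size_t j = i + 1; j < parts.size(); ++j) {
        double d = dij(parts[i], parts[j], R, n);
        if (d < best) { best = d; bi = i; bj = j; beam = false; }
      }
    }
    if (beam) {
      // Smallest distance is a beam distance: declare object i a jet.
      jets.push_back(parts[bi]);
      parts.erase(parts.begin() + bi);
    } else {
      // Otherwise merge the pair (i, j) and continue.
      Particle m{parts[bi].px + parts[bj].px, parts[bi].py + parts[bj].py,
                 parts[bi].pz + parts[bj].pz, parts[bi].E  + parts[bj].E};
      parts.erase(parts.begin() + bj);  // j > i, so erase j first
      parts[bi] = m;
    }
  }
  return jets;
}
\end{verbatim}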
The value for $n$ in Eqs.~(\ref{eq:dij}) and~(\ref{eq:diB}) governs
the topological properties of the jets.
For $n=1$ the procedure is referred to as the
$k_{\mathrm{T}}$ algorithm (KT). The KT jets tend to have irregular
shapes and are especially useful for reconstructing jets of lower momentum~\cite{ktalg}.
%For this reason, they are also sensitive to the presence of
%low-$\pt$ pileup (PU) contributions, and are
%used to compute the median $\pt$ per unit area (in $(y,\phi)$) of an
%event~\cite{jetarea_fastjet}.
For $n=-1$, the procedure
is called the anti-$k_{\mathrm{T}}$ (AK) algorithm, with features
close to an idealized cone algorithm.
%% when the input particles have
%% negligible mass, in which case the rapidity $y$ is approximated by the
%% pseudorapidity $\eta$.
The AK algorithm is used extensively
in LHC experiments and by the theoretical community for
finding well-separated jets~\cite{ktalg}. For $n=0$, the procedure
is called the Cambridge--Aachen (CA) algorithm. This relies only on angular
information, and, like the $k_\mathrm{T}$ algorithm,
provides irregularly-shaped jets in $(y,\phi)$. The CA algorithm is useful in identifying
jet substructure~\cite{CAcambridge,CAaachen}.
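In practice, the choice among the KT, CA, and AK algorithms amounts to
a one-line change in the {\tt fastjet} interface. A minimal usage
sketch of the public {\tt fastjet} API~\cite{fastjet_manual} is shown
below (the input four-vector is a placeholder); this is the entry point
whose behavior any parallelized implementation must preserve.
\begin{verbatim}
// Minimal fastjet usage sketch: cluster input four-vectors with the
// anti-kt (AK) algorithm and size parameter R = 0.5.
#include <vector>
#include "fastjet/ClusterSequence.hh"

int main() {
  std::vector<fastjet::PseudoJet> particles;
  // ... fill with (px, py, pz, E) of the final-state particles ...
  particles.push_back(fastjet::PseudoJet(1.0, 0.2, -0.3, 1.1));

  // antikt_algorithm selects n = -1; kt_algorithm and
  // cambridge_algorithm select n = 1 and n = 0, respectively.
  fastjet::JetDefinition jet_def(fastjet::antikt_algorithm, 0.5);
  fastjet::ClusterSequence cs(particles, jet_def);

  // Jets above a minimum pt of 20 GeV, sorted by decreasing pt.
  std::vector<fastjet::PseudoJet> jets =
      fastjet::sorted_by_pt(cs.inclusive_jets(20.0));
  return 0;
}
\end{verbatim}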
The single-core optimization of
jet clustering
is outlined in Ref.~\cite{fastjet_timing}. On a single core,
the computational time scales as $O(N^2)$ or $O(N \ln{N})$, where $N$
is the number of inputs to the algorithm, which scales linearly with
luminosity. Since the luminosity of future colliders is expected to
increase drastically over that of existing machines, it will be critical
to develop parallelization strategies so that the existing jet
clustering algorithms remain scalable at future machines.
Since the particles are clustered pairwise, there are numerous
opportunities to separately compute portions of the event and then
combine them at later stages.
We now
discuss specific strategies that can be developed
to optimize concurrent performance in this algorithm.
\iffalse
There is existing work and literature on the topic of the
parallelization of similar kNN algorithms, for instance, in
Refs.~\cite{knn_gpu_1, knn_gpu_2, knn_gpu_3}, where improvements
$O(100)$ in CPU performance are observed over standard
algorithms. Since the proposed use case is
very similar to the kNN algorithm, similar improvements to the
processing time by parallelization strategies are expected.
\fi
\subsection{Technical Challenges}
To achieve the performance improvements required for scalability
of jet clustering, we propose to examine parallelization opportunities across
the entire software stack, in three specific areas:
(1) the use of lightweight concurrency
extraction to mask high-latency computations or I/O actions,
(2) extraction of
parallelism from the computation itself in the form of optimistic speculation
and specialized transformations, and
(3) new methods for distributing the computation to
maximize parallelization on each node.
\bigskip
\noindent
{\bf Lightweight Concurrency for Latency Masking}
\bigskip
Many mathematical kernels contain opportunities for extracting ``micro parallelism,''
usually on the order of tens of instructions, from their computational components.
Unfortunately, it is very difficult to parallelize this computation profitably as
the overhead of thread creation, scheduling, synchronization, and migration outweighs
the gains in parallelism. Instead of extracting explicit parallelism from such
computations, we propose to explore methods of lightweight asynchrony to allow for
computation to proceed while waiting on high latency I/O operations to complete or on
the results of other computations. Since thread creation and the associated
scheduling and synchronization costs are typically prohibitive, we will explore new
threading models that allow logically-distinct computations to execute within
a given construct. Co-PI Ziarek's previous research has indicated that such schemes can profitably
boost overall performance in the context of ML code~\cite{acml, parasites}.
The salient research
challenges in applying this strategy are as follows: 1) identifying what computation can be executed
safely during high latency operations at compile time, 2) providing a lightweight threading runtime
and programming model in the context of an imperative language,
3) specializing the approach to numeric kernels,
and 4) building support for computation in a distributed setting.
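As a simplified illustration of the latency-masking idea, the sketch
below overlaps reading the next event with clustering the current one
using standard C++ futures. The {\tt read\_event} and
{\tt cluster\_event} routines are hypothetical placeholders, and the
lightweight threading runtime proposed here would replace the
comparatively heavyweight {\tt std::async} machinery used in the
sketch.
\begin{verbatim}
// Latency-masking sketch: overlap high-latency event I/O with the
// CPU-bound clustering kernel. read_event() and cluster_event() are
// placeholders standing in for the real I/O and numeric routines.
#include <future>
#include <optional>
#include <vector>

struct Event { std::vector<double> particles; };
struct Jets  { std::vector<double> jets; };

std::optional<Event> read_event() {      // placeholder: high-latency I/O
  static int n = 0;
  if (n++ >= 3) return std::nullopt;     // pretend the run has 3 events
  return Event{};
}

Jets cluster_event(const Event&) {       // placeholder: CPU-bound kernel
  return Jets{};
}

void process_all() {
  // Prefetch the first event asynchronously.
  auto next = std::async(std::launch::async, read_event);
  while (true) {
    std::optional<Event> ev = next.get();
    if (!ev) break;
    // Start fetching the next event while the current one is clustered,
    // so the I/O latency is hidden behind the computation.
    next = std::async(std::launch::async, read_event);
    Jets jets = cluster_event(*ev);
    (void)jets;                          // ... store or ship the result ...
  }
}

int main() { process_all(); return 0; }
\end{verbatim}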
\bigskip
\noindent
{\bf Speculative Computation}
\bigskip
In addition to exploring explicit parallelization of the numeric kernels in jet clustering, we
propose to explore extraction of parallelism via speculative computation. At its core, speculative
computation breaks apart sequential or parallel tasks into smaller tasks to be run in parallel. Once
the speculation has completed, the runtime system validates the computation. If the computation is
incorrect ({\em i.e.}, a ``data race'' is detected, the computation cannot be serialized, {\em etc.}), the
incorrect computation is re-executed in a non-speculative manner. If the rate of mis-speculation
is low, such techniques can be leveraged to extract additional
parallelism.
There have been many different
proposals, including large efforts on transactional memory~\cite{Gramoli:2012:TCS:2168836.2168872,Bocchino:2008:STM:1345206.1345242,Adl-Tabatabai:2006:CRS:1133981.1133985,Usui:2010:ALC:1850844.1850945},
lock elision~\cite{Roy:2009:RSS:1519065.1519094}, thread level speculation and speculative
multithreading~\cite{Barreto:2012:UTS:2442626.2442639,Johnson:2004:MPD:996841.996851},
for integrating speculative computation into programming languages
and their associated runtimes.
The PIs have extensive experience with transactional memory~\cite{trans}, lightweight rollback methods~\cite{stab},
leveraging memoization to reduce re-computation costs~\cite{memo1, memo2},
and deterministic speculation~\cite{iso}. We propose to explore a specialized speculation framework leveraging different
speculation strategies, including speculation extracted by the programmer via programming language primitives,
library level speculation, and compiler extracted speculation.
The salient research
challenges in applying this strategy are as follows: 1)
identification of the appropriate speculation model and
discovering speculation points at compile time, 2) providing a speculative runtime specialized for jet clustering
and capable of realizing user, library, and compiler injected speculation, and 3) exploring new and specialized
lightweight validation and re-execution mechanisms, including validation across multiple speculation strategies.
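A deliberately generic skeleton of the speculate/validate/re-execute
pattern is sketched below. In the jet-clustering context one could, for
example, speculatively cluster geometrically separated regions of an
event in parallel inside the speculative task and fall back to the full
sequential algorithm whenever the validation step detects that a
cross-region merge would have occurred; the function names and division
of work here are illustrative assumptions rather than the proposed
runtime.
\begin{verbatim}
// Generic optimistic-speculation skeleton: run a speculative version of
// a task, validate its result, and redo the work non-speculatively on
// mis-speculation. All names here are illustrative.
#include <functional>
#include <future>

template <typename Result>
Result run_with_speculation(std::function<Result()> speculative,
                            std::function<bool(const Result&)> validate,
                            std::function<Result()> sequential) {
  // Launch the optimistic computation (it may itself spawn parallel tasks).
  std::future<Result> spec = std::async(std::launch::async, speculative);
  Result candidate = spec.get();
  // If the speculative result can be serialized (validation passes),
  // keep it; otherwise discard it and re-execute the safe version.
  return validate(candidate) ? candidate : sequential();
}

// Example use (types elided): `speculative` clusters two rapidity halves
// of the event in parallel, `validate` checks that no cross-half pair
// ever had the globally smallest distance, and `sequential` is the
// ordinary single-threaded clustering.
\end{verbatim}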
\bigskip
\noindent
{\bf Smart Distribution}
\bigskip
In order to increase parallelism, we will explore the use of the MapReduce
execution framework~\cite{mapreduce-osdi, mapreduce-hadoop}. MapReduce is a
runtime system recently developed for large-scale parallel data processing. It
enables programmers to easily deploy their applications on a cluster of
machines. Programmers only need to write two functions, Map and Reduce, and
submit these two functions as a job to the system. Then the MapReduce framework
takes care of all the aspects of the execution of the job. For example, the
framework packages and distributes the two functions over the cluster so that
the whole cluster can be utilized to execute the job; it also takes care of
fault tolerance by monitoring the cluster during the execution of the job and
redistributing the work if a machine fails.
Due to this simplicity and power, MapReduce is quickly gaining popularity in
industry for large-scale data processing. Many applications in scientific
computing have not yet explored the use of MapReduce in depth; however,
previous research has explored implementing similar kNN-style
algorithms with
MapReduce~\cite{knn-mapreduce-0, knn-mapreduce-1}. We intend to explore this
question in the context of jet clustering for the LHC.
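Independent of any particular MapReduce implementation, the
decomposition we intend to explore can be sketched as follows: the Map
step assigns each particle to a coarse cell in the $(y,\phi)$ plane,
and the Reduce step clusters each cell locally into ``proto-jets,''
leaving the comparatively rare cross-cell merges to a final pass. The
cell size, key type, and the {\tt cluster\_cell} routine in the sketch
below are illustrative assumptions.
\begin{verbatim}
// Conceptual Map/Reduce-style decomposition for jet clustering (not tied
// to a specific MapReduce framework). cluster_cell() is a placeholder for
// the local sequential clustering of one cell.
#include <cmath>
#include <map>
#include <utility>
#include <vector>

struct Particle { double px, py, pz, E; };
using Cell = std::pair<int, int>;   // coarse (y, phi) grid index

// Placeholder: real version would run the sequential algorithm locally.
std::vector<Particle> cluster_cell(const std::vector<Particle>& in) {
  return in;
}

// Map: key each particle by a coarse grid cell in (y, phi).
Cell map_particle(const Particle& p, double cell_size) {
  double y   = 0.5 * std::log((p.E + p.pz) / (p.E - p.pz));
  double phi = std::atan2(p.py, p.px);
  return { static_cast<int>(std::floor(y / cell_size)),
           static_cast<int>(std::floor(phi / cell_size)) };
}

// Shuffle: group particles by cell (done by the framework in MapReduce).
std::map<Cell, std::vector<Particle>>
shuffle(const std::vector<Particle>& event, double cell_size) {
  std::map<Cell, std::vector<Particle>> buckets;
  for (const Particle& p : event)
    buckets[map_particle(p, cell_size)].push_back(p);
  return buckets;
}

// Reduce: cluster each cell independently into local proto-jets.
// A final pass (not shown) merges proto-jets near cell boundaries.
std::vector<Particle> reduce_all(const std::vector<Particle>& event,
                                 double cell_size) {
  std::vector<Particle> proto_jets;
  for (const auto& kv : shuffle(event, cell_size)) {
    std::vector<Particle> local = cluster_cell(kv.second);
    proto_jets.insert(proto_jets.end(), local.begin(), local.end());
  }
  return proto_jets;
}
\end{verbatim}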
\subsection{Research Strategy}
Preliminary studies of naive parallelization of a few specific
routines in the {\tt fastjet} package reduced processing times by
factors of 2--3 by moving to concurrent processing with 8 cores, with
no increase in memory cost. By further deploying the advanced
strategies described above, this factor of 2--3 improvement is expected
to grow significantly.
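The flavor of this naive parallelization can be illustrated by
distributing the $O(N^2)$ search for the minimum pairwise distance
$d_{ij}$ over a small number of tasks, as in the sketch below; it uses
standard C++ tasks and the {\tt dij} helper from the earlier sequential
sketch, and is not the code used in the preliminary studies themselves.
\begin{verbatim}
// Naive parallelization sketch: split the O(N^2) search for the minimum
// pairwise distance d_ij across several tasks (illustrative only).
#include <cstddef>
#include <future>
#include <limits>
#include <vector>

struct Particle { double px, py, pz, E; };
double dij(const Particle&, const Particle&, double R, int n);  // as above

struct Best { double d; std::size_t i, j; };

Best min_dij_parallel(const std::vector<Particle>& p, double R, int n,
                      unsigned ntasks = 8) {
  std::vector<std::future<Best>> partials;
  for (unsigned t = 0; t < ntasks; ++t) {
    partials.push_back(std::async(std::launch::async, [&p, R, n, ntasks, t] {
      Best b{std::numeric_limits<double>::max(), 0, 0};
      // Each task scans a strided subset of the outer loop.
      for (std::size_t i = t; i < p.size(); i += ntasks)
        for (std::size_t j = i + 1; j < p.size(); ++j) {
          double d = dij(p[i], p[j], R, n);
          if (d < b.d) b = Best{d, i, j};
        }
      return b;
    }));
  }
  // Combine the per-task minima into the global minimum.
  Best best{std::numeric_limits<double>::max(), 0, 0};
  for (auto& f : partials) {
    Best b = f.get();
    if (b.d < best.d) best = b;
  }
  return best;
}
\end{verbatim}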
These preliminary studies were performed by Rappoccio and Mr.
Goodrum (undergrad student) over the summer of 2013. Having a
postdoctoral fellow to work on this project will greatly increase the
likelihood of success.
The postdoctoral fellow requested in this proposal is envisioned to
have a computer science background. There will be two graduate
students, one with a physics background (but with strong computational
skills), and one with a computer science background. Furthermore, it
is expected that Mr. Goodrum will continue to work with the group
under his CSTEP Fellowship. This team, under
the guidance of Rappoccio, Ziarek, and Ko, will perform the
implementation and study of these algorithms in the real-world
{\tt fastjet} software environment, testing the improvements at the
extensive cluster at the {\bf Center for Computational Research} here
at UB.
During the first year, it is expected that naive improvements using
established techniques could be deployed into the {\tt fastjet}
package. Deployment of the improvements achieved by Mr. Goodrum and
Rappoccio should take several months to a year. This will serve as a
baseline for the advanced algorithmic
improvements that will be in research and development simultaneously.
The postdoctoral fellow, under the tutelage of Profs. Ziarek, Ko, and
Rappoccio, will use this baseline to investigate the impacts of speculative
computation and lightweight concurrency. In the first half of the first year,
we will begin with the task of exploring lightweight concurrency
techniques, as these are applicable in multiple settings, including single machines
and clusters. Two graduate students will contribute to
this effort. During this period, they will gain more experience with the kNN
algorithm and also with the existing code base.
Starting from the second half of the first year, the PIs and the postdoctoral
fellow will move on to the task of investigating speculative computation, while
two graduate students finalize and polish the implementation of lightweight
concurrency techniques. Due to the number of options we can investigate and the
complexity of the implementation of the techniques, we anticipate that this task
will be more involved in terms of the initial investigation and overall
development. Thus, we will start this task from the second half of the first
year and continue throughout the second year. Based on prior results of
the PIs and the community, we expect to gain scalability increases of
2x to 8x over the naive parallel algorithm with these techniques. Additional
improvement in throughput is expected by masking high latency I/O operations and network
communication.
We will spend the final year mostly on smart distribution strategies and on
integrating the techniques we develop in the first two years. Due to its
unique programming model, MapReduce brings its own complexity, and the
postdoctoral fellow and the two graduate students will spend the beginning of
the third year learning how to program with MapReduce. After that, we will
investigate how to adapt our kNN-style algorithm to MapReduce. Figure~\ref{timeline}
shows the proposed project timeline. We expect to get additional overall
system throughput for the kNN algorithm and improved scalability. Based on
prior results we expect to see improvements of 2x to 4x. Additional
improvements can be expected as the cluster size increases.
\begin{figure*}[!tp]
\centering
\includegraphics[width=0.80\textwidth]{schedule.pdf}
\caption{Project Timeline}
\label{timeline}
\end{figure*}
\section{Broader impacts}
The broader impacts of this research are manifold. The jet-clustering
algorithm is very similar to the ``kNN'' algorithm, which is widely
applicable throughout research and industry. The improvements developed
here at the cutting edge of scientific inquiry may be adaptable to other
real-world applications. Examples of applications
outside of particle physics would be statistical
software~\cite{statsoft}, real-time 3-d rendering of computer
graphics~\cite{opengl}, and feature extraction~\cite{featureextract}.
In addition to the core physics developments enabled by improved
scalability of the {\tt fastjet} software environment, this award will
also serve as a validation mechanism for core computer science
research in programming languages and distributed systems. The development
of the system will require systems research in expanding and applying
the technological advances made previously by the PIs.
\section{Education and Outreach}
\subsection{Education}
All three PIs teach graduate courses at UB and will teach undergraduate courses
during the duration of this project. In this section we outline planned activities
and how they relate to the proposal.
\subsubsection{Curriculum Development}
{\bf Undergraduate (UG) level.}
Co-PI Ko teaches undergraduate {\bf Networking} and {\bf Systems} courses. Co-PI Ziarek will
teach undergraduate {\bf Compilers} and {\bf Programming Languages}
courses.
PI Rappoccio teaches undergraduate
{\bf Introductory Computational Physics} and {\bf Introductory Physics}.
All three PIs plan on creating a cross-listed Physics/Computer Science course
focusing on advanced and modern techniques in scientific computing
beyond the traditional computational physics classes. The course will be geared toward
advanced undergraduate and graduate students in Physics and Computer Science.
Software developed under this proposal will be leveraged in the
classroom to exemplify practical aspects of the work.
{\bf Graduate (G) level.}
Co-PI Ko teaches graduate {\bf Distributed Systems} and {\bf Android} courses. Co-PI Ziarek will
teach graduate {\bf Programming Languages} courses. PI Rappoccio
teaches a graduate version of {\bf Introductory Computational Physics}.
The PIs plan on teaching a graduate version
of their proposed undergraduate course, focusing on advanced techniques and implementation
details at the systems level. Students who successfully complete the course will be well suited to
begin research at the intersection of particle physics and computer science. To help support
students interested in this advanced course, two three-week winter session courses will be offered:
one for computer science students to teach the necessary physics background and one for physics
students to teach systems and distributed programming. The proposed course will be taught
in the spring semester after the winter session courses.
\subsubsection{Undergraduate research and Diversity}
PI Rappoccio works closely with
undergraduates, including Mr. Goodrum. Mr. Goodrum is participating in the
``Collegiate Science and Technology Entry Program'' (CSTEP) here at
UB while working on this project. This program focuses on
disadvantaged or minority students who would otherwise have a
difficult time pursuing STEM-related fields. Prof. Rappoccio is a
strong believer in the principles and practice of this program, and
intends to continue this work in the future.
There are also numerous opportunities for Independent Study and
Honors Theses for undergraduates in the Physics department.
Co-PI Ko is actively involved in mentoring undergraduates. Co-PI Ko is involved
in several formal programs such as the McNair Program and UB Honors College
Program. Through these programs, two undergraduate students, Edward Poon and
Mitchell Nguyen, are currently conducting research with him. Edward Poon, a
junior, participates in the McNair Program, which is ``designed to provide
encouragement and services to low-income and first generation college students,
and increase participation from underrepresented groups in pursuing doctoral
study.''~\footnote{\url{http://cads.buffalo.edu/mcnair}} Co-PI Ko is currently
listed as a McNair mentor. The other student, Mitchell Nguyen, is a freshman
Honors College student. Mitchell was involved in Co-PI Ko's research and
hopes to continue his research involvement.
Co-PI Ko also has a track record of recruiting women students. He has
successfully recruited two female PhD students, Sonali Batra and Anudipa
Maiti. Co-PIs Ko and Ziarek have also been working with a female Master's
student, Namita Vishnubhotla, and have successfully recruited her as a PhD
student; she will start as a PhD student in Spring 2014. All PIs will
actively continue to recruit female and minority students in their research
programs.
\subsection{Outreach}
While it is critical to pursue a rigorous research program, a large
part of the responsibility of scientists is to educate the next
generation effectively. There is already extensive work being done to
educate high school-level students and teachers about particle physics
via the {\em QuarkNet}
program at UB; however, there is
very little in the way of educating the general
public.
In addition to participating in the existing {\em QuarkNet}
activities, the plan outlined in this proposal will extend the coverage of the
outreach program at UB to engage the broader
public in discussions of major results in particle
physics, as well as to enliven particle physics for young students.
This will be implemented based on events similar to the
``HiggsFest''~\cite{higgsfest} that PI Rappoccio organized at UB.
In addition, outreach directly to high school environments will be
undertaken based on the Computer Science Teachers Association (CSTA)
work done by Co-PI Ziarek.
\subsubsection{Higgsfest and other public events}
The ``Higgsfest'' that was organized here at UB in 2012
is highlighted in Ref.~\cite{higgsfest}. The aim of the event was to
invite the general public for ``plain English'' summaries and hands-on
demonstrations that were geared for a multitude of age and knowledge
levels.
This was attended by over 100 people, including children,
high-school students, physics and non-physics undergraduates, and
interested members of the community.
Some of the hands-on demonstrations included building models of
Feynman diagrams from craft material (for young children), a
fully-functional four-layer coincidence muon scintillator detector,
a cloud chamber made out of Tupperware, felt, and dry ice, and the
actual Higgs events from the CMS collaboration shown in an interactive
event display.
The event was covered by the ``UB Reporter'' here at
UB~\cite{higgsfest_ubreporter}.
Two more such events are proposed, the first to coincide with the LHC
turn-on sometime in 2015, and the second to coincide with the newest
results from the LHC after data-taking commences. These are
events that should generate high media coverage, and will be a good
opportunity to capitalize on public interest in this field.
In the event of a major new discovery at the LHC
during Run 2, the public interest will be very high, so having the
experience of what works and what does not work in such events is
extremely valuable to maximize the public impact.
In addition, such events help remove the stigma associated with science and
technology fields at an early age. When young children can attend an
event with their parents and take something away from it, this shows
them that science is an integral part of life, and nothing to be
particularly nervous about pursuing. It may even convince younger
people to pursue a scientific career.
One of the major lessons learned during the last ``Higgsfest'' is that
it is often difficult for economically-disadvantaged students to attend
the lectures because of a lack of transportation options. This is
something to rectify in future projects along these lines. Therefore, in
addition to holding the event directly at the UB North Campus (which is
difficult for inner-city Buffalo schools to reach), a duplicate event is
also proposed at a location closer to the inner city that is easier to
reach, or possibly visits to these schools directly. Some possible
locations are the UB South Campus, or at
``Babeville''~\cite{babeville}, where the UB Physics
Department routinely organizes the ``Science and Art
Cabaret''~\cite{cabaret}. Both locations
provide the infrastructure needed for the event, and
access for disadvantaged schools and students in the inner city of
Buffalo.
\subsubsection{High School Outreach}
Co-PI~Ziarek is heavily involved with the local and national chapters of the Computer Science Teachers Association (CSTA), which
focus on outreach to local high school teachers. He helped organize the annual local conference for high school teachers in
2012 and 2013, developed and presented a half-day Python workshop that shows teachers how to leverage Python and
robotics in the classroom, and has been an invited speaker at the CS4HS program offered at Buffalo State University.
Co-PI~Ziarek expects to continue to offer
material for use at the high school level and plans to teach an age-appropriate workshop on scientific computing at the next
CSTA conference. One goal of the CSTA is to attract underrepresented minorities to study computer science through
exposure and education at the high school level.
The PIs expect to leverage the software developed under this proposal to offer age-appropriate material to high school teachers
through the use of a Python interface. Specifically, we will provide
introductory programming modules along with associated lesson plans so that either individual units or entire
curricula can be integrated into the classroom.
\section{Summary}
In summary, the problem of expanding LHC computing to the exascale is
a difficult, but tractable one. This proposal investigates the
possibility of applying cutting-edge parallelization techniques such
as lightweight concurrency extraction, speculative computation, and
smarter distribution, to the
real-world application of LHC data processing.
The overall goal is to reduce the computational time for
$k$-nearest-neighbor-like numerical kernels used for jet
clustering. The investigators of this
proposal have extensive experience in the various aspects of the
problem, and the synergistic application of this experience is
expected to attain considerable improvements in this area, which
are absolutely critical to the success of the future LHC physics
program.
\newpage
\pagenumbering{arabic}
\renewcommand{\thepage} {A--\arabic{page}}
\bibliographystyle{unsrt}
\bibliography{NSF-PIF2013}{}
%\bibliography{auto_generated}
\newpage
\pagenumbering{arabic}
\renewcommand{\thepage} {B--\arabic{page}}
\noindent{\Large \bf BUDGET JUSTIFICATION}
\bigskip
{\bf Institution:} The State University of New York at Buffalo (UB)
{\bf PI:} Salvatore Rappoccio
{\bf Co-PIs:} Lukasz Ziarek, Steven Ko
\bigskip
{\bf Personnel}
\bigskip
The requested funds of \$951,102 USD (for 3 years starting in 2014)
would cover two (2) months of summer salary for Profs. Rappoccio,
Ziarek and Ko per
year for three (3) years
(\$172,062), the full salary for
one (1) postdoctoral fellow for three (3)
years (\$154,224), and salary plus
tuition for two (2) graduate students, totaling \$91,812 in
salary and \$33,432 in tuition. This will also cover \$22,300 of
computer fees.
If awarded, efforts for Profs. Rappoccio, Ziarek and Ko will be
reduced to be in compliance with the NSF two-month salary policy.
\bigskip
{\bf Fringe}
\bigskip
Fringe benefit rates are based on the applicable federally negotiated rates published at
\\
\url{http://www.research.buffalo.edu/sps/about/rates.cfm}.
\bigskip
{\bf Travel}
\bigskip
This research will require regular travel to conferences and
CERN. Hence, the proposal requests
\$12,486 in domestic travel funds and \$12,486 in foreign travel funds
for the three years of activity.
\bigskip
{\bf Facilities and Administration Indirect Costs}
\bigskip
Indirect cost rates are based on the applicable federally negotiated
rates published at
\\
\url{http://www.research.buffalo.edu/sps/about/rates.cfm}.
\newpage
\pagenumbering{arabic}
\renewcommand{\thepage} {C--\arabic{page}}
\noindent{\Large \bf Facilities, Equipment and Other Resources}
\bigskip
{\bf Center For Computational Research}
UB has a large computational research center, the Center for
Computational Research (CCR), which operates a Linux-based cluster on
the Open Science Grid and also hosts a large GPU cluster for possible
parallel processing developments.
\bigskip
{\bf Office}
The faculty, postdoctoral fellows and graduate students all have
office space at UB.
\newpage
\pagenumbering{arabic}
\renewcommand{\thepage} {D--\arabic{page}}
\noindent{\Large \bf Postdoctoral Fellow Mentoring}
One postdoctoral fellow will be funded on this project. There are
extensive postdoctoral fellowship mentoring activities at UB, as well
as via the CMS
Experiment at CERN. These include guidance in career paths,
work/life balance discussions, and technical skill development such as
writing grant proposals, etc. Specific elements are highlighted
below.
\begin{itemize}
\item {\bf University at Buffalo (UB)}
\begin{itemize}
\item The UB Office of Postdoctoral Scholars offers diverse services
for postdoctoral fellows, including the {\sl ``Postdoc Survival
Skills Workshops''}, targeted seminars and symposia for
postdoctoral fellows, social functions, and logistical assistance.
\item The UB Physics Department offers several services to our
postdoctoral fellows, including a biweekly Journal Club for particle
physics and cosmology, weekly seminars and colloquia, and weekly
social functions inside the department.
\end{itemize}
\item {\bf CERN}
\begin{itemize}
\item The opportunities for a postdoctoral fellow at CERN are
extensive, including a plethora of workshops, seminars, and
conferences. There are also smaller weekly venues for networking, as
well as seminars through which postdoctoral fellows can gain
visibility for their work.
\item It is also worth pointing out that, because of its world-class
nature, CERN regularly attracts very high-level members of the
particle physics community. The value of such opportunities for
visibility among the top-tier scientists in the world (including
Nobel and Milner Prize winners) is hard to overstate.
\end{itemize}
\end{itemize}
In all, the postdoctoral fellow that will be supported by this
proposal will have ample opportunities for professional advancement
and development, as well as a myriad of opportunities for a community
of peers in both professional and social settings.
\newpage
\pagenumbering{arabic}
\renewcommand{\thepage} {E--\arabic{page}}
\noindent{\Large \bf Data Management Plan }
The data produced by this proposal will be in the form of
software algorithms and procedures. The format and content will be
source files (typically {\tt C++}). No data or metadata will be
collected.
The algorithms that are developed with this research plan will be
integrated into the main {\tt fastjet} software framework for
distribution among collaborators (both experimental and theoretical)
worldwide (\url{http://fastjet.fr}). This software is open-source,
freely available, and continually maintained by the {\tt fastjet}
maintenance team under the GNU General Public License v2~\cite{gnupl}.
The web pages related to this project are stored on
machines at the ``Laboratoire de Physique Théorique et Hautes Energies''
(\url{http://www.lpthe.jussieu.fr/spip/?lang=en}).
The {\tt fastjet} team has indicated their willingness to integrate
improvements into the main {\tt fastjet} package for worldwide
distribution.
\end{document}
% LocalWords: Exascale LHC Ko Rappoccio Lukasz Zialek Ziarek EIC PIs
% LocalWords: Collider exascale transformative CMS multicore VMs bla
% LocalWords: luminosities HEP HTC HPC hadrons hadronic fastjet NN
% LocalWords: Ref kNN Refs knn gpu factorized workflow MapReduce mis
% LocalWords: datarace multithreading runtimes memoization HL Higgs
% LocalWords: colliders petascale hadronically SM bosons QCD quarks
% LocalWords: chromodynamic gluons hadronize mapreduce SUNY UB CERN
% LocalWords: co QuarkNet HiggsFest Higgsfest Feynman scintillator
% LocalWords: tupperware Babeville Goodrum CSTEP Facilitator IL NY
% LocalWords: Fermilab Oct UEC Delmar collider Duckon Naperville Jun
% LocalWords: CH JHU oversaw Guofan Hu Marc Osherson Yongjie Xin BSM
% LocalWords: Bjergaard Prateek Bajaj PIF USD Profs CCR Postdoc CI
% LocalWords: Milner ADDO PhoneLab Smartphone Testbed CNS testbed KT
% LocalWords: smartphones website smartphone Jaba Chelidze min Eqs
% LocalWords: pileup PU AK pseudorapidity Aachen CA Ziarek's UG Poon
% LocalWords: gazillion bajillion manyfold Android McNair Nguyen HS
% LocalWords: Ko's Sonali Batra Anudipa Maiti Namita Vishnubhotla de
% LocalWords: CSTA metadata Laboratoire Théorique Hautes timeline