-
Notifications
You must be signed in to change notification settings - Fork 0
/
rdf.tex
167 lines (155 loc) · 7.83 KB
/
rdf.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
\subsection{RDF Streaming}\label{sec:rdf} % Emanuele
In 2009, Della Valle et al.\ called the semantic web and AI community
to investigate how to represent, manage, and reason on heterogeneous
data streams in the presence of expressive domain
models (captured by ontologies)~\cite{DBLP:journals/expert/ValleCHF09}.
Those communities were still focusing on static knowledge bases, and
solutions to incorporate changes were too complex to apply to big data
streams. Della Valle et al.\ proposed to name this new research area
\emph{stream reasoning}~\cite{DellAglioDataScience2017}, and
the sub-area focused on the semantic web \emph{RDF
Stream Processing} (RSP)~\cite{DBLP:conf/debs/ValleDM16}. This
section presents RSP, while Section~\ref{sec:sr} elaborates on stream
reasoning.
%The information model of RSP (i.e.,RDF stream) can accommodate diverse data elements. This makes it easier
%to handle data variety than in streaming languages with static
%schemas, such as in relational streaming. Graphs are particularly
%useful when analyzing social media, where each micro-post is a small
%semi-structured tree, but a collection of micro-posts forms a dynamic
%conversation graph that overlays the social
%graph~\cite{DBLP:conf/semweb/BalduiniVDTPC13}. Graphs are also useful for
%interpreting streams of text or
%videos~\cite{DBLP:journals/semweb/GangemiPRNDM17} by emitting a stream
%describing the sentences or scenes.
\sloppy RSP research extended the
semantic web stack~\cite{DBLP:books/daglib/0036180} to represent
heterogeneous streams, continuous queries, and continuous
reasoning. Inspired by CQL~\cite{arasu_widom_2004}, Della Valle et
al.\ proposed Continuous SPARQL
(\textsf{C-SPARQL},~\cite{DBLP:conf/fis/ValleCBBC08}), inspiring multiple
extensions~\cite{DBLP:journals/semweb/AnicicRFS12,Calbimonte2010,LePhuoc2012c}.
In 2013, a W3C community
group\footnote{\url{http://www.w3.org/community/rsp/}} was established
to define \textsf{RSP-QL} syntax~\cite{DBLP:conf/esws/DellAglioCVC15}
and semantics~\cite{DBLP:journals/ijswis/DellAglioVCC14}.
%\begin{figure}[!h]
%\begin{lstlisting}[escapeinside={(*}{*)}]
%:graph1 :generatedAt "(*$\tau_1$*)"
%:graph1 { :call1 :caller :Alice .
% :call1 :callee :Bob .}
%:graph2 :generatedAt "(*$\tau_2$*)"
%:graph2 { :call2 :caller :Carl .
% :call2 :callee :Alice .
% :call2 :callee :Bob . }
%\end{lstlisting}
%\vspace*{-4mm}
%\caption{\label{fig:rdfstream}RDF stream snippet example.}
%\end{figure}
In RSP-QL, an \textit{RDF stream} is an unbounded sequence of
time-varying graphs $\langle t,\tau\rangle$, where $t$ is an RDF graph
and $\tau$ is a non-decreasing timestamp. A RSP-QL query is a
continuous query on multiple RDF streams and graphs.
%Figure~\ref{fig:rdfstream} illustrates a RDF
%stream snippet in TriG
%syntax\footnote{\url{https://www.w3.org/TR/trig/}}, with two RDF
%graphs emitted at $\tau_1$ and~$\tau_2$. Each RDF graph represents a
%phone call with a caller and multiple callees. Even though each
%individual call is a tree, merging calls yields a graph: the two calls
%involve the same callee \lstinline{Bob} and the caller of
%\lstinline{call1} is a callee of \lstinline{call2}.
\begin{figure}[!h]
\begin{lstlisting}[language=rsp-ql]
REGISTER STREAM :out
AS CONSTRUCT RSTREAM { ?x a :Hub }
FROM NAMED WINDOW :lwin
ON :in [ RANGE PT120M STEP PT10M]
FROM NAMED WINDOW :swin
ON :in [ RANGE PT10M STEP PT10M]
WHERE {
WINDOW :lwin{
SELECT ?x ( COUNT(*) AS ?totalLong)
WHERE { ?c1 :callee ?x. }
GROUP BY ?x }
WINDOW :swin{
SELECT ?x ( COUNT(*) AS ?totalShort)
WHERE { ?c2 :callee ?x. }
GROUP BY ?x }
GRAPH :bg {?x :hasStandardDeviation ?s }
FILTER ((?totalShort - ?totalLong/12)/?s > 2)
} GROUP BY ?x
\end{lstlisting}
\vspace*{-4mm}
\caption{\label{fig:rspql}RSP-QL example.}
\end{figure}
Figure \ref{fig:rspql} illustrates an RSP-QL query that continuously
identifies \textit{communication hubs}. The idea is to find callees
who appear more frequently than usual. Line~1 registers stream~\lstinline{out} and Line~2 sends
the query result on that stream. \mbox{Lines 3-6} open a short
10-minute tumbling window~\lstinline{swin} and a long \mbox{2-hour}
sliding window~\lstinline{lwin} on the input
stream~\lstinline{in}. \mbox{Lines 8-11} and \mbox{12-15} count the
number of calls per callee in the long and short window,
respectively. \mbox{Lines 16-17} fetch the standard deviation of the
number of calls for each callee from a static graph, join it with the
callees appearing in both windows, and select callees two
standard deviations above average.
\subsection{Stream Reasoning}\label{sec:sr} % Emanuele
%Reasoning automates mathematical logic via systems such as constraint
%solvers, theorem provers, rule engines, and deductive classifiers.
Automated reasoning plays a key role in modern information integration where an ontology offers a conceptual view over
pre-existing autonomous data sources~\cite{DBLP:conf/pods/Lenzerini02}. In this setting, the reasoner
can find answers that are not syntactically present in the data
sources, but are deduced from the data and the ontology. This
query-answering approach is called ontology-based data
access~\cite{DBLP:journals/jods/PoggiLCGLR08}.
\begin{figure}[!t]
\begin{lstlisting}[language=rsp-ql]
SubObjectPropertyOf(
ObjectPropertyChain( :calls :calls ) :gossips
)
TransitiveObjectProperty( :gossips )
REGISTER STREAM GossipMeter AS
SELECT (count(?x) AS ?impact)
FROM NAMED WINDOW :win
ON :in [ RANGE PT60M STEP PT10M]
WHERE { :Alice :gossips ?x }
\end{lstlisting}
\vspace*{-5mm}
\caption{\label{fig:sr}Stream reasoning example with two ontological axioms and a RSP-QL query.}
\vspace*{-4mm}
\end{figure}
As RDF is the dominant data model in reasoning for data integration,
RDF streaming languages (Section~\ref{sec:rdf}) bridge the gap between
stream processing and ontology-based data integration. Della Valle et
al.\ opened up this direction, showing how continuous
reasoning can be reduced to periodic repetition of
reasoning over a windowed ontology
stream~\cite{DBLP:conf/fis/ValleCBBC08}. Figure~\ref{fig:sr} shows an
RSP-QL query that uses reasoning to continuously count how many people
\lstinline{:Alice} gossips with. Consider an RDF stream with the
triples \mbox{$\langle$\lstinline{:Alice :calls :Bob}$,\tau_i\rangle$}
and \mbox{$\langle$\lstinline{:Bob :calls :Carl}$,\tau_{i+1}\rangle$}.
\mbox{Lines 1-4} define \lstinline{:gossips} as the transitive closure
of \lstinline{:calls}. When the window contains these two triples, the
RSP-QL query returns~2, because \lstinline{:Alice :gossips :Bob}
directly calling him, but the system can also deduce that she
\lstinline{:gossips :Carl} indirectly via \lstinline{:Bob}.
While conceptually simple, this kind of reasoning is hard to do
efficiently. Barbieri~et~al.~\cite{DBLP:conf/esws/BarbieriBCVG10} and Komazec~et~al.~\cite{DBLP:conf/debs/KomazecCF12} pioneered it optimizing the DRed algorithm observing that in stream processing
deletion becomes predictable. The current state-of-the-art is the
work of Motik~et~al.~\cite{DBLP:conf/aaai/MotikNPH15a}.
In parallel, Ren and Pan
proposed an alternative approach via truth maintenance
systems~\cite{Ren2011}. Calbimonte~et~al.\ exploited ontology-based data access~\cite{DBLP:conf/esws/CalbimonteMC16}.
Heintz~et~al.\ developed logic-based spatio-temporal stream
reasoning~\cite{DeLengHeintz2016AAAI}. Anicic
et al.\ bridged stream reasoning with complex event processing
grounding both in logic
programming~\cite{DBLP:journals/semweb/AnicicRFS12}. Beck et al.\ used
answer set programming to model expressive stream reasoning
tasks~\cite{DBLP:conf/aaai/BeckDEF15} in
Ticker~\cite{DBLP:journals/tplp/BeckEB17} and
Laser~\cite{DBLP:conf/semweb/BazoobandiBU17}. Inductive stream
reasoning, i.e., applying machine-learning to RDF streams
or to ontology streams, is also an active
field~\cite{DBLP:journals/expert/BarbieriBCVHTRW10,DBLP:conf/ijcai/ChenLPC17,DBLP:conf/ijcai/LecueP13}.