From c2d40f10e5fc164cf13cdbd0a090e840215b25f7 Mon Sep 17 00:00:00 2001
From: gzagatti
Date: Thu, 4 Jan 2024 17:43:30 +0800
Subject: [PATCH] address comments

---
 paper/.ltex/ltex.dictionary.en-US.txt |  2 +
 paper/paper.tex                       | 68 +++++++++++++--------------
 2 files changed, 36 insertions(+), 34 deletions(-)

diff --git a/paper/.ltex/ltex.dictionary.en-US.txt b/paper/.ltex/ltex.dictionary.en-US.txt
index bd1e8529..a8c2d594 100644
--- a/paper/.ltex/ltex.dictionary.en-US.txt
+++ b/paper/.ltex/ltex.dictionary.en-US.txt
@@ -49,3 +49,5 @@ TimeViaThinning
 SSAs
 Doob
 piecewise-constant
+StochasticDiffEq
+MassActionJumps
diff --git a/paper/paper.tex b/paper/paper.tex
index 0b0293f5..f21f0059 100644
--- a/paper/paper.tex
+++ b/paper/paper.tex
@@ -67,17 +67,17 @@
 \maketitle

-\abstract{Point processes model the occurrence of a countable number of random points over some support. They can model diverse phenomena, such as chemical reactions, stock market transactions and social interactions. We show that \texttt{JumpProcesses.jl} library, which was first developed for simulating jump processes via stochastic simulation algorithms (SSAs) --- including Doob's method, Gillespie's methods, and Kinetic Monte Carlo methods ---, is also fit for point process simulation. Historically, jump processes have been developed in the context of dynamical systems to describe dynamics with discrete jumps. In contrast, the development of point processes has been more focused on describing the occurrence of random events. In this paper, we bridge the gap between the treatment of point and jump process simulation. The algorithms previously included in \texttt{JumpProcesses.jl} can be mapped to three general methods developed in statistics for simulating temporal point processes (TPPs). Our comparative exercise reveals that the library lacked an efficient algorithm for simulating processes with variable intensity rates. We develop a new simulation algorithm \texttt{Coevolve}. This is the first thinning algorithm to step in sync with model time reducing the number of time proposal rejections and allowing for new possibilities such as simulating variable-rate jump process coupled with differential equations via thinning. \texttt{JumpProcesses.jl} can finally simulate any point process on the real line with a non-negative, left-continuous, history-adapted and locally bounded intensity rate efficiently, enabling the library to become one of the few readily available, fast and general-purpose options for simulating TPPs.}
+\abstract{Point processes model the occurrence of a countable number of random points over some support. They can model diverse phenomena, such as chemical reactions, stock market transactions and social interactions. We show that the \texttt{JumpProcesses.jl} library, which was first developed for simulating jump processes via stochastic simulation algorithms (SSAs) --- including Doob's method, Gillespie's methods, and Kinetic Monte Carlo methods --- also provides performant methods for sampling temporal point processes (TPPs). Historically, jump processes have been developed in the context of dynamical systems to describe dynamics with discrete jumps. In contrast, the development of point processes has been more focused on describing the occurrence of random events. In this paper, we bridge the gap between the treatment of point and jump process simulation. The algorithms previously included in \texttt{JumpProcesses.jl} can be mapped to three general methods developed in statistics for simulating TPPs.
Our comparative exercise reveals that the library lacked an efficient algorithm for simulating processes with variable intensity rates. We develop a new simulation algorithm, \texttt{Coevolve}. This is the first thinning algorithm to step in sync with model time, reducing the number of time proposal rejections and allowing for new possibilities such as simulating variable-rate jump processes coupled with differential equations via thinning. \texttt{JumpProcesses.jl} can now efficiently simulate any point process on the real line with a non-negative, left-continuous, history-adapted and locally bounded intensity rate, enabling the library to become one of the few readily available, fast and general-purpose options for simulating TPPs.}

\section{Introduction}

-Methods for simulating the trajectory of temporal point processes (TPPs) can be split into exact and inexact methods. Exact methods are exact in the sense that they describe the realization of each point in the process chronologically~\footnote{Some exact methods might not be completely exact since they rely on root finding approximation methods. However, we follow convention and denote all such methods as exact methods.}. This exactness can suffer from reduced performance when simulating systems where numerous events can fire within a short period since every single point needs to be accounted for. Inexact methods trade accuracy for speed by simulating the total number of events in successive intervals. They are popular in biochemical applications, e.g. \( \tau \)-leap methods~\cite{gillespie2001}, which often require the simulation of chemical reactions in systems with large molecular populations.
+Methods for simulating the trajectory of temporal point processes (TPPs) can be split into exact and inexact methods. Exact methods generate statistically exact realizations of each point in the process chronologically~\footnote{Some exact methods might not be completely exact since they rely on root finding approximation methods. However, we follow convention and denote all such methods as exact methods.}. This exactness provides unbiased samples, but can suffer from reduced performance when simulating systems where numerous events can fire within a short period since every single point needs to be accounted for. Inexact methods trade accuracy for speed by simulating the total number of events in successive intervals. They are popular in biochemical applications, \eg \( \tau \)-leap methods~\cite{gillespie2001}, which often require the simulation of chemical reactions in systems with large molecular populations.

-Previously, the development of point process simulation libraries focused primarily on univariate processes with exotic intensities, or large systems with conditionally constant intensities, but not on both. As such, there was no widely used general-purpose software for efficiently simulating compound point processes in large systems with time-dependent rates. To enable the efficient simulation of such processes, we contribute a new simulation algorithm together with its implementation as the \texttt{Coevolve} aggregator in \texttt{JumpProcesses.jl}, a core sub-library of the popular \texttt{DifferentialEquations.jl} library~\cite{rackauckas2017}. Our new algorithm is a type of thinning algorithm that thins in sync with model time allowing the coupling of large multivariate TPPs with other algorithms that step chronologically through time such as differential equation solvers.
Our new algorithm improves the COEVOLVE algorithm described in~\cite{farajtabar2017} from where the new \texttt{JumpProcesses.jl} aggregator borrows its name. The extension of \texttt{JumpProcesses.jl} dramatically boosts the computational performance of the library in simulating processes with intensities that have an explicit dependence on time and/or other continuous variables, significantly expanding the type of models that can be efficiently simulated. Widely-used point processes with such intensities include compound inhomogeneous Poisson process, Hawkes process, stress-release process and piecewise deterministic Markov process (PDMP). Since \texttt{JumpProcesses.jl} is a member of Julia's SciML organization, it also becomes easier, and more feasible, to incorporate compound point processes with explicit time-dependent rates into a wide variety of applications and higher-level analyses. With our new additions we bump \texttt{JumpProcesses.jl} to version 9.7\footnote{All examples and benchmarks in this paper use this version of the library}.
+Previously, the development of point process simulation libraries focused primarily on univariate processes with exotic intensities, or large systems with conditionally constant intensities, but not on both. As such, there was no widely used general-purpose software for efficiently simulating compound point processes in large systems with time-dependent rates. To enable the efficient simulation of such processes, we contribute a new simulation algorithm together with its implementation as the \texttt{Coevolve} aggregator in \texttt{JumpProcesses.jl}, a core sub-library of the popular \texttt{DifferentialEquations.jl} library~\cite{rackauckas2017}. Our new method is a type of thinning algorithm that thins in sync with time. This allows the coupling of large multivariate TPPs with other algorithms that step chronologically through time such as differential equation solvers. Our new algorithm improves the COEVOLVE algorithm described in~\cite{farajtabar2017}, from which the new \texttt{JumpProcesses.jl} aggregator borrows its name. The addition of \texttt{Coevolve} dramatically boosts the computational performance of the library in simulating processes with intensities that have an explicit dependence on time and/or other continuous variables, significantly expanding the types of models that can be efficiently simulated. Widely-used point processes with such intensities include compound inhomogeneous Poisson processes, Hawkes processes, stress-release processes and piecewise deterministic Markov processes (PDMPs). Since \texttt{JumpProcesses.jl} is a member of Julia's SciML organization, it also becomes easier, and more feasible, to incorporate compound point processes with explicit time-dependent rates into a wide variety of applications and higher-level analyses. Our new additions are available as of \texttt{JumpProcesses.jl} 9.7\footnote{All examples and benchmarks in this paper use version 9.9 of the library}.

-This paper starts by bridging the gap between simulation methods developed in statistics and biochemistry, which led us to the development of \texttt{Coevolve}. We briefly introduce TPPs and simulation methods for the Poisson homogeneous process, which serve the basis for all other simulation methods. Then, we identify and discuss three types of exact simulation methods. In the second part of this paper, we describe the algorithms implemented in \texttt{JumpProcesses.jl} and how they relate to the literature.
We highlight our contribution \texttt{Coevolve}, investigate the correctness of our implementation and provide performance benchmarks to demonstrate its value. The paper concludes by discussing potential improvements.
+This paper starts by bridging the gap between simulation methods developed in statistics and biochemistry, which led us to the development of \texttt{Coevolve}. We briefly introduce TPPs and simulation methods for the homogeneous Poisson process, which serve as building blocks for all other simulation methods. Then, we identify and discuss three types of exact simulation methods. In the second part of this paper, we describe the algorithms implemented in \texttt{JumpProcesses.jl} and how they relate to the literature. We highlight our contribution \texttt{Coevolve}, investigate the correctness of our implementation and provide performance benchmarks to demonstrate its value. The paper concludes by discussing potential improvements.

-\section{The temporal point process} \label{sec:notation}
+\section{The temporal point process}

 The TPP is a stochastic collection of marked points over a one-dimensional support. TPPs are exhaustively described in~\cite{daley2003}. The likelihood of any TPP is fully characterized by its conditional intensity,
 \begin{equation}\label{eq:lambda}
@@ -85,7 +85,7 @@ \section{The temporal point process} \label{sec:notation}
 \end{equation}
 and conditional mark distribution, \( f^*(k | t) \) --- see~Chapter 7~\cite{daley2003}. Here \( H_{t^-} = \{ (t_n, k_n) \mid 0 \leq t_n < t \} \) denotes the internal history of the process up to but not including \( t \), the superscript \( \ast \) denotes the conditioning of any function on \( H_{t^-} \), and \( p^\ast(t) \) is the density function corresponding to the probability of an event taking place at time \( t \) given \( H_{t^-} \). We can interpret the conditional intensity as the likelihood of observing a point in the next infinitesimal unit of time, given that no point has occurred since the last observed point in \( H_{t^-} \). Lastly, the mark distribution denotes the density function corresponding to the probability of observing mark \( k \) given the occurrence of an event at time \( t \) and internal history \( H_{t^-} \).

-\section{The homogeneous process} \label{sec:method-poisson}
+\section{The homogeneous process}

 A homogeneous process can be simulated using properties of the Poisson process, which allow us to describe two equivalent sampling procedures. The first procedure consists of drawing successive inter-arrival times. The distance between any two points in a homogeneous process is distributed according to the exponential distribution --- see Theorem 7.2~\cite{last2017}. Given a homogeneous process with intensity $\lambda$, the distance \( \Delta t \) between two points is distributed according to $\Delta t \sim \exp(\lambda)$. Draws from the exponential distribution can be performed by drawing from a uniform distribution in the interval $[0, 1]$. If $V \sim U[0, 1]$, then \( T = - \ln(V) / \lambda \sim \exp(\lambda) \). (Note, however, in Julia the optimized Ziggurat-based method used in the \texttt{randexp} stdlib function is generally faster than this \textit{inverse} method for sampling a unit exponential random variable.) When a point process is homogeneous, the \textit{inverse} method of Subsection~\ref{subsec:sim-inverse} reduces to this approach. Thus, we defer the presentation of this algorithm to the next section.
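+For concreteness, a minimal sketch of this first procedure in Julia is given below; the intensity and horizon are arbitrary illustrative values.
+\begin{lstlisting}
+using Random
+
+# draw one homogeneous Poisson trajectory on [0, T] by
+# accumulating exponential inter-arrival times
+function poisson_times(lambda, T; rng = Random.default_rng())
+    ts = Float64[]
+    # randexp draws a unit exponential; dividing by lambda is
+    # equivalent to the inverse method -log(rand(rng)) / lambda
+    t = randexp(rng) / lambda
+    while t < T
+        push!(ts, t)
+        t += randexp(rng) / lambda
+    end
+    return ts
+end
+
+ts = poisson_times(0.5, 100.0)   # e.g. intensity 0.5 on [0, 100]
+\end{lstlisting}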
@@ -137,7 +137,7 @@ \subsection{Inverse methods} \label{subsec:sim-inverse}
 % see here on computing the inverse of integrals:
 % https://math.stackexchange.com/questions/1467784/inverse-of-a-functions-integral

-The main drawback of the \textit{inverse} method is that the root finding problem defined in Equation~\ref{eqn:inverse} often requires a numerical solution. To get around a similar obstacle in the context of the PDMP, Veltz~\cite{veltz2015} proposes a change of variables in time that recasts the root finding problem into an initial value problem. He denotes his method \textit{CHV}.
+The main drawback of the \textit{inverse} method is that the root finding problem defined in Equation~\ref{eqn:inverse} often requires a numerical solution. To get around a similar obstacle in the context of PDMPs, Veltz~\cite{veltz2015} proposes a change of variables in time that recasts the root finding problem into an initial value problem. He denotes his method \textit{CHV}.

 PDMPs are composed of two parts: the jump process and the piecewise ODE that changes stochastically at jump times --- see Lemaire~\etal~\cite{lemaire2018} for a formal definition. Therefore, it is easy to employ \textit{CHV} in our case by setting the ODE part to zero throughout time. Adapting from Veltz~\cite{veltz2015}, we can determine the model jump time \( t_n \) after sampling \( \Delta \tilde{t}_n \sim \exp(1) \) by solving the following initial value problem until \( \Delta \tilde{t}_n \).
 \begin{equation} \label{eqn:chv-simple}
@@ -203,7 +203,7 @@ \subsection{Inverse methods} \label{subsec:sim-inverse}

 \subsection{Thinning methods} \label{subsec:sim-thinning}

-\textit{Thinning} methods are one of the most popular for simulating point processes. The main idea is to successively sample a homogeneous process, then thin the obtained points with the conditional intensity of the original process. As stated in Proposition 7.5.I~\cite{daley2003}, this procedure simulates the target process by construction. The advantage of \textit{thinning} over \textit{inverse} methods is that the former only requires the evaluation of the conditional intensity function while the latter requires computing the inverse of its integrated form~\cite{daley2003}.
+\textit{Thinning} methods are popular approaches for simulating point processes. The main idea is to successively sample a homogeneous process, then thin the obtained points with the conditional intensity of the original process. As stated in Proposition 7.5.I~\cite{daley2003}, this procedure simulates the target process by construction. The advantage of \textit{thinning} over \textit{inverse} methods is that the former only requires the evaluation of the conditional intensity function while the latter requires computing the inverse of its integrated form~\cite{daley2003}.

 \textit{Thinning} algorithms have been proposed in different forms~\cite{daley2003}. Shedler-Lewis~\cite{lewis1979} first suggested a thinning routine that simulated processes with bounded intensity over a fixed interval. Ogata's refinement~\cite{ogata1981} describes a procedure for evolving the simulation via local boundary conditions and fixed partitions of the real line. As long as the intensity conditioned on the simulated history remains locally bounded, it is possible to simulate subsequent points indefinitely.
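+To make the procedure concrete, the following sketch simulates an inhomogeneous Poisson process on \( [0, T] \) by thinning candidates drawn from a homogeneous proposal process; the intensity and its constant global bound are hypothetical choices made only so that the bound holds.
+\begin{lstlisting}
+using Random
+
+lambda(t) = 2.0 + cos(t)   # hypothetical intensity, bounded by B
+B = 3.0                    # global bound: lambda(t) <= B for all t
+
+function thinning_times(lambda, B, T; rng = Random.default_rng())
+    ts = Float64[]
+    t = 0.0
+    while true
+        # candidate from a homogeneous process with rate B
+        t += randexp(rng) / B
+        t >= T && break
+        # accept the candidate with probability lambda(t) / B
+        rand(rng) <= lambda(t) / B && push!(ts, t)
+    end
+    return ts
+end
+\end{lstlisting}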
@@ -219,9 +219,9 @@ \subsection{Thinning methods} \label{subsec:sim-thinning}
 \end{equation}
 The tighter the bound \( \bar{B}^\ast (\cdot) \) on \( \lambda^\ast (\cdot) \), the lower the number of discarded samples. Since looser bounds lead to less efficient algorithms, the art, when simulating via \textit{thinning}, is to find the optimal balance between the local supremum of the conditional intensity \( \bar{B}^\ast(\cdot) \) and the duration of the local interval \( L^\ast(t) \). On the other hand, the infimum \( \ubar{B}^\ast(\cdot) \) can be used to avoid the evaluation of \( \lambda^\ast \, (\cdot) \) in Line~\ref{line:short-circuit} of Algorithm~\ref{algo:next-time-thinning} which often can be expensive.

-In Line~\ref{line:u-sample} of Algorithm~\ref{algo:sim-thinning}, since the candidate interval \( u \) is itself the random inter-event interval from a TPP with conditional intensity \( \bar{B}^\ast(\cdot) \), we are back to simulating a TPP via the inverse method. Therefore, the wrong choice of \( \bar{B}^\ast(\cdot) \) could in fact deteriorate the performance of the simulation. In many applications, the bound \( \bar{B}^\ast(\cdot) \) is fixed over \( L^\ast(t) \) which simplifies the simulation since then \( u \sim \exp(\bar{B}^\ast (t)) \). Alternatively, Bierkens~\etal~\cite{bierkens2019} uses a Taylor approximation of \( \lambda^\ast(t) \) to obtain an upper-bound which is a linear function of \( t \)~\footnote{Their implementation of the Zig-Zag process, a type of PMDP for Markov Chain Monte Carlo, is available as a Julia package at \url{https://github.com/mschauer/ZigZagBoomerang.jl}.}.
+In Line~\ref{line:u-sample} of Algorithm~\ref{algo:sim-thinning}, since the candidate interval \( u \) is itself the random inter-event interval from a TPP with conditional intensity \( \bar{B}^\ast(\cdot) \), we are back to simulating a TPP via the inverse method. Therefore, the wrong choice of \( \bar{B}^\ast(\cdot) \) could in fact deteriorate the performance of the simulation. In many applications, the bound \( \bar{B}^\ast(\cdot) \) is constant over \( [0, L^\ast(t)] \) which simplifies the simulation since then \( u \sim \exp(\bar{B}^\ast (t)) \). Alternatively, Bierkens~\etal~\cite{bierkens2019} use a Taylor approximation of \( \lambda^\ast(t) \) to obtain an upper-bound which is a linear function of \( t \)~\footnote{Their implementation of the Zig-Zag process, a type of PDMP for Markov Chain Monte Carlo, is available as a Julia package at \url{https://github.com/mschauer/ZigZagBoomerang.jl}.}.

-When the conditional intensity is constant between jumps such that \( \lambda^\ast \, (t) = \lambda_{n-1} , \forall t_{n-1} \leq t < t_n \), let \( \bar{B}^\ast(t) = \ubar{B}^\ast(t) = \lambda_{n-1} \) and \( L^\ast(t) = \infty \). We have that for any \( u \sim \exp(1 \; / \; \bar{B}^\ast(t)) = \exp(\lambda_{n-1})\) and \( v \sim U[0, 1] \), \( u < L^\ast(t) = \infty \) and \( v < \lambda^\ast \, (t + u) \; / \; \bar{B}^\ast(t) = 1 \). Therefore, we advance the internal history for every iteration of Algorithm~\ref{algo:sim-thinning}. In this case, the bound \( \bar{B}^\ast(t) \) is as tight as possible, and this method becomes the same as the \textit{inverse} method of Subsection~\ref{subsec:sim-inverse}.
+When the conditional intensity is constant between jumps such that \( \lambda^\ast \, (t) = \lambda_{n-1} , \forall t_{n-1} \leq t < t_n \), let \( \bar{B}^\ast(t) = \ubar{B}^\ast(t) = \lambda_{n-1} \) and \( L^\ast(t) = \infty \).
We have that for any \( u \sim \exp(\bar{B}^\ast(t)) = \exp(\lambda_{n-1})\) and \( v \sim U[0, 1] \), \( u < L^\ast(t) = \infty \) and \( v < \lambda^\ast \, (t + u) \; / \; \bar{B}^\ast(t) = 1 \). Therefore, we advance the internal history for every iteration of Algorithm~\ref{algo:sim-thinning}. In this case, the bound \( \bar{B}^\ast(t) \) is as tight as possible, and this method becomes equivalent to the \textit{inverse} method of Subsection~\ref{subsec:sim-inverse}.

 We can draw more connections between \textit{thinning} and \textit{inversion}. Lemaire~\etal~\cite{lemaire2018} propose a version of the \textit{thinning} algorithm for PDMPs which does not use a local interval for rejection --- equivalent to \( L^\ast(t) = \infty \). They construct an optimal upper-bound \( \bar{B}^\ast(t) \) as a piecewise constant function partitioned in such a way that it envelopes the intensity function as strictly as possible. The efficiency of their algorithm depends on the assumption that the stochastic process determined by \( \bar{B}^\ast(t) \) can be efficiently inverted. They show that under certain conditions the stochastic process determined by \( \bar{B}^\ast(t) \) converges in distribution to the target conditional intensity as the partitions of the optimal boundary converge to zero. These results suggest that the efficiency of \textit{thinning} compared to \textit{inversion} most likely depends on the rejection rate obtained by the former and the number of steps required by the ODE solver for the latter.

@@ -255,32 +255,32 @@ \subsection{Thinning methods} \label{subsec:sim-thinning}
 \While{\( t < T \)}
 \State update \( \lambda^\ast \) \label{line:lambda-update}
 \State find \( \bar{B}^\ast (t) \), \( \ubar{B}^\ast (t) \) and \( L^\ast(t) \) which satisfy Eq.~\ref{eq:thinning-condition}
- \State draw candidate interval \( u \) such that \( P(u > s) = \exp( - \int_0^s \bar{B}^\ast (t + s) ds ) \) \label{line:u-sample}
+ \State draw candidate interval \( u \) such that \\ \hskip2.5em \( P(u > s) = \exp( - \int_0^s \bar{B}^\ast (t + r) \, dr ) \) \label{line:u-sample}
 \State draw acceptance threshold \( v \sim U[0, 1] \)
 \If{\( u > L^\ast(t) \)}
 \State \( t \leftarrow t + L^\ast(t) \)
 \State \textbf{next}
 \EndIf
- \If{\( ( v \leq \ubar{B}^\ast(t + u) ) \lor ( v \leq \lambda^\ast \, (t + u) / \bar{B}^\ast(t + u) ) \)} \label{line:short-circuit}
+ \If{\( ( v \leq \ubar{B}^\ast(t + u) / \bar{B}^\ast(t + u) ) \) or \( ( v \leq \lambda^\ast \, (t + u) / \bar{B}^\ast(t + u) ) \)} \label{line:short-circuit}
 \State \( t \leftarrow t + u \)
- \State \Return t
+ \State \Return \( t \)
 \EndIf
 \State \( t \leftarrow t + u \)
 \EndWhile
- \State \Return t
+ \State \Return \( t \)
 \EndProcedure
 \end{algorithmic}
 \caption{Generates the next event time via \textit{thinning}.}
 \label{algo:next-time-thinning}
 \end{algorithm}

-\subsection{Queuing methods for multivariate processes} \label{subsec:sim-queuing}
+\subsection{Queuing methods for multivariate processes}

 As an alternative to his \textit{direct} method --- in this text referred to as the constant rate \textit{thinning} method ---, Gillespie introduced the \textit{first reaction} method in his seminal work on simulation algorithms~\cite{gillespie1976}. The \textit{first reaction} method separately simulates the next reaction time for each reaction channel --- \ie~for each mark. It then selects the smallest time as the time of the next event, followed by updating the conditional intensity of all channels accordingly.
This is a variation of the constant rate \textit{thinning} method to simulate a set of inter-dependent point processes, making use of the \textit{superposition theorem} --- Theorem 3.3~\cite{last2017} --- in the inverse direction. Gibson and Bruck~\cite{gibson2000} improved the \textit{first reaction} method with the \textit{next reaction} method. They innovate on three fronts. First, they keep a priority queue to quickly retrieve the next event. Second, they keep a dependency graph to quickly locate all conditional intensity rates that need to be updated after an event is fired. Third, they re-use previously sampled reaction times to update unused reaction times. This minimizes random number generation, which can be costly. Priority queues and dependency graphs have also been used in the context of social media~\cite{farajtabar2017} and epidemic~\cite{holme2021} simulation. In both cases, the phenomena are modelled as point processes.

-We prefer to call this class of methods \textit{queued thinning} methods since most efficiency gains come from maintaining a priority queue of the next event times. Up to this point we assumed that we were sampling from a global process with a mark distribution that could generate any mark \( k \) given an event at time \( t \). With queuing, it is possible to simulate point processes with a finite space of marks as \( M \) interdependent point processes --- see Definition 6.4.1~\cite{daley2003} of multivariate point processes --- doing away with the need to draw from the mark distribution at every event occurrence. Alternatively, it is possible to split the global process into \( M \) interdependent processes each one of which with its own mark distribution.
+We prefer to call this class of methods \textit{queued thinning} methods since most efficiency gains come from maintaining a priority queue of the next event times. Up to this point we assumed that we were sampling from a global process with a mark distribution that could generate any mark \( k \) given an event at time \( t \). With queuing, it is possible to simulate point processes with a finite space of marks as \( M \) interdependent point processes --- see Definition 6.4.1~\cite{daley2003} of multivariate point processes --- doing away with the need to draw from the mark distribution at every event occurrence. Alternatively, it is possible to split the global process into \( M \) interdependent processes each of which has its own mark distribution.

 Algorithm~\ref{algo:sim-queuing} presents a method for sampling a superposed point process consisting of \( M \) processes by keeping the strike time of each process in a priority queue \( Q \). The priority queue is initially constructed in \( O(M) \) steps in Lines~\ref{line:queuing-init-begin} to~\ref{line:queuing-init-end} of Algorithm~\ref{algo:sim-queuing}. In contrast to \textit{thinning} methods, updates to the conditional intensity depend only on the size of the neighborhood of \( i \), that is, the processes \( j \) whose conditional intensity depends on the history of \( i \). If the graph is sparse, then updates will be faster than with \textit{thinning}.
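+To illustrate the queueing logic in its simplest form, the sketch below simulates \( M \) interdependent processes with constant rates using a priority queue and a dependency graph; the rates and graph are hypothetical, and, for brevity, affected processes are simply re-drawn rather than re-using unexpired candidate times as in~\cite{gibson2000}.
+\begin{lstlisting}
+using DataStructures, Random
+
+rates = [1.0, 2.0, 0.5]   # hypothetical constant intensities
+deps = [[2], [3], [1]]    # deps[i]: processes affected when i fires
+T = 10.0
+
+# initialize the queue with one candidate strike time per process
+Q = PriorityQueue{Int,Float64}()
+for i in eachindex(rates)
+    Q[i] = randexp() / rates[i]
+end
+
+events = Tuple{Float64,Int}[]
+while true
+    i, t = peek(Q)                    # earliest candidate time
+    t > T && break
+    push!(events, (t, i))             # fire process i at time t
+    Q[i] = t + randexp() / rates[i]   # queue the next strike of i
+    for j in deps[i]                  # re-draw affected neighbors
+        Q[j] = t + randexp() / rates[j]
+    end
+end
+\end{lstlisting}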
@@ -326,7 +326,7 @@ \subsection{Queuing methods for multivariate processes} \label{subsec:sim-queuin
 \State \textbf{break}
 \EndIf
 \State draw \( v \sim U[0, \bar{B}_i^\ast] \)
- \If{\( a_i \land ( v > \ubar{B}_i^\ast ) \land ( v > \lambda^\ast \, (t) ) \)}
+ \If{\( a_i \) and \( ( v > \ubar{B}_i^\ast ) \) and \( ( v > \lambda^\ast \, (t) ) \)}
 \State \( a_i \leftarrow \operatorname{false} \)
 \EndIf
 \If{ \( a_i \)}
@@ -335,11 +335,11 @@ \subsection{Queuing methods for multivariate processes} \label{subsec:sim-queuin
 \State update \( f^\ast \) and draw the mark \( k_n \sim f_i^\ast \, (k \mid t_n) \)
 \State update the history \( H_{T^-} \leftarrow H_{T^-} \cup (t_n, k_n) \)
 \For{\( j \in \{ i \} \cup \operatorname{Neighborhood}(i) \)}
- \State \( (t_j, \bar{B}_j^\ast, \ubar{B}_j^\ast, a_j) \leftarrow \operatorname{QueueTime}(t, H_{T^-}, \lambda_{j}^\ast(\cdot)) \)
+ \State \( (t_j, \bar{B}_j^\ast, \ubar{B}_j^\ast, a_j) \leftarrow \operatorname{QueueTime}(t, \lambda_{j}^\ast, H_{T^-}) \)
 \State update \( (j, t_j, \bar{B}_j^\ast, \ubar{B}_j^\ast, a_j) \) in \( Q \)
 \EndFor
 \Else
- \State \( (t_i, \bar{B}_i^\ast, \ubar{B}_i^\ast, a_i) \leftarrow \operatorname{QueueTime}(t, H_{T^-}, \lambda_{i}^\ast(\cdot)) \) \label{line:candidate-two}
+ \State \( (t_i, \bar{B}_i^\ast, \ubar{B}_i^\ast, a_i) \leftarrow \operatorname{QueueTime}(t, \lambda_{i}^\ast, H_{T^-}) \) \label{line:candidate-two}
 \State update \( (i, t_i, \bar{B}_i^\ast, \ubar{B}_i^\ast, a_i) \) in \( Q \)
 \EndIf
 \EndWhile
@@ -360,31 +360,31 @@ \subsection{Queuing methods for multivariate processes} \label{subsec:sim-queuin

 \section{Implementation} \label{sec:implementation}

-\texttt{JumpProcesses.jl} is a Julia library for simulating jump --- or point --- processes which is part of Julia's SciML organization. Jumps are implemented as callbacks of a \texttt{OrdinaryDiffEq.jl} numerical solver. In simple terms, callbacks are functions that can be arbitrarily called at each step of the main loop of the solver.
+\texttt{JumpProcesses.jl} is a Julia library for simulating jump --- or point --- processes and is part of Julia's SciML organization. Jumps are handled via callbacks that are checked at the end of each time-step of some time evolution algorithm, \eg an ODE solver from \texttt{OrdinaryDiffEq.jl}, a stochastic differential equation solver from \texttt{StochasticDiffEq.jl}, or the pure-jump \texttt{SSAStepper} provided by \texttt{JumpProcesses.jl}. In simple terms, callbacks are functions that can be arbitrarily called at each step of the main loop of a time-stepping method.

-Our discussion in Section~\ref{sec:act} identified three exact methods for simulating point processes. In all the cases, we identified two mathematical constructs required for simulation: the intensity rate and the mark distribution. In \texttt{JumpProcesses.jl}, these can be mapped to user defined functions \texttt{rate(u, p, t)} and \texttt{affect!(integrator)}. The library provides APIs for defining processes based on the nature of the intensity rate and the intended simulation algorithm. Processes intended for exact methods can choose between \texttt{ConstantRateJump} and \texttt{VariableRateJump}. While the former expects the rate between jumps to be constant, the latter allows for time-dependent rates. The library also provides the \texttt{MassActionJump} API to define large systems of point processes that can be expressed as reaction equations. Finally, \texttt{RegularJump} are intended for inexact methods.
+Our discussion in Section~\ref{sec:act} identified three exact methods for simulating point processes. In all the cases, we identified two mathematical constructs required for simulation: the intensity rate and the mark distribution. In \texttt{JumpProcesses.jl}, these can be mapped to user-defined functions \texttt{rate(u, p, t)} and \texttt{affect!(integrator)}. The former takes the current state of the system, \texttt{u}, user provided parameters, \texttt{p}, and the current time, \texttt{t}, and returns the value of the intensity function at this time. The latter takes the solver \texttt{integrator} object, which stores all solution information, and updates it, including the state \texttt{integrator.u}, to reflect whatever changes should occur when the jump it encodes fires at time \texttt{integrator.t}. The library provides APIs for defining processes based on the nature of the intensity rate and the intended simulation algorithm. Processes simulated using exact sampling methods can choose between \texttt{ConstantRateJump} and \texttt{VariableRateJump}. While the former expects the rate between jumps to be constant, the latter allows for time-dependent rates. The library also provides the \texttt{MassActionJump} API to define large systems of point processes that can be expressed as mass action type reaction equations. Finally, \texttt{RegularJump} is intended for \( \tau \)-leaping methods.

-The \textit{inverse} method as described around Equation~\ref{eqn:inverse} uses root find to find the next jump time. Jumps to be simulated via the \textit{inverse} method must be initialized as a \texttt{VariableRateJump}. \texttt{JumpProcesses.jl} builds a continuous callback following the algorithm in~\cite{salis2005} and passes the problem to an \texttt{OrdinaryDiffEq.jl} integrator, which easily interoperates with \texttt{JumpProcesses.jl} (both libraries are part of the \textit{SciML} organization, and by design built to easily compose). \texttt{JumpProcesses.jl} does not yet support the CHV ODE based approach.
+The \textit{inverse} method as described around Equation~\ref{eqn:inverse} uses root finding to calculate the next jump time. Jumps to be simulated via the \textit{inverse} method must be initialized as a \texttt{VariableRateJump}. \texttt{JumpProcesses.jl} builds a continuous callback following the algorithm in~\cite{salis2005} and passes the problem to an \texttt{OrdinaryDiffEq.jl} integrator, which easily interoperates with \texttt{JumpProcesses.jl} (both libraries are part of the \textit{SciML} organization, and by design built to easily compose). \texttt{JumpProcesses.jl} does not yet support the CHV ODE based approach.

-Alternatively, \textit{thinning} methods can be simulated via discrete steps. In the context of the library, any method that uses a discrete callback is called an \textit{aggregator}. There are twelve different aggregators which we discuss below and are summarized in Table~\ref{tab:aggregators} in the \hyperref[sec:annex]{Annex}.
+Alternatively, \textit{thinning} methods can be simulated via discrete steps. In the context of the library, any method that uses thinning via a discrete callback is called an \textit{aggregator}. There are twelve different aggregators, which we discuss below and summarize in Table~\ref{tab:aggregators} in the \hyperref[sec:annex]{Annex}. Aggregators handle sampling the next jump time and type, which is then read via the callback by the user-selected time-stepper.
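+To make the API concrete, the following minimal sketch simulates a process with the time-dependent intensity \( \lambda(t) = 2 + \cos(t) \) via a bounded \texttt{VariableRateJump} handled by the \texttt{Coevolve} aggregator discussed later in this section; the rate, bound, dependency graph and horizon are illustrative choices only.
+\begin{lstlisting}
+using JumpProcesses
+
+rate(u, p, t) = 2.0 + cos(t)   # conditional intensity
+urate(u, p, t) = 3.0           # upper bound used for thinning
+rateinterval(u, p, t) = Inf    # interval over which the bound holds
+# count each firing in the first state variable
+affect!(integrator) = (integrator.u[1] += 1; nothing)
+jump = VariableRateJump(rate, affect!; urate, rateinterval)
+
+dprob = DiscreteProblem([0], (0.0, 10.0))
+jprob = JumpProblem(dprob, Coevolve(), jump; dep_graph = [[1]])
+sol = solve(jprob, SSAStepper())
+\end{lstlisting}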
We start with constant rate \textit{thinning} aggregators for marked TPPs. Algorithm~\ref{algo:sim-thinning} assumes that there is a single process. In reality, all the implementations first assume a finite multivariate point process with \( M \) interdependent sub-processes. This is easily reconciled, as we do now, using Definition 6.4.1~\cite{daley2003}, which states the equivalence of such a process with a point process with a finite space of marks.

-As all the constant rate \textit{thinning} aggregators only deal with \texttt{ConstantRateJump}, the intensity between jumps is constant, Algorithm~\ref{algo:next-time-thinning} short-circuits to quickly return \( t \sim \exp(\bar{B}) = \exp(\lambda_n) \) as discussed in Subsection~\ref{subsec:sim-thinning}. Next, the mark distribution becomes the categorical distribution weighted by the intensity of each process. That is, given an event at time \( t_n \), we have that the probability of drawing process \( i \) out of \( M \) sub-processes is \( \lambda_i^\ast (t_n) / \lambda^\ast (t_n) \). Conditional on sub-process \( i \), the corresponding \texttt{affect!(integrator)} is invoked, that is, \( k_n \sim f_i^\ast (k \mid t_n) \). So all sub-process could potentially be marked. Where most implementations differ is on updating the mark distribution in Line~\ref{line:thinning-mark-sample} of Algorithm~\ref{algo:sim-thinning} and the conditional intensity rate in Line~\ref{line:lambda-update} of Algorithm~\ref{algo:next-time-thinning}.
+As all the constant rate \textit{thinning} aggregators only support \texttt{ConstantRateJump}s and \texttt{MassActionJump}s, \ie the intensity between jumps is constant, Algorithm~\ref{algo:next-time-thinning} short-circuits to quickly return \( t \sim \exp(\bar{B}) = \exp(\lambda_n) \) as discussed in Subsection~\ref{subsec:sim-thinning}. Next, the mark distribution becomes the categorical distribution weighted by the intensity of each process. That is, given an event at time \( t_n \), we have that the probability of drawing process \( i \) out of \( M \) sub-processes is \( \lambda_i^\ast (t_n) / \lambda^\ast (t_n) \). Conditional on sub-process \( i \), the corresponding \texttt{affect!(integrator)} is invoked, that is, \( k_n \sim f_i^\ast (k \mid t_n) \). So all sub-processes could potentially be marked, but note users need to handle any additional sampling related to such marks within their provided \texttt{affect!} function. Where most implementations differ is on updating the mark distribution in Line~\ref{line:thinning-mark-sample} of Algorithm~\ref{algo:sim-thinning} and the conditional intensity rate in Line~\ref{line:lambda-update} of Algorithm~\ref{algo:next-time-thinning}.

-\texttt{Direct} and \texttt{DirectFW} follows the \textit{direct} method in~\cite{gillespie1976} which re-evaluates all intensities after every iteration scaling at \( O(K) \). It draws the next-time from the ground process whose rate is the sum of all sub-processes' rates. It selects the mark by executing a search in an array that stores the cumulative sum of rates.
+\texttt{Direct} and \texttt{DirectFW} follow the \textit{direct} method in~\cite{gillespie1976}, which re-evaluates all intensities after every iteration, scaling as \( O(K) \). The next time is drawn from the ground process, whose rate is the sum of all sub-processes' rates; the mark is then selected by searching an array that stores the cumulative sum of rates.

 \texttt{SortingDirect}, \texttt{RDirect} and \texttt{DirectCR} are improvements over the \texttt{Direct} method.
They only re-evaluate the intensities of the processes that are affected by the realized process based on a dependency graph. \texttt{SortingDirect} draws from the ground process, but keeps the intensity rate in a loosely sorted array following~\cite{mccollum2006}. \texttt{RDirect} is a rejection-based direct method which assigns the maximum rate of the system as the bound to all processes. The implementation slightly differs from Algorithm~\ref{algo:sim-thinning}. Since all sub-processes share the same bound, it draws the next time from a homogeneous Poisson process with the maximum rate, then randomly selects a candidate process and confirms the candidate only if its rate is above a random proportion of the maximum rate. \texttt{DirectCR} --- from~\cite{slepoy2008} --- is a composition-rejection method that groups sub-processes with similar rates using a priority table. Each group is assigned the sum of all the rates within it. We apply a routine equivalent to \texttt{Direct} to select the time in which the next group fires. Given a group, we then select which process fires.

-\texttt{RSSA} and \texttt{RSSACR} places processes in bounded brackets. \texttt{RSSA} --- from~\cite{thanh2014} --- follows Algorithm~\ref{algo:sim-thinning} very closely in the case where the bounds are constant between jumps. \texttt{RSSACR} --- from ~\cite{thanh2017} --- groups sub-processes with similar rates like \texttt{DirectCR}, but then places each group within a bounded bracket. It then samples the next group to fire similar to \texttt{RSSA}. Given the group, it selects a candidate to fire and performs a thinning routine to accept or reject.
+\texttt{RSSA} and \texttt{RSSACR} place processes in bounded brackets. \texttt{RSSA} --- from~\cite{thanh2014} --- follows Algorithm~\ref{algo:sim-thinning} very closely in the case where the bounds are constant between jumps. \texttt{RSSACR} --- from~\cite{thanh2017} --- groups sub-processes with similar rates, like \texttt{DirectCR}, but then places each group within a bounded bracket. It then samples the next group to fire similarly to \texttt{RSSA}. Given the group, it selects a candidate to fire and performs a thinning routine to accept or reject.

-Next, we consider the \textit{queued thinning} aggregators. Starting with aggregators that only support \texttt{ConstantRateJump}s we have, \texttt{FRM}, \texttt{FRMFW} and \texttt{NRM}. \texttt{FRM} and \texttt{FRMFW} follow the \textit{first reaction} method in~\cite{gillespie1976}. To compute the next jump, both algorithms compute the time to the next event for each process and select the process with minimum time. This is equivalent to assuming a complete dependency graph in Algorithm~\ref{algo:sim-queuing}. For large systems, these methods are inefficient compared to \texttt{NRM} which is a \texttt{queued thinning} method sourced from~\cite{gibson2000}.
+Finally, we have what we call the \textit{queued thinning} aggregators. Starting with aggregators that only support \texttt{ConstantRateJump}s, we have \texttt{FRM}, \texttt{FRMFW} and \texttt{NRM}. \texttt{FRM} and \texttt{FRMFW} follow the \textit{first reaction} method in~\cite{gillespie1976}. To compute the next jump, both algorithms compute the time to the next event for each process and select the process with minimum time. This is equivalent to assuming a complete dependency graph in Algorithm~\ref{algo:sim-queuing}. For large systems, these methods are inefficient compared to \texttt{NRM}, which is a \textit{queued thinning} method sourced from~\cite{gibson2000}.
\texttt{NRM} gains efficiency by using an indexed priority queue to store and determine next event times, and by using dependency graphs to only update intensities that would need to be recalculated after a given event.

 Most of the algorithms implemented in \texttt{JumpProcesses.jl} come from the biochemistry literature. There has been less emphasis on implementing processes commonly studied in statistics such as self-exciting point processes characterized by time-varying and history-dependent intensity rates. Our latest aggregator, \texttt{Coevolve}, which is an implementation of Algorithm~\ref{algo:sim-queuing}, addresses this gap. This is the first aggregator that supports \texttt{VariableRateJump}s. Compared with the current \textit{inverse} method-based approach that relies on ODE integration, the new aggregator substantially improves the performance of simulations with time-dependent intensity rates and/or coupled with differential equations from \texttt{DifferentialEquations.jl}.

-\texttt{Coevolve} also employs a few enhancements compared to Algorithm~\ref{algo:sim-queuing}. First, we avoid the re-computation of unused random numbers. When updating processes that have not yet fired, we can transform the unused time of constant rate processes to obtain the next candidate time for the first round of iteration of the \textit{thinning} procedure in Algorithm~\ref{algo:next-time-thinning}. This saves one round of sampling from the exponential distribution, which translates into a faster algorithm. Second, it adapts to processes with constant intensity between jumps which reduces the loop in Algorithm~\ref{algo:next-time-thinning} to the equivalent implemented in \texttt{NRM}.
+\texttt{Coevolve} also employs several enhancements compared to Algorithm~\ref{algo:sim-queuing}. First, it avoids the re-computation of unused random numbers. When updating processes that have not yet fired, it transforms the unused time of constant rate processes to obtain the next candidate time for the first round of iteration of the \textit{thinning} procedure in Algorithm~\ref{algo:next-time-thinning}. This saves one round of sampling from the exponential distribution, which translates into a faster algorithm. Second, it adapts to processes with constant intensity between jumps, which reduces the loop in Algorithm~\ref{algo:next-time-thinning} to the equivalent implemented in \texttt{NRM} for \texttt{ConstantRateJump}s and \texttt{MassActionJump}s.

-\section{Empirical evaluation} \label{sec:evaluation}
+\section{Empirical evaluation}

 This section presents an empirical evaluation of the \texttt{JumpProcesses.jl} aggregators described in Section~\ref{sec:implementation}. First, since \texttt{Coevolve} is a new aggregator, we test its correctness by conducting a statistical analysis. Second, we conduct the jump benchmarks available in \texttt{SciMLBenchmarks.jl}. We have added new benchmarks that assess the performance of the new aggregators under settings that could not be simulated with previous aggregators.

@@ -436,7 +436,7 @@ \subsection{Statistical analysis of \texttt{Coevolve}}
 sol = solve(jprob, SSAStepper())
 \end{lstlisting}

-To assess the correctness of \texttt{Coevolve}, we add it to the \texttt{JumpProcesses.jl} test suite. Some tests check whether the aggregators are able to obtain empirical statistics close to the expected in a number of simple biochemistry models such as linear reactions, DNA repression, reversible binding and extinction.
The test suite was missing a unit test for self-exciting process. Thus, we have added a test for the univariate Hawkes model that checks whether algorithms that accept \texttt{VariableRateJump} are able to produce an empirical distribution of trajectories whose first two moments of the observed rate are close to the expected ones.
+To assess the correctness of \texttt{Coevolve}, we add it to the \texttt{JumpProcesses.jl} test suite. Some tests check whether the aggregators are able to obtain empirical statistics close to the expected in a number of simple biochemistry models such as linear reactions, DNA repression, reversible binding and extinction. The test suite was missing a unit test for a self-exciting process. Thus, we have added a test for the univariate Hawkes model that checks whether algorithms that accept \texttt{VariableRateJump} are able to produce an empirical distribution of trajectories whose first two moments of the observed rate are close to the expected ones.

 In addition, the correctness of the implemented algorithm can be visually assessed using a Q-Q plot. As discussed in Subsection~\ref{subsec:sim-inverse}, every simple point process can be transformed into a Poisson process with unit rate. This implies that the interval between points for any such transformed process should match the exponential distribution. Therefore, the correctness of any aggregator can be assessed as follows. First, transform the simulated intervals with the appropriate compensator. Let \( t_{n_i} \) be the time at which the \( n \)-th event of sub-process \( i \) took place, with \( t_{0_i} \equiv 0 \); the compensator for sub-process \( i \) is given by the following:
 \begin{equation}
@@ -603,12 +603,12 @@ \section{Conclusion}

 Finally, \texttt{JumpProcesses.jl} also includes algorithms for jumps over two-dimensional spaces. It might be worth conducting a similar comparative exercise to identify algorithms in statistics for \( 2 \)- and \( N \)-dimensional processes that could also be added to \texttt{JumpProcesses.jl} as it has the potential to become the go-to library for general point process simulation.

 \section{Acknowledgements}
-This project has been made possible in part by grant number 2021-237457 from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation. SAI was also partially supported by NSF-DMS 1902854.
+This project has been made possible in part by grant number 2021-237457 from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation. SAI was also partially supported by NSF-DMS 1902854 and 2325185.

 \bibliographystyle{juliacon}
 \bibliography{references}

-\section*{Annex} \label{sec:annex}
+\section*{Annex} \label{sec:annex}

 \begin{table*}
 \centering
@@ -635,7 +635,7 @@ \section*{Annex} \label{sec:annex}
 \texttt{DirectFW}
 & Direct with \texttt{FunctionWrapper}
- & Same as \texttt{Direct}, but wraps rate functions with \texttt{FunctionWrapper} for type stability and efficiency.
+ & Same as \texttt{Direct}, but wraps rate functions with \texttt{FunctionWrapper} for type stability and better performance in systems with \textit{many} jumps.
 & ground
 & all
 & x
@@ -727,7 +727,7 @@ \section*{Annex} \label{sec:annex}
 \texttt{FRMFW}
 & First reaction method with \texttt{FunctionWrapper}
- & Same as \texttt{FRM}, but wraps rate functions with \texttt{FunctionWrapper} for type stability and efficiency.
+ & Same as \texttt{FRM}, but wraps rate functions with \texttt{FunctionWrapper} for type stability and better performance in systems with \textit{many} jumps. & sub & all & x