Some writing

isotope-project · Aug 16, 2024 · 84038d0 · 84038d0
1 parent 90917c6
commit 84038d0
Showing 1 changed file with 74 additions and 39 deletions.
diff --git a/paper.tex b/paper.tex
@@ -274,22 +274,42 @@ \section{Static Single Assignment Form}
 
 As a running example, consider the simple imperative program to compute $10!$ given in
 Figure~\ref{fig:ten-fact-program}. In SSA form, it might be written as in
-Figure~\ref{fig:ten-fact-ssa}. Let's break down what changes we've made:
+Figure~\ref{fig:ten-fact-ssa}. In particular,
 \begin{itemize}
   \item We've moved from structured control flow in terms of \ms{while} to
   unstructured jumps between the basic blocks \ms{start} (the entry block),
-  \ms{loop}, and \ms{body}. If this is all we did, we would have converted our
-  program to \textit{3-address code}.
-  \item 
-\end{itemize}
-
-\TODO{Explain informally what is going on in SSA}
-\begin{itemize}
-  \item Introduce $\phi$ to avoid redefinition of $i$, $a$, due to single definition principle
-  \item Otherwise, number definitions
-  \item $\phi$ is a bit confusing to write, but is isomorphic to using basic-blocks with arguments
-  instead:
+  \ms{loop}, and \ms{body}. 
+  \item Replace subexpressions like $i + 1$ in $a * (i + 1)$ with let-bindings,
+  so that every expression in our program is atomic. 
 \end{itemize}
+If this is all we did, we would have converted our program to \textit{3-address
+code}. Unfortunately, many optimizations are quite difficult to express in this
+format, since a variable's value may have been set by many different definitions
+throughout the exectuion of the program. To improve our ability to reason about
+programs, we introduce the \textit{static-single assignment} restriction, which
+says that every variable must be defined at exactly one point in the program. We
+can intuitively represent this as every variable being given by an immutable
+\ms{let}-binding. Unfortunately, it is difficult to express programs with loops
+in this format, since the value of a variable may change on each iteration of the
+loop.
+
+The classical solution to this issue is to introduce \textit{$\phi$-nodes},
+which evaluate to a different value depending on which block we
+\textit{previously} came from. For example, in \ms{loop}, $i_0$ evaluates to 1
+if we came from \ms{start}, and to $i_1$ if we came from \ms{body}. Similarly,
+$a_0$ evaluates to 1 if we came from \ms{start}, and to $a_1$ if we came from
+\ms{body}. This allows us to express the fact that the values of $i_0, a_0$ are
+control-flow dependent while still maintaining the single-definition principle
+(as otherwise, we would need a new definition for $i$ overriding the old one).
+
+Note, however that $i_1, a_0$ are defined \textit{later} in the program than the
+$\phi$-nodes $i_0, a_0$, which would normally be seen as using an undefined
+value. We hence need to use the rather confusing scoping rule that the variables
+in a branch of a $\phi$-node must be defined \textit{when coming from that
+particular block}, even if they are undefined when coming from other blocks.
+This in general makes giving SSA an operational semantics quite confusing, and
+with much time spent in compilers courses trying to build up the requisite
+intuition.
 
 \begin{figure}
   \begin{subfigure}[t]{.5\textwidth}
@@ -330,6 +350,45 @@ \section{Static Single Assignment Form}
   \label{fig:ten-fact-program}
 \end{figure}
 
+One solution to this issue is to transition to an alternative, but isomorphic
+syntax: \textit{basic blocks with arguments}, as in Figure
+\ref{fig:bba-ten-fact}.
+
+\TODO{what is BBA; \cite{appel-ssa} and earlier paper}
+
+\TODO{what is the isomorphism}
+
+\TODO{focus on easier scoping and tail-call semantics}
+
+\TODO{mismatch wrt. scoping between functional programming and \cite{appel-ssa}}
+
+\TODO{what \textit{exactly} is scoping: dominance lore}
+
+\begin{itemize}
+  \item Isomorphism is trivial: we just move the definition from the $\phi$ to the call-site
+  \item This makes semantics of blocks and scoping a lot more clear:
+  \item {\TODO{After introducing BBA, recall Appel's SSA is FP paper \cite{appel-ssa}, and then note
+    that there is a mismatch wrt scoping. Then explain dominance based scoping.}}
+  \item In SSA, to use a variable, it must be defined once we reach the use-point, or else UB
+  \item Variables can only be defined at exactly one point
+  \item Safe approximation: all paths to the use point must cross the definition point at least once
+  \item i.e. use point must \textit{dominate} definition point
+  \item Within a basic block: later variables use earlier ones
+  \item Between basic blocks: all paths through the cfg to the use-block must pass through the def
+        block
+  \item i.e. use-block must \textit{strictly dominate} def-block (all blocks dominate themselves,
+  but see "within BB" rule)
+  \item Examples: \begin{itemize}
+    \item start strictly dominates loop
+    \item loop strictly dominates body
+    \item loop does not strictly dominate loop since loop $=$ loop
+    \item body does not strictly dominate loop since we can get to loop via start, without passing
+    body
+  \end{itemize}
+  \item In general, dominance always forms a tree (explain? cite? \cite{ssa-intro})
+  \item Can topologically sort CFG by dominance tree, and then insert brackets for lexical scoping:
+\end{itemize}
+
 \begin{figure}
   \begin{subfigure}[t]{.5\textwidth}
     \centering
@@ -366,9 +425,12 @@ \section{Static Single Assignment Form}
                                   \\ \\
     \end{align*}
     \caption{Basic-blocks with arguments}
+    \label{fig:bba-ten-fact}
   \end{subfigure}
 
   \TODO{Add arrows with \texttt{tikzmark}?}
+
+  \TODO{cite LLVM, MLIR, Cranelift?}
 
   \caption{ The program in Figure \ref{fig:ten-fact-program} written in standard SSA (using $\phi$
     nodes), like in LLVM, and in basic-blocks with arguments SSA, like in MLIR and Cranelift. The
@@ -378,33 +440,6 @@ \section{Static Single Assignment Form}
   \Description{}
 \end{figure}
 
-\TODO{cite LLVM, MLIR, Cranelift?}
-
-\begin{itemize}
-  \item Isomorphism is trivial: we just move the definition from the $\phi$ to the call-site
-  \item This makes semantics of blocks and scoping a lot more clear:
-  \item {\TODO{After introducing BBA, recall Appel's SSA is FP paper \cite{appel-ssa}, and then note
-    that there is a mismatch wrt scoping. Then explain dominance based scoping.}}
-  \item In SSA, to use a variable, it must be defined once we reach the use-point, or else UB
-  \item Variables can only be defined at exactly one point
-  \item Safe approximation: all paths to the use point must cross the definition point at least once
-  \item i.e. use point must \textit{dominate} definition point
-  \item Within a basic block: later variables use earlier ones
-  \item Between basic blocks: all paths through the cfg to the use-block must pass through the def
-        block
-  \item i.e. use-block must \textit{strictly dominate} def-block (all blocks dominate themselves,
-  but see "within BB" rule)
-  \item Examples: \begin{itemize}
-    \item start strictly dominates loop
-    \item loop strictly dominates body
-    \item loop does not strictly dominate loop since loop $=$ loop
-    \item body does not strictly dominate loop since we can get to loop via start, without passing
-    body
-  \end{itemize}
-  \item In general, dominance always forms a tree (explain? cite? \cite{ssa-intro})
-  \item Can topologically sort CFG by dominance tree, and then insert brackets for lexical scoping:
-\end{itemize}
-
 \begin{figure}
   \centering
   \begin{subfigure}[t]{.5\textwidth}