Skip to content

Commit

Permalink
Some writing
Browse files Browse the repository at this point in the history
  • Loading branch information
imbrem committed Aug 16, 2024
1 parent 90917c6 commit 84038d0
Showing 1 changed file with 74 additions and 39 deletions.
113 changes: 74 additions & 39 deletions paper.tex
Original file line number Diff line number Diff line change
Expand Up @@ -274,22 +274,42 @@ \section{Static Single Assignment Form}

As a running example, consider the simple imperative program to compute $10!$ given in
Figure~\ref{fig:ten-fact-program}. In SSA form, it might be written as in
Figure~\ref{fig:ten-fact-ssa}. Let's break down what changes we've made:
Figure~\ref{fig:ten-fact-ssa}. In particular,
\begin{itemize}
\item We've moved from structured control flow in terms of \ms{while} to
unstructured jumps between the basic blocks \ms{start} (the entry block),
\ms{loop}, and \ms{body}. If this is all we did, we would have converted our
program to \textit{3-address code}.
\item
\end{itemize}

\TODO{Explain informally what is going on in SSA}
\begin{itemize}
\item Introduce $\phi$ to avoid redefinition of $i$, $a$, due to single definition principle
\item Otherwise, number definitions
\item $\phi$ is a bit confusing to write, but is isomorphic to using basic-blocks with arguments
instead:
\ms{loop}, and \ms{body}.
\item Replace subexpressions like $i + 1$ in $a * (i + 1)$ with let-bindings,
so that every expression in our program is atomic.
\end{itemize}
If this is all we did, we would have converted our program to \textit{3-address
code}. Unfortunately, many optimizations are quite difficult to express in this
format, since a variable's value may have been set by many different definitions
throughout the exectuion of the program. To improve our ability to reason about
programs, we introduce the \textit{static-single assignment} restriction, which
says that every variable must be defined at exactly one point in the program. We
can intuitively represent this as every variable being given by an immutable
\ms{let}-binding. Unfortunately, it is difficult to express programs with loops
in this format, since the value of a variable may change on each iteration of the
loop.

The classical solution to this issue is to introduce \textit{$\phi$-nodes},
which evaluate to a different value depending on which block we
\textit{previously} came from. For example, in \ms{loop}, $i_0$ evaluates to 1
if we came from \ms{start}, and to $i_1$ if we came from \ms{body}. Similarly,
$a_0$ evaluates to 1 if we came from \ms{start}, and to $a_1$ if we came from
\ms{body}. This allows us to express the fact that the values of $i_0, a_0$ are
control-flow dependent while still maintaining the single-definition principle
(as otherwise, we would need a new definition for $i$ overriding the old one).

Note, however that $i_1, a_0$ are defined \textit{later} in the program than the
$\phi$-nodes $i_0, a_0$, which would normally be seen as using an undefined
value. We hence need to use the rather confusing scoping rule that the variables
in a branch of a $\phi$-node must be defined \textit{when coming from that
particular block}, even if they are undefined when coming from other blocks.
This in general makes giving SSA an operational semantics quite confusing, and
with much time spent in compilers courses trying to build up the requisite
intuition.

\begin{figure}
\begin{subfigure}[t]{.5\textwidth}
Expand Down Expand Up @@ -330,6 +350,45 @@ \section{Static Single Assignment Form}
\label{fig:ten-fact-program}
\end{figure}

One solution to this issue is to transition to an alternative, but isomorphic
syntax: \textit{basic blocks with arguments}, as in Figure
\ref{fig:bba-ten-fact}.

\TODO{what is BBA; \cite{appel-ssa} and earlier paper}

\TODO{what is the isomorphism}

\TODO{focus on easier scoping and tail-call semantics}

\TODO{mismatch wrt. scoping between functional programming and \cite{appel-ssa}}

\TODO{what \textit{exactly} is scoping: dominance lore}

\begin{itemize}
\item Isomorphism is trivial: we just move the definition from the $\phi$ to the call-site
\item This makes semantics of blocks and scoping a lot more clear:
\item {\TODO{After introducing BBA, recall Appel's SSA is FP paper \cite{appel-ssa}, and then note
that there is a mismatch wrt scoping. Then explain dominance based scoping.}}
\item In SSA, to use a variable, it must be defined once we reach the use-point, or else UB
\item Variables can only be defined at exactly one point
\item Safe approximation: all paths to the use point must cross the definition point at least once
\item i.e. use point must \textit{dominate} definition point
\item Within a basic block: later variables use earlier ones
\item Between basic blocks: all paths through the cfg to the use-block must pass through the def
block
\item i.e. use-block must \textit{strictly dominate} def-block (all blocks dominate themselves,
but see "within BB" rule)
\item Examples: \begin{itemize}
\item start strictly dominates loop
\item loop strictly dominates body
\item loop does not strictly dominate loop since loop $=$ loop
\item body does not strictly dominate loop since we can get to loop via start, without passing
body
\end{itemize}
\item In general, dominance always forms a tree (explain? cite? \cite{ssa-intro})
\item Can topologically sort CFG by dominance tree, and then insert brackets for lexical scoping:
\end{itemize}

\begin{figure}
\begin{subfigure}[t]{.5\textwidth}
\centering
Expand Down Expand Up @@ -366,9 +425,12 @@ \section{Static Single Assignment Form}
\\ \\
\end{align*}
\caption{Basic-blocks with arguments}
\label{fig:bba-ten-fact}
\end{subfigure}

\TODO{Add arrows with \texttt{tikzmark}?}

\TODO{cite LLVM, MLIR, Cranelift?}

\caption{ The program in Figure \ref{fig:ten-fact-program} written in standard SSA (using $\phi$
nodes), like in LLVM, and in basic-blocks with arguments SSA, like in MLIR and Cranelift. The
Expand All @@ -378,33 +440,6 @@ \section{Static Single Assignment Form}
\Description{}
\end{figure}

\TODO{cite LLVM, MLIR, Cranelift?}

\begin{itemize}
\item Isomorphism is trivial: we just move the definition from the $\phi$ to the call-site
\item This makes semantics of blocks and scoping a lot more clear:
\item {\TODO{After introducing BBA, recall Appel's SSA is FP paper \cite{appel-ssa}, and then note
that there is a mismatch wrt scoping. Then explain dominance based scoping.}}
\item In SSA, to use a variable, it must be defined once we reach the use-point, or else UB
\item Variables can only be defined at exactly one point
\item Safe approximation: all paths to the use point must cross the definition point at least once
\item i.e. use point must \textit{dominate} definition point
\item Within a basic block: later variables use earlier ones
\item Between basic blocks: all paths through the cfg to the use-block must pass through the def
block
\item i.e. use-block must \textit{strictly dominate} def-block (all blocks dominate themselves,
but see "within BB" rule)
\item Examples: \begin{itemize}
\item start strictly dominates loop
\item loop strictly dominates body
\item loop does not strictly dominate loop since loop $=$ loop
\item body does not strictly dominate loop since we can get to loop via start, without passing
body
\end{itemize}
\item In general, dominance always forms a tree (explain? cite? \cite{ssa-intro})
\item Can topologically sort CFG by dominance tree, and then insert brackets for lexical scoping:
\end{itemize}

\begin{figure}
\centering
\begin{subfigure}[t]{.5\textwidth}
Expand Down

0 comments on commit 84038d0

Please sign in to comment.