Skip to content

Sail Internals

Alasdair Armstrong edited this page May 29, 2019 · 2 revisions

Sail Internals

Abstract Syntax Tree

The Sail abstract syntax tree (AST) is defined by an ott grammar in sail.ott. This generates a OCaml (and lem) version of the ast in src/ast.ml and src/ast.lem. Technically the OCaml version of the AST is generated by Lem, which allows the Sail OCaml source to seamlessly interoperate with parts written in Lem. Although we do not make much use of this, it in principle allows us to implement parts of Sail in Lem, which would enable them to be verified in Isabelle or HOL.

The Sail AST looks something like:

and 'a exp =
 | E_aux of ( 'a exp_aux) * 'a annot

and 'a exp_aux =  (* expression *)
 | E_block of ( 'a exp) list (* sequential block *)
 | E_id of id (* identifier *)
 | E_lit of lit (* literal constant *)
 | E_cast of typ * ( 'a exp) (* cast *)
 | E_app of id * ( 'a exp) list (* function application *)
 | E_tuple of ( 'a exp) list (* tuple *)
 | E_if of ( 'a exp) * ( 'a exp) * ( 'a exp) (* conditional *)
 ...

Each constructor is prefixed by a unique code (in this case E for expressions), and is additionally wrapped by an aux constructor which attaches an annotation to each node in the AST, consisting of an arbitrary type that parameterises the AST, and a location identifying the position of the AST node in the source code:

type 'a annot = l * 'a

There are various types of locations:

type l =
  | Unknown
  | Unique of int * l
  | Generated of l
  | Range of Lexing.position * Lexing.position
  | Documented of string * l

Range defines a range of positions given by the parser, the Documented constructor attaches doc-comments to locations. Generated is used to signify that code is generated based on code from some other location. Unique is used internally to tag locations with unique integers so they can be referred to later. Unknown is used for Sail that has no obvious corresponding location, although this should be avoided as much possible as it leads to poor error messages. Ast nodes programatically generated initially have Unknown locations, but Ast_util.locate can be used to recursively attach Generated locations for error purposes.

A convention in the Sail source is that a single variable l is always a location, usually the closest location.

AST utilities

There are various functions for manipulating the AST defined in ast_util. These include constructor functions like mk_exp for generating untyped expressions and various functions for destructuring AST nodes. It also defines various useful Sets and Maps, e.g. sets of identifiers, type variables etc. Note that type variables in Sail are often internally referred to as kids, I think this is because type variables are defined as having a specific kind, i.e. 'n : Int is a type variable with kind Int. (Although kinded_id is technically a separate type which is a type variable * kind pair).

Parse AST

There is a separate AST parse_ast.ml which the parser generates. It is very similar to the main AST except it contains some additional syntactic sugar. The parse ast is desugared by the initial_check file, which performs some basic checks in addition from mapping the parse AST to the full AST.

Overall Structure

The main entry point for Sail is the file sail.ml. Each backend option e.g. -c, -lem, -ocaml etc, is referred to as a target, and the set_target function is used to set the opt_target variable, for example

  ( "-c",
    Arg.Tuple [set_target "c"; Arg.Set Initial_check.opt_undefined_gen],
    " output a C translated version of the input");

defines the -c option. Each Sail option is configured via via opt_ variables defined at the top of each relevant file. In this case we tell Sail that when we generate C, we also want to generated undefined_X functions for each type X. In general such opt_ variables should be set when we start Sail and remain immutable thereafter.

The first function called by main is Sail.load_files. This function parses all the files passed to Sail, and then concatenates their ASTs. The pre-processor is then run, which evaluates $directive statements in Sail, such as

$include <prelude.sail>

Unlike the C pre-processor the Sail pre-processor operates over actual Sail ASTs rather than strings. This can recursively include other files into the AST, as well as add/remove parts of the AST with $ifdef etc. Directives that are used are preserved in the AST, so they also function as a useful way to pass auxiliary information to the various Sail backends.

The initial check mentioned above is then run to desugar the AST, and then the type-checker is run which produces a fully type-checked AST. Type annotations are attached to every node (for which an annotation makes sense) using the aux constructors. The type-checker is discussed in more details later.

After type-checking the Sail scattered definitions are de-scattered into single functions.

All the above is shared by each target and performed by the load_files function.

The next step is to apply various target-specific rewrites to the AST before passing it to the backend for each target.

The file rewrites.ml defines a list of rewrites:

let all_rewrites = [
    ("pat_string_append", Basic_rewriter rewrite_defs_pat_string_append);
    ("mapping_builtins", Basic_rewriter rewrite_defs_mapping_patterns);
    ("mono_rewrites", Basic_rewriter mono_rewrites);
    ("toplevel_nexps", Basic_rewriter rewrite_toplevel_nexps);
    ("monomorphise", Basic_rewriter monomorphise);
    ...

and each target specifies a list of rewrites to apply like so:

let rewrites_interpreter = [
    ("no_effect_check", []);
    ("realise_mappings", []);
    ("toplevel_string_append", []);
    ("pat_string_append", []);
    ("mapping_builtins", []);
    ("undefined", [Bool_arg false]);
    ("vector_concat_assignments", []);
    ("tuple_assignments", []);
    ("simple_assignments", [])
  ]

Once these rewrites have occurred the Sail.target function is called which invokes the backend for each target, e.g. for OCaml:

  | Some "ocaml" ->
     let ocaml_generator_info =
       match !opt_ocaml_generators with
       | [] -> None
       | _ -> Some (Ocaml_backend.orig_types_for_ocaml_generator ast, !opt_ocaml_generators)
     in
     let out = match !opt_file_out with None -> "out" | Some s -> s in
     Ocaml_backend.ocaml_compile out ast ocaml_generator_info

There is also a Sail.prover_regstate function that allows the register state to be Set up in a prover-agnostic way for each of the theorem-prover targets.

Type Checker

The Sail type-checker is contained within src/type_check.ml. This file is long, but is structured as follows:

  1. The type checking environment is defined. The functions that operate on the typing environment are contained with a separate Env module. The purpose of this module is to hide the internal representation of the typing environment and ensure that the main type-checker code can only interact with it using the functions defined in this module, which can be set up to guarantee any invariant we need. Code outside the type-checker itself can only interact with an even more restricted subset of these functions exported via the mli interface file.

  2. Helper definitions for subtyping and constraint solving are set up. The code that actually talks to an external SMT solver is contained within a separate file src/constraint.ml, whereas the code here sets up the interface between the typing environment and the solver.

  3. Next comes the definitions for unification and instantiation of types. There is some additional code (3.5) to handle subtyping with existential types, which can use unification to instantiate existentially quantified type variables.

  4. Sail allows some type-level constructs to appear in term-level variables, but these are then eliminated during type-checking in a process called sizeof-rewriting, after the (somewhat-awkwardly) named sizeof keyword.

  5. Finally all the typing rules are given. Sail has a bi-directional type-checking approach. So there are checking rules like check_exp, check_pat, etc., as well as inference rules like infer_exp, infer_pat, etc.

  6. Effects added by the previous typing rules are now propagated upwards through the AST

  7. Finally we have rules for checking and processing toplevel definitions of functions and datatypes.

The interface by which the rest of Sail can interact with the type-checker and type annotations is strictly controlled by it's mli interface file. As much of the type-checking internals is kept as abstract as possible.

Rewriter

The rewriting framework used by the various rewrites mentioned previously is defined in src/rewriter.ml. It contains various large structs with functions for every single AST node, and allows data to be threaded through each re-write in various ways. Most of the re-writes are defined in src/rewrites.ml, although the re-writer is used for other rewrite like passes such as e.g. constant folding in src/constant_fold.ml which combines the rewriter with the Sail interpreter.

The rewriter also acts as the interface to the Sail monomorphisation code, in src/monomorphise.ml.

Jib

The C and SMT backends for Sail use a custom intermediate representation (IR) called Jib (it's a type of Sail). Like the full AST this is defined as an ott grammar in language/jib.ott. The sail "-ir" target allows Sail to be converted into this IR without any further processing.

The Jib related files are contained within a sub-directory src/jib/. First we convert Sail to A-normal form (ANF) in src/jib/anf.ml, then src/jib/jib_compile.ml converts this into Jib.

The Jib representation has the advantage of being much simpler than the full Sail AST, and it removes a lot of the typing complexity, as types are replaced with simpler ones (ctyp). Note that many Jib-related types are prefixed by c as it was originally only used when generating C.

The key optimisation we do when generating Jib is we analyse the Sail types using SMT to try to fit arbitrary-precision types into smaller fixed-precision machine-word types that exist within Jib. To aid in this we have a specialisation pass that removes polymorphism by creating specialised copies of functions based on how their type-quantifiers are instantiated. This can be used in addition to the Sail monomorphisation above.

Once we have generated Jib, the code generator from Jib into C is fairly straightforward.

Jib to SMT translation

Starting with some Sail such as:

default order dec

$include <prelude.sail>

register r : bits(32)

$property
function property(xs: bits(32)) -> bool = {
  ys : bits(32) = 0x1234_5678;
  if (r[0] == bitzero) then {
    ys = 0xffff_ffff
  } else {
    ys = 0x0000_0000
  };
  (ys == sail_zeros(32) | ys == sail_ones(32))
}

We first compile to Jib, then inline all functions and flatten the resulting code into a list of instructions as below. The Sail->Jib step can be parameterised in a few ways so it differs slightly to when we compile Sail to C. First, a specialisation pass specialises integer-polymorphic functions and builtins, which is reflected in the name mangling scheme so, e.g.

zz7mzJzK0zCz0z7nzJzK32#bitvector_access

is a specialised version of bitvector access for 'n => 32 & 'm => 0. This lets us generate optimal SMT operations for monomorphic code, as the SMTLIB operations like ZeroExtend and Extract are only defined for natural number constants and bitvectors of known lengths. We also have to treat zero-length bitvectors differently to C, as SMT does not allow zero-length bitvectors, and unlike when we compile to C, we can have fixed-precision bitvectors of greater that 64-bits in the generated Jib.

var ys#u12_l#9 : fbits(32, dec)
ys#u12_l#9 : fbits(32, dec) = UINT64_C(0x12345678)
var gs#2#u12_l#15 : bool
var gs#1#u12_l#17 : bit
gs#1#u12_l#17 : bit = zz7mzJzK0zCz0z7nzJzK32#bitvector_access(R, 0l)
gs#2#u12_l#15 : bool = eq_bit(gs#1#u12_l#17, UINT64_C(0))
kill gs#1#u12_l#17 : bit
var gs#6#u12_l#16 : unit
jump gs#2#u12_l#15 then_13
ys#u12_l#9 : fbits(32, dec) = UINT64_C(0x00000000)
gs#6#u12_l#16 : unit = UNIT
goto endif_14
then_13:
ys#u12_l#9 : fbits(32, dec) = UINT64_C(0xFFFFFFFF)
gs#6#u12_l#16 : unit = UNIT
endif_14:
kill gs#2#u12_l#15 : bool
var gs#5#u12_l#10 : bool
var gs#3#u12_l#14 : fbits(32, dec)
gs#3#u12_l#14 : fbits(32, dec) = zz7nzJzK32#sail_zeros(32l)
gs#5#u12_l#10 : bool = zz7nzJzK32#eq_bits(ys#u12_l#9, gs#3#u12_l#14)
kill gs#3#u12_l#14 : fbits(32, dec)
var gs#7#u12_l#11 : bool
jump gs#5#u12_l#10 then_11
var gs#4#u12_l#12 : fbits(32, dec)
var gs#0#u9_l#13 : fbits(32, dec)
gs#0#u9_l#13 : fbits(32, dec) = zz7nzJzK32#sail_zeros(32l)
gs#4#u12_l#12 : fbits(32, dec) = zz7nzJzK32#not_vec(gs#0#u9_l#13)
kill gs#0#u9_l#13 : fbits(32, dec)
goto end_inline_10
end_inline_10:
gs#7#u12_l#11 : bool = zz7nzJzK32#eq_bits(ys#u12_l#9, gs#4#u12_l#12)
kill gs#4#u12_l#12 : fbits(32, dec)
goto endif_12
then_11:
gs#7#u12_l#11 : bool = true
endif_12:
return : bool = gs#7#u12_l#11
kill gs#5#u12_l#10 : bool
kill ys#u12_l#9 : fbits(32, dec)
end
undefined bool

The above Jib is then turned into a control-flow-graph in SSA form.

SSA graph

The conditionals that affect control-flow are put into separate nodes (in yellow), so we can easily compute the global path conditional for each block (stored in a function pi(cond0, ... , condn) by using the yellow conditional nodes between each node and the start node.

From this form the conversion to SMT is as follows:

A variable declaration such as

var x : fbits(32, dec)

would become

(declare-const x (BitVec 32))

A call to a builtin

x : T = f(y0, ... , yn)

would become

(define-const x T' exp)

where exp encodes the builtin f using SMT bitvector operations

Phi functions are mapped to muxers as follows - for a function phi(x0,...,xn) we turn that into an if-then-else statement which selects x0 to xn based on the global path conditionals of the parent blocks corresponding to each argument. Each phi function in a block always has the same number of arguments as the number of parent nodes, and the arguments are in the same order as the node index for each parent.

The above scheme generates a lot of declare-const and define-const statements that may not be needed so we do some simple dead-code elimination and constant propagation, which results in the following SMT:

(set-logic QF_AUFBVDT)
(declare-const zR/0 (_ BitVec 32))
(define-const zysz3u12_lz30/5 (_ BitVec 32) (ite (not (= ((_ extract 0 0) zR/0) #b0)) #x00000000 #xFFFFFFFF))
(assert (and (not (ite (not (= zysz3u12_lz30/5 #x00000000)) (= zysz3u12_lz30/5 (bvnot #x00000000)) true))))
(check-sat)

For monomorphic-bitvector manipulating code we can generate very compact SMT. Both specialisation and monomorphisation can be used to help monomorphise bitvectors. For variable-length bitvectors we can represent them as length-bitvector pairs, up to a certain bounded max length (default 256). This is less efficient but unavoidable in certain places. Integers are currently mapped to either 128 bit bitvectors (or any configurable max length) or 64 bit bitvectors.

The one slightly tricky thing to support is register references, e.g.

(*r) : T = 0xFFFF_FFFF

where r is a register reference. For this we look for all registers with type T (as a Jib type, where type-equality is trivial), and convert the above to

if r = ref R1 then
  R1 = 0xFFFF_FFFF
else if r = ref R2 then
  R2 = 0xFFFF_FFFF
else ...

if we did some smarter constant folding (e.g. propagating the GPRs letbinding in some of our specs) this could potentially generate SMT that is just as good as a manual if-then-else implementation of the register read/write functions. This step is done before converting to SSA so each register in the if-then-else cascade gets the correct indices.