The purpose of this proposal is to establish some basic syntactic elements of
the Carbon language and make sure that the grammar is unambiguous and can be
parsed by an LALR parser such as yacc
or bison
. The grammar presented here
has indeed been checked by bison
. The language features in this basic grammar
include control flow by way of if
and while
, functions, simple structures,
choice, pattern matching, and a sample of operators on Booleans and integers.
The main syntactic categories are declaration
, statement
, and expression
.
Establishing these syntactic categories should help the other proposals choose
syntax that is compatible with the rest of the language.
The grammar proposed here is based on the following proposals:
- Carbon language overview
- pattern matching #87,
- structs #98,
- tuples #111,
- sum types #157, and
- metaprogramming #89.
We summarize the four main syntactic categories here and define the grammar in the next section.
-
declaration
includes function, structure, and choice definitions. -
statement
includes local variable definitions, assignment, blocks,if
,while
,match
,break
,continue
, andreturn
. -
expression
includes function calls, arithmetic, literals, and other syntaxes that evaluate to a value. In Carbon, a type is a kind of value, and Carbon makes no syntactic distinction between type-valued expressions and other kinds of expressions, so theexpression
nonterminal is used in positions where a type is expected. -
pattern
for the patterns in amatch
statement, for the left-hand side of a variable definition, and for describing the parameters of a function. The grammar treats patterns and expressions identically, but the type system will only allow pattern variables (expression ':' identifier
) in patterns and not in value or type-producing expressions. Having the separate non-terminals forpattern
andexpression
helps to document this distinction.
The proposal also specifies the abstract syntax.
When this proposal is accepted, the accompanying flex
, bison
, and C++ files
that define the grammar and implement the parser actions will be placed in the
executable_semantics
directory of the carbon-lang
repository.
In the near future there will be proposals regarding the semantic analysis (aka.
type checking) and specification of runtime behavior for this subset of Carbon
with the plan to add the accompanying executable forms of those specifications
to the executable_semantics
directory.
Looking towards growing the Carbon language, the intent is for other proposals
to add to this grammar by modifying the flex
and bison
files and the
abstract syntax definitions in the executable_semantics
directory.
The following grammar defines the concrete syntax for expressions. Below we comment on a few aspects of the grammar.
pattern:
expression
;
expression:
identifier
| expression designator
| expression '[' expression ']'
| expression ':' identifier
| integer_literal
| "true"
| "false"
| tuple
| expression "==" expression
| expression '+' expression
| expression '-' expression
| expression "and" expression
| expression "or" expression
| "not" expression
| '-' expression
| expression tuple
| "auto"
| "fnty" tuple return_type
;
designator:
'.' identifier
;
tuple:
'(' field_list ')'
;
field_list:
/* empty */
| field
| field ',' field_list
;
field:
pattern
| designator '=' pattern
;
return_type:
/* empty */
| "->" expression
;
The grammar rule
expression: expression ':' identifier
is for pattern variables. For example, in a variable definition such as
var Int: x = 0;
the Int: x
is parsed with the grammar rule for pattern variables. In the
right-hand side of the above grammar rule, the expression
to the left of the
:
must evaluate to a type at compile time.
The grammar rule
expression: "fnty" tuple return_type
is for function types. They are meant to play the role that function pointers
play in C and that std::function
plays in C++.
Regarding the grammar rule for the return type of a function:
return_type: "->" expression
the expression
is expected to evaluate to a type at compile-time.
The grammar rule
tuple: '(' field_list ')'
is primarily for constructing a tuple, but it is also used for creating tuple types and tuple patterns, depending on the context in which the expression occurs.
Regarding the grammar rules for designator
and for a named argument
designator: '.' identifier
field: designator '=' pattern
The period enables the addition of new keywords to Carbon without colliding with
field names. The period also aids integrated development environments with
auto-completion. The issue is that without the period, when the programmer is
typing the identifier (and not yet typed the =
), the IDE doesn't know whether
the identifier is for a positional argument, in which case it is a variable
occurrence, or whether the identifier is a field name.
The following grammar defines the concrete syntax for statements.
statement:
"var" pattern '=' expression ';'
| expression '=' expression ';'
| expression ';'
| "if" '(' expression ')' statement "else" statement
| "while" '(' expression ')' statement
| "break" ';'
| "continue" ';'
| "return" expression ';'
| '{' statement_list '}'
| "match" '(' expression ')' '{' clause_list '}'
;
statement_list:
/* empty */
| statement statement_list
;
clause_list:
/* empty */
| clause clause_list
;
clause:
"case" pattern "=>" statement
| "default" "=>" statement
;
The following grammar defines the concrete syntax for declarations.
declaration:
"fn" identifier tuple return_type '{' statement_list '}'
| "fn" identifier tuple "=>" expression ';'
| "fn" identifier tuple return_type ';'
| "struct" identifier '{' member_list '}'
| "choice" identifier '{' alternative_list '}'
;
member:
"var" expression ':' identifier ';'
;
member_list:
/* empty */
| member member_list
;
alternative:
identifier tuple ';'
;
alternative_list:
/* empty */
| alternative alternative_list
;
declaration_list:
/* empty */
| declaration declaration_list
;
In the grammar rule for function definitions
declaration: "fn" identifier tuple return_type '{' statement_list '}'
the tuple
is used as a pattern to describe the parameters of the function. The
grammar for function definitions does not currently include
implicit parameters,
but the intent is to add them to the grammar in the future.
In the rule for field declarations
member: "var" expression ':' identifier ';'
the expression
must evaluate to a type at compile time. The same is true for
the tuple
in the grammar rule for an alternative:
alternative: identifier tuple ';'
The following precedence and associativity specification is meant to approximate
the definitions in the operators and precedence proposal
#168 to the extent
that is possible in a yacc
/bison
generated parser. The ordering is from
lowest to highest precedence, with operators on the same line having equal
precedence. Proposal 168 differs in that the operator groups are partially
ordered instead of being totally ordered.
nonassoc '{' '}'
nonassoc ':' ','
left "or" "and"
nonassoc "==" "not"
left '+' '-'
left '.' "->"
nonassoc '(' ')' '[' ']'
The output of parsing is an abstract syntax tree. There are many ways to define
abstract syntax. Here we simply use C-style struct
definitions. The definition
of the abstract syntax is important because it is used in the specification of
the semantic analysis and the specification of runtime behavior.
enum ExpressionKind { Variable, PatternVariable, Int, Bool,
PrimitiveOp, Call, Tuple, Index, GetField,
IntT, BoolT, TypeT, FunctionT, AutoT };
enum Operator { Neg, Add, Sub, Not, And, Or, Eq };
struct Expression {
ExpressionKind tag;
union {
struct { string* name; } variable;
struct { Expression* aggregate; string* field; } get_field;
struct { Expression* aggregate; Expression* offset; } index;
struct { string* name; Expression* type; } pattern_variable;
int integer;
bool boolean;
struct { vector<pair<string,Expression*> >* fields; } tuple;
struct {
Operator operator_;
vector<Expression*>* arguments;
} primitive_op;
struct { Expression* function; Expression* argument; } call;
struct { Expression* parameter; Expression* return_type;} function_type;
} u;
};
The correspondence between most of the grammar rules and the abstract syntax is
straightforward. However, the parsing of the field_list
deserves some
explanation. The fields can be labeled with the grammar rule:
field: designator '=' pattern
or unlabeled, with the rule
field: pattern
The unlabeled fields are given numbers (represented as strings) for field labels, starting with 0 and going up from left to right.
Regarding the rule for tuples:
tuple: '(' field_list ')'
if the field list only has a single unlabeled item without a trailing comma,
then the parse result for that item is returned. Otherwise a tuple
AST node is
created containing the parse results for the fields.
In the following grammar rules, if the parse result for tuple
is not a tuple
(because the field list was a single unlabeled item without a trailing comma),
then a tuple is wrapped around the expression to ensure that it is a tuple.
expression: expression tuple
expression: "fnty" tuple return_type
function_definition:
"fn" identifier tuple return_type '{' statement_list '}'
| "fn" identifier tuple DBLARROW expression ';'
| "fn" identifier tuple return_type ';'
alternative: identifier tuple ';'
enum StatementKind { ExpressionStatement, Assign, VariableDefinition,
If, Return, Sequence, Block, While, Break, Continue,
Match };
struct Statement {
StatementKind tag;
union {
Expression* exp;
struct { Expression* lhs; Expression* rhs; } assign;
struct { Expression* pat; Expression* init; } variable_definition;
struct { Expression* cond; Statement* then_; Statement* else_; } if_stmt;
Expression* return_stmt;
struct { Statement* stmt; Statement* next; } sequence;
struct { Statement* stmt; } block;
struct { Expression* cond; Statement* body; } while_stmt;
struct {
Expression* exp;
list< pair<Expression*,Statement*> >* clauses;
} match_stmt;
} u;
};
struct FunctionDefinition {
string name;
Expression* param_pattern;
Expression* return_type;
Statement* body;
};
enum MemberKind { FieldMember };
struct Member {
MemberKind tag;
union {
struct { string* name; Expression* type; } field;
} u;
};
struct StructDefinition {
string* name;
list<Member*>* members;
};
enum DeclarationKind { FunctionDeclaration, StructDeclaration,
ChoiceDeclaration };
struct Declaration {
DeclarationKind tag;
union {
struct FunctionDefinition* fun_def;
struct StructDefinition* struct_def;
struct {
string* name;
list<pair<string, Expression*> >* alts;
} choice_def;
} u;
};
Expressions, type expressions, and patterns could instead be defined using completely disjoint grammar productions, but that would require adding more keywords or symbols to avoid ambiguity and there would be a fair amount of repetition in grammar productions.
The proposal does not include declarations for uninitialized variables, leaving that to a later proposal.
In this proposal, assignment is a statement. It could instead be an expression
as it is in C and C++. The arguments against assignment-as-an-expression
include 1) it complicates reasoning about the ordering of side-effects and 2) it
can cause confusion between =
and ==
(SEI CERT C Coding Standard)
Visual Studio Warning.
The syntax for variable initialization uses =
, which is the same token used in
assignment. An alternative would be to use a different syntax such as :=
to
better signal the semantic differences between initialization and assignment.
The author does not have a preference on this choice. The :=
alternative does
not introduce any complications regarding ambiguity despite the other uses of
:
in the grammar.
Inside a choice
, an alternative
could instead begin with a keyword such as
alt
, fn
, or var
, as discussed in the proposal for sum types
#157. The author does
not have a preference in this regard.
The precedence for and
and or
are equal, following the suggestion of
proposal #168,
whereas in C++ &&
has higher precedence than ||
.
The parameters of a function are described using the syntax of a pattern. There could instead be separate syntax that is special to function parameters, as there is in C++.
There are open questions regarding the design of function types, so we could
alternatively postpone the specification of the syntax for function types. The
choice to include them is based on the idea that Carbon will need something that
fills the roles of std::function
and C-style function pointers. Also, the
features were selected for the basic syntax proposal in a way that aimed to make
the proposal complete in the sense that each value-oriented feature (functions,
tuples, structures) comes with syntax for 1) creating values, 2) using values,
and 3) a type for classifying values.
The parameters of a function type are described using the syntax for type expression. There could instead be separate syntax that is special to the parameters of a function type, as there is in C++.
The keyword for introducing a function type is fnty
, which is an arbitrary
choice. Other alternatives include fn_type
and fn
. The fn
alternative
would conflict with future plans to use fn
for lambda expressions. Some
dislike of fn_type
was voiced, which motivated the choice of fnty
. It would
be nice to do without a keyword, but as the grammar currently stands, that
introduce ambiguity.
The pattern
non-terminal is defined in terms of expression
. It could instead
be the other way around, defining expression
in terms of pattern
.
Regarding tuples and tuple types, this proposal follows
#111 in that there is
not a distinct syntax for tuple types. Instead, the intent is that when a tuple
of types results from an expression in a context that expects a type, the type
system will convert the tuple to a tuple type. An alternative approach has been
discussed in which tuple types have a distinct syntax, such as
Tuple(Int, Bool)
.
Using code to validate our specification is a really promising direction, and this proposal seems like a good starting point. A reference implementation that's simple enough to be part of the design iteration process should help us move faster, by quickly uncovering the places where our specifications are ambiguous, syntactically or semantically unsound, or don't give the behavior we expect. In other words, it will help us keep ourselves honest, even at the proposal stage, which will help us avoid wasting time and effort implementing designs that turn out to be unworkable.
This can be considered as sort of a counterpart to In-progress design overview #83, in that the design specifics are being approved in order to bootstrap the specification process. We aren't necessarily adopting the specific syntax and semantics expressed by this proposal, and those choices will need to be presented and justified from scratch by future proposals.
This decision is deferring the implementation to code review. The specific tooling used to implement the syntax checker, such as Bison, is a detail which may be changed, now or later, without requiring a proposal for core team review.