-
Notifications
You must be signed in to change notification settings - Fork 19
Example: An Expression Language
Let us consider this Xtext implementation of an Expressions language (similar to the Arithmetic example shipped with Xtext)
grammar it.xsemantics.example.expressions.Expressions
with org.eclipse.xtext.common.Terminals
generate expressions "http://xsemantics.sf.net/example/expressions/Expressions"
Model:
(variables+=Variable)*
;
Variable:
name=ID '=' expression=Expression
;
Expression:
BooleanExpression;
BooleanExpression returns Expression:
Comparison
(({AndOrExpression.left=current} op=("||"|"&&")) right=Comparison)*;
Comparison returns Expression:
Equals
(({Comparison.left=current} op=("<") ) right=Equals)*;
Equals returns Expression:
Addition
(({Equals.left=current} op=("==") ) right=Addition)*;
Addition returns Expression:
Multiplication
(({Plus.left=current} '+' | {Minus.left=current} '-')
right=Multiplication)*;
Multiplication returns Expression:
Prefixed (({MultiOrDiv.left=current} op=("*"|"/")) right=Prefixed)*;
Prefixed returns Expression:
{BooleanNegation} =>"!" expression=Atomic | /* right associativity */
{ArithmeticSigned} =>"-" expression=Atomic | /* right associativity */
Atomic;
Atomic returns Expression:
'(' Expression ')' |
{NumberLiteral} value=INT |
{StringLiteral} value=STRING |
{BooleanLiteral} value=('true'|'false') |
{VariableReference} ref=[Variable]
;
// the types will be used only internally by the type system
Type:
{IntType} 'int' |
{StringType} 'string' |
{BooleanType} 'boolean'
;
With this language you can write untyped variable declarations assigning an expression. In this language, expressions can refer to variables declared before the current variable; the scoping will take care of this.
Note that, for the sake of simplicity, we do not consider a full expression grammar. For instance, we consider consider only the <
comparison operator. This is to keep things simple and focus on implementing a type system and an interpreter with Xsemantics.
We have our grammar. Now, we would like to implement these checks:
- the
-
/
*
binary operators and the-
unary operator should only act on numeric expressions -
+
can act both on numeric expressions and on strings; if one of the operand is a string, the whole expression should be a string - the
==
operator acts only on expressions of the same type - the
<
operator acts only on numeric and string expressions, and the two subexpressions must be of the same type - the
&&
||
binary operators and the ! unary operator act only on boolean expressions
So let's start writing the static semantics for the expressions language, i.e., the type system.
Note that in the grammar we also have a definition for Type
, even though that rule will never be used in the grammar. We designed it this way so that we will have also the EClass
definitions for the types that will be used by the type system. Of course, you could simply define the hierarchy for types using plain Java
.
Since we will extend the semantics of this language, we will start with a first implementation.
system it.xsemantics.example.expressions.typing.ExpressionsSemantics
validatorExtends it.xsemantics.example.expressions.validation.AbstractExpressionsJavaValidator
import it.xsemantics.example.expressions.expressions.*
judgments {
type |- Expression expression : output Type
error "cannot type " + stringRep(expression)
source expression
// more judgments later
}
So we define a judgment that we call type
which takes an Expression
as input parameter and provides a Type
as output.
axiom NumeralLiteral
G |- NumberLiteral num :
ExpressionsFactory::eINSTANCE.createIntType
axiom BooleanLiteral
G |- BooleanLiteral bool :
ExpressionsFactory::eINSTANCE.createBooleanType
axiom StringLiteral
G |- StringLiteral str :
ExpressionsFactory::eINSTANCE.createStringType
For the constants, we can write axioms, which will simply return an instance of the corresponding type.
rule MultiOrDiv
G |- MultiOrDiv multiOrDiv : IntType intType
from {
G |- multiOrDiv.left : intType
G |- multiOrDiv.right : intType
}
rule Minus
G |- Minus minus :
ExpressionsFactory::eINSTANCE.createIntType
from {
var IntType intType
G |- minus.left : intType
G |- minus.right : intType
}
The rules for multiplication, division and minus are similar. They state that the result type is an IntType
, provided that the subexpressions have themselves an an IntType
. They do this in two different (equivalent) ways.
The first one states that the result is an IntType
which is the result of the typing of the left and right subexpressions. Remember that for this judgment, the second parameter is an output parameter. Basically, the second rule does the same, in a more explicit way, and it does not bind the output result (which is created on the fly) to the result of typing of the subexpressions (which, however, are still required to have integer type, since we pass to the rule invocation an IntType
as output argument).
rule Plus
G |- Plus plus : Type type
from {
G |- plus.left : var Type leftType
G |- plus.right : var Type rightType
{
(leftType instanceof StringType || rightType instanceof StringType)
type = ExpressionsFactory::eINSTANCE.createStringType
}
or
{
(leftType instanceof IntType && rightType instanceof IntType)
type = leftType // i.e., IntType
}
}
Remember that for the +
operator we are willing to deal both with strings (string concatenation) and with integers (standard arithmetic sum). If one of the two subexpressions has StringType, then we consider the whole plus expression as a string concatenation and we give it type string. As expected, if one of the subexpressions has boolean type, the expression will fail this rule.
Note that we use the or
operator. See this section about premises <> in the reference for its semantics.
rule Comparison
G |- Comparison comparison : ExpressionsFactory::eINSTANCE.createBooleanType
from {
empty |- comparison.left : var Type leftType
empty |- comparison.right : var Type rightType
// cannot compare booleans
(leftType instanceof IntType && rightType instanceof IntType) ||
(leftType instanceof StringType && rightType instanceof StringType)
}
rule Equals
G |- Equals comparison : ExpressionsFactory::eINSTANCE.createBooleanType
from {
G |- comparison.left : var Type leftType
G |- comparison.right : var Type rightType
// can compare only if they have the same type
(leftType.eClass == rightType.eClass)
}
The code for Comparison
and Equals
are similar, in that they both return a boolean type as a result. But while equality can be checked with any type of expressions (provided the subexpressions have the same type), the comparison does not consider boolean subexpressions.
rule BooleanNegation
G |- BooleanNegation negation : BooleanType boolType
from {
G |- negation.expression : boolType
}
rule AndOr
G |- AndOrExpression andOr : BooleanType boolType
from {
G |- andOr.left : boolType
G |- andOr.right : boolType
}
rule ArithmeticSigned
G |- ArithmeticSigned signed : ExpressionsFactory::eINSTANCE.createIntType
from {
G |- signed.expression : var IntType intType
}
The rules for boolean negation, logical &&
and ||
and the unary arithmetic signed operations should be clear from the previous examples.
We need one more rule for typing expressions, in particular the one for VariableReference
, whose type is the type of the referred variable. But what is the type of a variable? The idea is that we infer the type of a variable from the type of its expression.
Since Variable
is not an Expression
in our grammar, we need to introduce another judgment in our type system.
Side Note: an alternative would have been to introduce in the grammar a rule introducing a common superclass for both Variable
and Expression
, e.g., Typable
, and write the type judgment with the first parameter of type Typable
.
judgments {
... // as above...
vartype ||- Variable variable : output Type
error "cannot type " + stringRep(variable)
source variable
feature ExpressionsPackage::eINSTANCE.variable_Expression
}
Note that the new judgment has a different symbol w.r.t. the type
judgment. Moreover, when specifying the error for such judgments we specify that the EObject
for the error marker is the variable, but the feature for the error marker is the variable's expression. This will generate a better error marker.
rule VariableReference
G |- VariableReference varRef : Type type
from {
G ||- varRef.ref : type
}
rule Variable
G ||- Variable variable : Type type
from {
variable.expression != null
G |- variable.expression : type
}
Now, the type rule for VariableReference
relies on the judgment vartype
, which states that the type of a variable is the type of its expression.
IMPORTANT: expressions can refer to other variables, and in case of mutual dependencies like:
i = j + 10
j = i *5
the typing would enter an infinite loop! For this reason, for this example language we limited the scope of variables: expressions can refer only to variables defined before the current expression.
Now that we have our type system for expressions we can write validation rules (see the section CheckRules<> in the reference):
checkrule CheckVariable for
Variable variable
from {
empty ||- variable : var Type type
}
This simply checks that we can give a Variable a type (in an empty environment).
Please refer to GeneratedValidator <>, in particular validatorExtends<>, to see how to use the generated Java validator, which we also followed in the implementation of this Expressions example.
Xsemantics can be used also to write interpreter rules, which can be seen also as the operational semantics of the language.
For the Expressions language we thus introduce another judgment kind:
judgments {
... // as above...
interpret |- Expression expression ~> output Object
}
Note that we assume that we will run the interpreter ONLY on well-typed Expressions program.
We then start writing the rules for this new judgment
axiom InterpretNumberLiteral
G |- NumberLiteral number ~> number.value
axiom InterpretStringLiteral
G |- StringLiteral string ~> string.value
axiom InterpretBooleanLiteral
G |- BooleanLiteral bool ~> Boolean::valueOf(bool.value)
The rules for literals are pretty straightforward; note only the case for boolean literals: since the value feature is a string, we need to convert it into an actual boolean.
rule InterpretMinus
G |- Minus plus ~> Integer result
from {
G |- plus.left ~> var Integer leftResult
G |- plus.right ~> var Integer rightResult
result = leftResult.intValue - rightResult.intValue
}
rule InterpretMultiOrDiv
G |- MultiOrDiv multiOrDiv ~> Integer result
from {
G |- multiOrDiv.left ~> var Integer leftResult
G |- multiOrDiv.right ~> var Integer rightResult
if (multiOrDiv.op == '*')
result = leftResult.intValue * rightResult.intValue
else
result = leftResult.intValue / rightResult.intValue
}
rule InterpretArithmeticSigned
G |- ArithmeticSigned signed ~> Integer result
from {
G |- signed.expression ~> var Integer expResult
result = -(expResult)
}
The above rules rely on the fact that the result of these subexpressions is an Integer
.
rule InterpretAndOr
G |- AndOrExpression andOr ~> Boolean result
from {
G |- andOr.left ~> var Boolean leftResult
G |- andOr.right ~> var Boolean rightResult
if (andOr.op == "&&")
result = leftResult.booleanValue && rightResult.booleanValue
else
result = leftResult.booleanValue || rightResult.booleanValue
}
rule InterpretBooleanNegation
G |- BooleanNegation neg ~> Boolean result
from {
G |- neg.expression ~> var Boolean expResult
result = !expResult
}
Similarly in case of boolean expressions the result of subexpressions is assumed to be a Boolean.
rule InterpretComparison
G |- Comparison comparison ~> Boolean result
from {
empty |- comparison.left ~> var Object leftResult
empty |- comparison.right ~> var Object rightResult
if (leftResult instanceof String && rightResult instanceof String) {
result = leftResult.toString < rightResult.toString
} else {
// both are int if the expression is well-typed
result = (leftResult as Integer) < (rightResult as Integer)
}
}
rule InterpretEquals
G |- Equals comparison ~> Boolean result
from {
empty |- comparison.left ~> var Object leftResult
empty |- comparison.right ~> var Object rightResult
result = leftResult.toString == rightResult.toString
}
For comparison, the result will be a Boolean; note that we check whether the subexpressions are strings to convert them to a string representation and compare such representation; the only other possible case is that they are both integer (remember, we will only call the interpreter on well-typed programs). Note that for equality we simply convert the result of subexpressions to string and compare the resulting string representation.
rule InterpretPlus
G |- Plus plus ~> Object result
from {
G |- plus.left ~> var Object leftResult
G |- plus.right ~> var Object rightResult
if (leftResult instanceof String || rightResult instanceof String) {
var leftString = leftResult.toString
var rightString = rightResult.toString
result = leftString + rightString
} else {
// both are int if the expression is well-typed
var leftInt = leftResult as Integer
var rightInt = rightResult as Integer
result = leftInt + rightInt
}
}
rule InterpretVariableRefenrence
G |- VariableReference varRef ~> Object result
from {
G |- varRef.ref.expression ~> result
}
At this point, the remaining above interpreter rules should be straightforward.
For the Expressions language we provide a custom implementation of StringRepresentation (see also section StringRepresentation):
public class ExpressionsStringRepresentation extends StringRepresentation {
protected String stringRep(String s) {
return "'" + s + "'";
}
}
This way when a String has to be represented by Xsemantics generated code, it will add quotes around it; this way we are sure that we are dealing with an actual string.
Of course, we provide the guice binding for this:
public class ExpressionsRuntimeModule extends
it.xsemantics.example.expressions.AbstractExpressionsRuntimeModule {
public Class<? extends StringRepresentation> bindStringRepresentation() {
return ExpressionsStringRepresentation.class;
}
...
}
Now, let's complicate things a bit, and make the system for our Expressions language more powerful, with this new requirement: we want to be able to implicitly convert string literals to numbers and booleans when this is possible. This way, if an expression requires the subexpressions to be numbers or booleans we can also accept string literals if they can be convertible. Note that we intentionally avoided to consider the number literal 0
as false
and a number literal different from 0
as true
, but you can extend the system further if you want. Thus, we want the following expressions to be valid (and interpreted accordingly with an implicit conversion):
i = 20 - '5' // OK! can be given int type and evaluates to 15
b = (i > 20) || 'true' // OK! can be given boolean type
// and evaluates to true
Note that in case no expectation on types is expressed, the system will be behave as before:
i = 20 + '5' // has string type and evaluates to '205'
j = (20 * 1) + '5' // has string type again
k = 20 * (1 + '5') // has int type, since * requires int types
// on subexpressions
Since this more involved system will have most rules in common with the first system we presented, we will write this new system as an extended system (section SystemExtension), starting from the previous one:
system it.xsemantics.example.expressions.typing.ExtendedExpressionsSemantics
extends it.xsemantics.example.expressions.typing.ExpressionsSemantics
Thus, we introduce a new judgment in our system (the other ones are simply inherited from the super system):
judgments{
coerce |~ Expression expression |> Type expectedType
error "cannot convert " + stringRep(expression) +
" to type " + stringRep(expectedType)
source expression
}
We then have to define the rules for this new judgment:
rule StringToInt
G |~ StringLiteral string |> IntType type
from {
Integer::parseInt(string.value)
}
rule StringToBool
G |~ StringLiteral string |> BooleanType type
from {
string.value.equalsIgnoreCase("true") ||
string.value.equalsIgnoreCase("false")
}
Then, this judgment will be used by the other rules; in particular, we need to modify some rules we have seen so far.
The idea is that, as hinted above, that rules that expect a subexpression to be of a specific type should "communicate" this expectation through the rule environment (refer to section Environment), by mapping the string 'expected'
to the expected type.
We will need then to override some rules of the previous system: the rule for StringLiteral
, since its type now depends on the above expectation:
override rule StringLiteral
G |- StringLiteral str : Type resultType
from {
{
val expected = env(G, 'expected', Type)
G |~ str |> expected
resultType = expected
}
or
resultType = ExpressionsFactory::eINSTANCE.createStringType
}
The first block of the or
can fail due to the fact that
- No expectation was requested or
- the expectation cannot be satisfied
In any other case the type will be string
(which might make invoking rules fail).
Now we need to override those rules that have expectations on subexpressions, in order to pass to rule invocation an appropriate environment:
override rule MultiOrDiv
G |- MultiOrDiv multiOrDiv : IntType intType
from {
intType = ExpressionsFactory::eINSTANCE.createIntType
G, 'expected' <- intType |- multiOrDiv.left : intType
G, 'expected' <- intType |- multiOrDiv.right : intType
}
override rule AndOr
G |- AndOrExpression andOr : BooleanType boolType
from {
boolType = ExpressionsFactory::eINSTANCE.createBooleanType
G, 'expected' <- boolType |- andOr.left : boolType
G, 'expected' <- boolType |- andOr.right : boolType
}
We only showed two examples (the other rules for expected int and boolean types are updated accordingly).
Note that in cases where no expectation is required, we now need to pass an empty environment when typing subexpressions, otherwise we might reject well-typed expressions due to expectations that were passed to the current rule upon invocation. This however was already implemented in the super system, so we don't need to override those rules.
The rule for Plus now is a little bit more complex, since it has to take care of possible expectations:
override rule Plus
G |- Plus plus : Type type
from {
G |- plus.left : var Type leftType
G |- plus.right : var TyThe -e option is not standard. A POSIX-compliant echo does not accept any options:pe rightType
{
val expected = env(G, 'expected', Type)
G |~ plus.left |> expected
G |~ plus.right |> expected
type = expected
}
or
{
(leftType instanceof StringType || rightType instanceof StringType)
type = ExpressionsFactory::eINSTANCE.createStringType
}
or
{
(leftType instanceof IntType && rightType instanceof IntType)
type = leftType // i.e., IntType
}
}
Now we need to adapt also some interpretation rules to take into consideration expectations:
override rule InterpretStringLiteral
G |- StringLiteral string ~> Object result
from {
var Type expected
{
expected = env(G, 'expected', IntType)
result = Integer::parseInt(string.value)
}
or
{
expected = env(G, 'expected', BooleanType)
result = Boolean::parseBoolean(string.value)
}
or
result = string.value
}
override rule InterpretMultiOrDiv
G |- MultiOrDiv multiOrDiv ~> Integer result
from {
var IntType intType = ExpressionsFactory::eINSTANCE.createIntType
G, 'expected' <- intType |- multiOrDiv.left ~> var Integer leftResult
G, 'expected' <- intType |- multiOrDiv.right ~> var Integer rightResult
if (multiOrDiv.op == '*')
result = leftResult.intValue * rightResult.intValue
else
result = leftResult.intValue / rightResult.intValue
}
Again, we only showed some modified rules; now it should be clear what other rules need to be adapted and how.
All the other rules which do not require adaption will be simply inherited from the super system.
The Expressions language we used in this example does not implement any real code generation. However, we implemented a generation strategy just for the sake of demonstration on how to use the generated Java code and the trace utilities.
The generator will generate an .output
file which shows the type for each variable, the interpretation of its expression, together with the application traces (as strings) both for the type and the interpretation. The relevant part of the generator (written in Xtend2) is shown:
class ExpressionsGenerator implements IGenerator {
@Inject ExpressionsSemantics semantics
@Inject extension TraceUtils
@Inject extension StringRepresentation
// skipped some code
def compileExpression(Expression exp) {
val typeTrace = new RuleApplicationTrace()
val interpreterTrace = new RuleApplicationTrace()
val type = semantics.type(null, typeTrace, exp)
val result = semantics.interpret(null, interpreterTrace, exp)
'''
type: «type.value.string»
type trace: «typeTrace.traceAsString»
interpretation: «result.value.string»
interpretation trace: «interpreterTrace.traceAsString»
'''
}
}
ExpressionsSemantics
is the Java class generated by Xsemantics from the system we wrote for the Expressions language.
Actually, we want to use the extended system (section MoreInvolvedExpressionsSystem), thus we will bind ExpressionsSemantics
to ExtendedExpressionsSemantics
in the runtime module:
public class ExpressionsRuntimeModule extends
it.xsemantics.example.expressions.AbstractExpressionsRuntimeModule {
...
public Class<? extends ExpressionsSemantics> bindExpressionsSemantics() {
return ExtendedExpressionsSemantics.class;
}
}
Note that string
and traceAsString
come from StringRepresentation
and TraceUtils
respectively, and they are used as extension methods.
Thus, starting from this expression:
myString = 'foo' + 10 + false
we get this output:
Variable: myString
type: StringType
type trace:
Plus: [] |- 'foo' + 10 + false : StringType
Plus: [] |- 'foo' + 10 : StringType
StringLiteral: [] |- 'foo' : StringType
NumeralLiteral: [] |- 10 : IntType
BooleanLiteral: [] |- false : BooleanType
interpretation: 'foo10false'
interpretation trace:
InterpretPlus: [] |- 'foo' + 10 + false ~> 'foo10false'
InterpretPlus: [] |- 'foo' + 10 ~> 'foo10'
InterpretStringLiteral: [] |- 'foo' ~> 'foo'
InterpretNumberLiteral: [] |- 10 ~> 10
InterpretBooleanLiteral: [] |- false ~> false
Note that the indentation represents the stack of invoked rules. The quotes around string literals come from our customization of StringRepresentation, see section ExpressionsCustomization. Moreover [] represents an empty environment.
A more involved example including variable references and implicit coercions is:
intVar = 1
intVar2 = 2 * (intVar + '1')
which produces the following output
Variable: intVar2
type: IntType
type trace:
MultiOrDiv: [] |- 2 * (intVar + '1') : IntType
NumeralLiteral: ['expected' <- IntType] |- 2 : IntType
Plus: ['expected' <- IntType] |- intVar + '1' : IntType
VariableReference: ['expected' <- IntType] |- intVar : IntType
Variable: ['expected' <- IntType] ||- intVar = 1 : IntType
NumeralLiteral: ['expected' <- IntType] |- 1 : IntType
StringLiteral: ['expected' <- IntType] |- '1' : IntType
StringToInt: ['expected' <- IntType] |~ '1' |> IntType
interpretation: 4
interpretation trace:
InterpretMultiOrDiv: [] |- 2 * (intVar + '1') ~> 4
InterpretNumberLiteral: ['expected' <- IntType] |- 2 ~> 2
InterpretPlus: ['expected' <- IntType] |- intVar + '1' ~> 2
InterpretVariableRefenrence: ['expected' <- IntType] |- intVar ~> 1
InterpretNumberLiteral: ['expected' <- IntType] |- 1 ~> 1
InterpretStringLiteral: ['expected' <- IntType] |- '1' ~> 1