
Syntax Overview

Shigma edited this page Dec 7, 2018 · 3 revisions

This page introduces the basic syntax of the Wolfram Language and explains the structure of the syntax files. src/syntaxes/simplest.yaml is a direct implementation of this page.

Note: the syntax definition uses some YAML tags which can be found in build/types.

Glossary

This overview relies on a few basic concepts. They are defined as regular expressions called variables, which are automatically inserted into the syntax files through Mustache during the build process.

  • alnum: [0-9a-zA-Z]
  • number: (?:\d+\.?|\.\d)\d*
  • symbol: [$a-zA-Z]+[$0-9a-zA-Z]*
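The substitution step can be sketched in a few lines of Python. This is only an illustration: the real build uses Mustache, the helper name `expand` is hypothetical, and Python's `re` module is not the Oniguruma engine that TextMate grammars run on, although the three variables above behave the same in both.

```python
import re

# The three glossary variables, copied from the list above.
VARIABLES = {
    "alnum": r"[0-9a-zA-Z]",
    "number": r"(?:\d+\.?|\.\d)\d*",
    "symbol": r"[$a-zA-Z]+[$0-9a-zA-Z]*",
}

def expand(template, variables=VARIABLES):
    """Replace every {{name}} in `template` with the matching regex source.

    Hypothetical stand-in for the Mustache step of the build process.
    """
    # A function is used as the replacement so backslashes are inserted verbatim.
    return re.sub(r"\{\{(\w+)\}\}", lambda m: variables[m.group(1)], template)
```

For example, `expand("{{number}}")` yields a pattern that fully matches `3.14` and `.5` but not a lone `.`.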

Basic Patterns

The simplest syntax definition for the Wolfram Language supports the following syntax:

Shebang

See here for the shebang definition. It's easy to support such a syntax: \A(#!).*(?=$).
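A quick way to try the shebang pattern is with Python's `re` module. This is a sketch under one assumption: a TextMate grammar is applied line by line, so `re.MULTILINE` is added here to let `$` match at the end of the first line of a multi-line string. The helper name `shebang_line` is hypothetical.

```python
import re

# Pattern from the text. \A anchors at the very start of the source, so only
# the first line can ever match; MULTILINE makes `$` match before a newline.
SHEBANG = re.compile(r"\A(#!).*(?=$)", re.MULTILINE)

def shebang_line(source):
    """Return the shebang line of `source`, or None if there is none."""
    m = SHEBANG.match(source)
    return m.group(0) if m else None
```

A `#!` that appears on any later line is not treated as a shebang, because `\A` never matches there.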

Numbers

In Wolfram Language, numbers can:

  • have base: 2^^10, 11^^a.a
  • have precision: 2`10, 11`
  • have accuracy: 2``10, 11``
  • be in scientific form: 2*^10, 2*^-1.1

So a complete pattern for a number should be:

(?x)
(?:
  ([1-9]\d*\^\^)                                  # base
  ((?:{{alnum}}+\.?|\.{{alnum}}){{alnum}}*)       # value
  |
  ({{number}})                                    # value
)
(?:
  (\`\`(?:{{number}})?)                           # accuracy
  |
  (\`(?:{{number}})?)                             # precision
)?
(\*\^[+-]?{{number}})?                            # exponent
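To see which part of a literal each capture group receives, the pattern can be expanded and run in Python. This is a sketch, not the project's build code: the `{{...}}` placeholders are filled with plain string replacement, the leading `(?x)` is expressed as the `re.VERBOSE` flag, and the helper name `parse_number` is hypothetical.

```python
import re

ALNUM = r"[0-9a-zA-Z]"
NUMBER = r"(?:\d+\.?|\.\d)\d*"

TEMPLATE = r"""
(?:
  ([1-9]\d*\^\^)                                  # base
  ((?:{{alnum}}+\.?|\.{{alnum}}){{alnum}}*)       # value
  |
  ({{number}})                                    # value
)
(?:
  (\`\`(?:{{number}})?)                           # accuracy
  |
  (\`(?:{{number}})?)                             # precision
)?
(\*\^[+-]?{{number}})?                            # exponent
"""

NUMBER_RE = re.compile(
    TEMPLATE.replace("{{alnum}}", ALNUM).replace("{{number}}", NUMBER),
    re.VERBOSE,
)

def parse_number(text):
    """Split a number literal into its parts, or return None if it does not match."""
    m = NUMBER_RE.fullmatch(text)
    if m is None:
        return None
    base, based_value, plain_value, accuracy, precision, exponent = m.groups()
    return {
        "base": base,
        "value": based_value if base is not None else plain_value,
        "accuracy": accuracy,
        "precision": precision,
        "exponent": exponent,
    }
```

Note that the accuracy alternative is tried before the precision alternative, so that `` is not read as a precision mark followed by a stray backtick.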

Note: ^^, `, `` and *^ should not be treated as operators.

Reference: Input Syntax.

Strings

A string in Wolfram Language must be enclosed in a pair of " and can contain the following special syntaxes:

Named Characters

Some special characters have names and can be matched with \\\[{{alnum}}+\].

Note: not every \\\[{{alnum}}+\] is correct grammar, but the simplest syntax definition does not provide a list of supported names.

Escaped Characters

In Wolfram Language, some characters can be "escaped" while others cannot. Try the following code in Mathematica:

Reap[
    Scan[
        Sow[#, Quiet @ Check[Length @ Characters @ ToExpression["\"\\" <> # <> "\""], -1]] &,
        CharacterRange[33, 126]
    ],
    _,
    #1 -> StringJoin[#2] &
] // Last

You can obtain the following result:

  • disappeared: <>
  • unchanged: #$',-89;=?]{|}~
  • escaped: !"%&()*+/@\^_`bfnrt
  • errored: other characters

The first three kinds of characters can be placed after a \ while characters from the last kind cannot.
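One way to turn that result into a matcher is to build a character class from the three lists. This is a sketch in Python that takes the lists above at face value; the names `ALLOWED` and `SIMPLE_ESCAPE` are hypothetical and are not taken from the syntax files.

```python
import re

# The three kinds of characters that may follow a backslash, copied from the
# experiment's result above.
DISAPPEARED = "<>"
UNCHANGED = "#$',-89;=?]{|}~"
ESCAPED = "!\"%&()*+/@\\^_`bfnrt"

ALLOWED = DISAPPEARED + UNCHANGED + ESCAPED

# re.escape protects characters such as ], ^, - and \ inside the class.
SIMPLE_ESCAPE = re.compile(r"\\[" + re.escape(ALLOWED) + r"]")
```

With this pattern, `\n` and `\<` are accepted, while `\a` and `\[` are not.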

Encoded Characters

The Wolfram Language also supports characters with encoding:

  • 3-digit octal: \\[0-7]{3}
  • 2-digit hexadecimal: \\\.[0-9A-Fa-f]{2}
  • 4-digit hexadecimal: \\:[0-9A-Fa-f]{4}

Note: a sequence that begins with \, \. or \: and is followed by at least one digit (or hexadecimal digit) but does not match the syntax above is illegal.
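The three encodings and the "illegal" case from the note can be sketched as a small classifier in Python. The function name `classify_escape` and the exact shape of the `ILLEGAL` pattern are assumptions made for illustration; for the octal form only the digits 0-7 are treated as starting an encoded character.

```python
import re

# Valid encoded characters, using the three patterns listed above.
ENCODED = [
    ("octal", re.compile(r"\\[0-7]{3}")),
    ("hex2", re.compile(r"\\\.[0-9A-Fa-f]{2}")),
    ("hex4", re.compile(r"\\:[0-9A-Fa-f]{4}")),
]

# Assumed reading of the note: the sequence starts like an encoded character
# (at least one digit follows) but none of the patterns above matched.
ILLEGAL = re.compile(r"\\(?:[0-7]|\.[0-9A-Fa-f]|:[0-9A-Fa-f])")

def classify_escape(text):
    """Classify the escape at the start of `text`.

    Returns "octal", "hex2", "hex4", "illegal", or None when the text does
    not start with an encoded character at all.
    """
    for name, pattern in ENCODED:
        if pattern.match(text):
            return name
    if ILLEGAL.match(text):
        return "illegal"
    return None
```

In a grammar, the illegal case would typically get its own scope so that a color theme can flag it.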

Embedded Box Forms

A string can also include box forms, which are introduced later on. In the simplest syntax, however, box forms inside strings are not supported.

References:

Operators

The Wolfram Language has a great many operators, but syntax definitions for them are easy to write: you only need to list them and put them in the proper order. I divided them into 15 categories:

Replace:
  /.    ReplaceAll
  //.   ReplaceRepeated

Call:
  @     Prefix
  @@    Apply
  @@@   Apply (at level 1)
  /@    Map
  //@   MapAll
  //    Postfix
  ~     Infix
  @*    Composition
  /*    RightComposition

Comparison:
  >     Greater
  <     Less
  >=    GreaterEqual
  <=    LessEqual
  ==    Equal
  !=    Unequal
  ===   SameQ
  =!=   UnsameQ

Logical:
  !     Not
  ||    Or
  &&    And

Assignment:
  =     Set
  :=    SetDelayed
  ^=    UpSet
  ^:=   UpSetDelayed
  /:    TagSet (TagUnset, TagSetDelayed)
  =.    Unset
  +=    AddTo
  -=    SubtractFrom
  *=    TimesBy
  /=    DivideBy

Rule:
  ->    Rule
  :>    RuleDelayed
  <->   TwoWayRule

Condition:
  /;    Condition

Repeat:
  ..    Repeated
  ...   RepeatedNull

Arithmetic:
  +     Plus
  -     Minus, Subtract
  *     Times
  /     Divide
  ^     Power
  .     Dot
  ++    Increment, PreIncrement
  --    Decrement, PreDecrement

Flow:
  <<    Get
  >>    Put
  >>>   PutAppend

String:
  <>    StringJoin
  ~~    StringExpression

Span:
  ;;    Span

Compound:
  ;     CompoundExpression

Function:
  &     Function

Definition:
  ?     Definition
  ??    FullDefinition

Note: Some operators may not be included in the list if they are declared in other scopes.

Named characters can also be recognized as operators.

Reference: Operators.
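The ordering requirement can be illustrated with a small Python sketch: in a regex alternation the first alternative that matches wins, so longer operators have to be tried before their prefixes. The list below is only a subset of the table above, and the names `OPERATORS`, `OPERATOR_RE` and `operators_in` are hypothetical.

```python
import re

# A subset of the operators listed above, in no particular order.
OPERATORS = [
    "/.", "//.", "@", "@@", "@@@", "/@", "//@", "//",
    "=", "==", "===", "=!=", "!=", "!", ":=",
    "->", ":>", "<->", "<", ">", "-", "/", ".", "..", "...",
]

# Sort by length, longest first, so that "//." is tried before "//" and "/".
OPERATOR_RE = re.compile(
    "|".join(re.escape(op) for op in sorted(OPERATORS, key=len, reverse=True))
)

def operators_in(code):
    """Return the operators found in `code`, scanning left to right."""
    return OPERATOR_RE.findall(code)
```

Without the sort, `x //. y` could be split into `/`, `/` and `.` instead of being read as the single operator `//.`.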

Variables

A general variable consists of one or more symbols joined with ` (a symbol before a ` is called a "context").

match: (`?(?:{{symbol}}`)*){{symbol}}
name: variable.other.wolfram
captures: !raw
  1: variable.other.context.wolfram
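To check what capture group 1 (the context) picks up, the match pattern can be expanded and tried in Python. This is a sketch with the `{{symbol}}` variable inlined by hand; the helper name `split_context` is hypothetical.

```python
import re

SYMBOL = r"[$a-zA-Z]+[$0-9a-zA-Z]*"

# The match pattern from the rule above, with {{symbol}} substituted.
VARIABLE = re.compile(r"(`?(?:" + SYMBOL + r"`)*)" + SYMBOL)

def split_context(name):
    """Return (context, full name) for a variable, or None if it does not match."""
    m = VARIABLE.fullmatch(name)
    return (m.group(1), m.group(0)) if m else None
```

For a plain symbol such as `x`, group 1 matches the empty string, so the context scope simply covers nothing.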

Functions

Functions are no different from variables in Wolfram Language, but a syntax definition should color them as functions. Here are some basic ways to identify a function:

  • a variable placed before (@{1,3}|//?@|[/@]\*)
  • a variable placed after (//|[@/]\*)
  • a variable in an even position in an expression joined with ~
  • a variable placed after a PatternTest (which is introduced in the next section)
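The first three heuristics can be sketched in Python as follows. Several things here are assumptions rather than part of the syntax files: optional whitespace around the operators is allowed, a capture group replaces a lookbehind (Python's `re` only supports fixed-width lookbehind), and the helper names are hypothetical.

```python
import re

SYMBOL = r"[$a-zA-Z]+[$0-9a-zA-Z]*"

# A variable placed before @, @@, @@@, /@, //@, @* or /*.
BEFORE_OP = re.compile(r"(" + SYMBOL + r")(?=\s*(?:@{1,3}|//?@|[/@]\*))")

# A variable placed after //, @* or /*.
AFTER_OP = re.compile(r"(?://|[@/]\*)\s*(" + SYMBOL + r")")

def function_names(code):
    """Collect variables that sit in a function position next to an operator."""
    names = [m.group(1) for m in BEFORE_OP.finditer(code)]
    names += [m.group(1) for m in AFTER_OP.finditer(code)]
    return names

def infix_functions(code):
    """Return the items in even positions of an expression joined with single ~."""
    parts = re.split(r"(?<!~)~(?!~)", code)
    return [part.strip() for part in parts[1::2]]
```

These patterns look only at the immediate neighbour of the operator, so they are a rough approximation of what a full grammar with nested rules would do.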

Patterns

Apart from functions, patterns have two forms:

  1. the shorthand form of Pattern: a variable placed before :(?=[^:>=])
  2. the shorthand form of Blank and Default: a variable placed before
(?x)
(_\.)               # Default
|
(_{1,3})            # Blank, BlankSequence, BlankNullSequence
({{identifier}})?   # Head (here "identifier" means variable)
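Both shorthand forms can be tried out in Python. In this sketch `{{identifier}}` is approximated by the plain symbol regex from the glossary, which is an assumption because a full variable may also carry a context; the names `BLANK` and `PATTERN_COLON` are hypothetical.

```python
import re

SYMBOL = r"[$a-zA-Z]+[$0-9a-zA-Z]*"

# Blank/Default shorthand, with {{identifier}} approximated by SYMBOL.
BLANK = re.compile(
    r"""
    (_\.)               # Default
    |
    (_{1,3})            # Blank, BlankSequence, BlankNullSequence
    (""" + SYMBOL + r""")?   # Head
    """,
    re.VERBOSE,
)

# Pattern shorthand: a colon that is not part of ::, :> or :=.
PATTERN_COLON = re.compile(r":(?=[^:>=])")
```

The lookahead in `PATTERN_COLON` keeps `x:=1` and `x:>y` from being read as patterns, while `x:_Integer` is accepted.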

After a pattern, there may be some additional syntaxes other than expressions:

  • Optional: :
  • PatternTest: ?

However, coloring them properly is quite difficult and is beyond the scope of this page.

Bracketing

There are many kinds of bracketing in the Wolfram Language. A general bracketing rule should be like this:

begin: \(
beginCaptures: !all punctuation.section.parens.begin.wolfram
end: \)
endCaptures: !all punctuation.section.parens.end.wolfram
name: meta.parens.wolfram
patterns: !push expressions

In the simplest syntax definition, we only need to support the following bracketing:

  • parens: ( and )
  • braces: { and }
  • brackets: [ and ]
  • association: <| and |>
  • parts: [[ and ]]
  • box: \( and \)

Reference: The Four Kinds of Bracketing in the Wolfram Language.

Box Forms

A box form is a nested scope that includes all expression rules plus some special syntaxes:

  • \\`: FormBox
  • \\@: SqrtBox
  • \\/: FractionBox
  • \\[%&+_^]: x-scriptBox (x can be Sub/Super/Over/Under/...)
  • \\\*: box constructors

Reference: String Representation of Boxes.

Comment blocks

A comment block is wrapped in a pair of (* and *):

begin: \(\*
end: \*\)
patterns: !push comment-block

Note: in the inner scope of a comment block, the rule itself must be included because the following syntax is legal in Wolfram Language and can be found in some .wl files:

(* ::Input:: *)
(*(* some *)
(* comments *)*)
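Why the rule has to include itself is easy to see with a depth counter. The Python sketch below is an illustration only, not how the grammar is implemented: it removes comments while tracking nesting, it ignores comment delimiters that appear inside string literals, and the helper name `strip_comments` is hypothetical.

```python
import re

DELIMITER = re.compile(r"\(\*|\*\)")

def strip_comments(code):
    """Remove (possibly nested) comment blocks from `code`.

    An unterminated comment swallows the rest of the input.
    """
    kept = []
    depth = 0
    pos = 0  # start of the text that has not been copied yet
    for m in DELIMITER.finditer(code):
        if depth == 0:
            if m.group() == "(*":
                kept.append(code[pos:m.start()])
                depth = 1
            # A stray "*)" outside any comment is left in place.
        else:
            depth += 1 if m.group() == "(*" else -1
            if depth == 0:
                pos = m.end()
    if depth == 0:
        kept.append(code[pos:])
    return "".join(kept)
```

A non-recursive rule would end the comment at the first `*)` and leave the remaining text of the outer comment highlighted as code.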

Shorthand expressions

There are also some syntaxes that correspond to a function but cannot simply be treated as operators.

Reference: Wolfram Language Syntax.

Escaping before newlines

Finally, a backslash placed immediately before a newline (\\\r?\n) escapes the newline.
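The effect of the pattern can be sketched in Python as a simple substitution. The helper name `join_continued_lines` is hypothetical, and dropping the backslash together with the newline is an assumption about how the escaped newline should be treated.

```python
import re

# A backslash followed by an LF or CRLF line ending.
CONTINUATION = re.compile(r"\\\r?\n")

def join_continued_lines(code):
    """Remove every backslash-newline pair so the two lines read as one."""
    return CONTINUATION.sub("", code)
```

A backslash that is not directly followed by a line ending is left untouched.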
