Design
Rather than breaking the language down into the traditional cherry-picked list of programming language-isms, we'll start with a list of observed programming language issues and how we chose to deal with them in Mu.
- Sequence notation - The grammar of almost every programming language represents a sequence of operations. This makes optimizing for parallelism difficult, and the compiler often has to make conservative decisions about instruction reordering.
- File layout dependency - Some languages are tightly tied to file names and locations; any change to the file layout breaks every reference to information within the file.
- Exposing compilation toolchain via API - Most compiler toolchains are exposed only via command-line programs and switches. Scripts that run the compilation process must handle string manipulation very carefully and be aware of platform- and compiler-specific nuances in flags and commands.
- Slow compile times - Some languages have improved on this, yet C and C++, the two workhorses of systems and embedded development, still have slow compile times.
- Compiler extensibility - Features like serialization and reflection are difficult or expensive in most languages. They are either unsupported by the language, grafted on with textual-substitution tricks, or dependent on an extensive runtime.
- Compile-time computations - Compiled languages are primarily set up to perform computations at run time, with limited support for computing things at compile time.
  - Macros or textual substitution make the grammar difficult for humans to reason about.
  - Templates, or plain type genericity, solve certain problems easily, but their complexity doesn't scale well. Templates offer a more expressive and safer way to compute than macros, but as implemented in C++ they are complex and have their own set of limitations, separate from those of the main language.
  - Build scripts are the brute-force approach to compile-time computation: run a program that generates the text of another program, which is then compiled. This is applicable in some situations, but the build script is usually written in an intermediate language that differs from the main language being used.
- Languages tied to domain - Many areas of programming language research could benefit from reusing a grammar specification to prototype new concepts. Unfortunately, most language syntaxes are tied to specific semantics, which makes their grammars unsuitable for reuse in a new domain.
- Grammar coupled with type system - As new type systems are researched and developed, applying them in industry is difficult, if possible at all, because grafting them onto existing, heavily used languages is hard.
- Build tool languages - For compiled languages, the compilation process is controlled by a build script written in a different language, with its own syntax, features, and limitations.
- Difficulty extending and modifying the compiler
- Impedance between software and hardware design - Almost no software development language is suitable for describing a hardware system.
- Difficult verbatim text input - In many languages, entering strings that contain the language's special control characters is difficult. Escape characters or character sequences usually give the user easier control over input, but defining text that itself contains those escape characters or sequences remains awkward.
- Insufficient debugging tools
- Functions not allowing multiple return values
- Difficulty in including third-party libraries - Depending on the complexity of a third-party library, it may or may not be feasible to fully integrate it at the source level, i.e., rebuilding the library from source along with your own project. Ideally, one should be able to reference a third-party library with a short identifier and have all the necessary source and build functionality pulled in as well.
At its core, the Mu syntax is a textual representation of a directed graph. This lets the compiler and optimizers see exactly the operation dependencies the programmer stated, and therefore make more aggressive optimizations.
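To illustrate why explicitly stated dependencies help an optimizer, here is a minimal C++ sketch, assuming a simple adjacency-list graph. `OpGraph` and `schedule_waves` are hypothetical names, not part of the Mu toolchain. It runs Kahn's topological sort and groups the result into "waves": operations in the same wave do not depend on each other, so they can be reordered or run in parallel without the conservative analysis a purely sequential notation would force.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical operation graph: an edge producer -> consumer means the
// consumer uses the producer's result. Not Mu's actual data structure.
struct OpGraph {
    std::vector<std::string> names;             // one label per operation
    std::vector<std::vector<std::size_t>> out;  // out[p] = consumers of p

    std::size_t add(const std::string& name) {
        names.push_back(name);
        out.emplace_back();
        return names.size() - 1;
    }
    void depends(std::size_t consumer, std::size_t producer) {
        out[producer].push_back(consumer);
    }
};

// Kahn's topological sort, grouped into waves: every operation's inputs are
// produced in earlier waves, so operations within one wave are mutually
// independent and may be reordered or executed in parallel.
std::vector<std::vector<std::size_t>> schedule_waves(const OpGraph& g) {
    std::vector<std::size_t> indegree(g.names.size(), 0);
    for (const auto& consumers : g.out)
        for (std::size_t c : consumers) ++indegree[c];

    std::vector<std::size_t> current;
    for (std::size_t i = 0; i < indegree.size(); ++i)
        if (indegree[i] == 0) current.push_back(i);

    std::vector<std::vector<std::size_t>> waves;
    std::size_t scheduled = 0;
    while (!current.empty()) {
        std::vector<std::size_t> next;
        for (std::size_t op : current)
            for (std::size_t c : g.out[op])
                if (--indegree[c] == 0) next.push_back(c);
        scheduled += current.size();
        waves.push_back(std::move(current));
        current = std::move(next);
    }
    assert(scheduled == g.names.size() && "graph contains a cycle");
    return waves;
}
```

For `sum = (x * 2) + (y * 3)`, the two loads form the first wave, the two multiplications the second, and the addition the third.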
The Mu notation does not define semantics. All semantics are defined by the programs that consume the graphs the syntax describes. This maximizes the range of environments Mu can be used in: interpreted or compiled, garbage-collected or manually managed, object-oriented or functional, statically or dynamically typed.
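The idea that meaning lives in the programs consuming the graph, rather than in the notation, can be sketched in C++ as two independent consumers of one semantics-free graph. This is a hypothetical illustration: `Node`, `Graph`, `evaluate`, and `emit` are invented for the example, and the labels `+` and `*` only acquire meaning through whichever consumer reads them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical semantics-free graph: a node records only a label and the
// indices of its inputs. Nothing here says what "+" or "*" means.
struct Node {
    std::string label;               // e.g. "2", "+", "*"
    std::vector<std::size_t> inputs; // indices of other nodes
};
using Graph = std::vector<Node>;

// Consumer 1: an interpreter that gives the labels integer-arithmetic meaning.
long evaluate(const Graph& g, std::size_t root) {
    const Node& n = g[root];
    if (n.inputs.empty()) return std::stol(n.label);  // leaf: integer literal
    assert(n.label == "+" || n.label == "*");
    long acc = (n.label == "*") ? 1 : 0;
    for (std::size_t in : n.inputs) {
        long v = evaluate(g, in);
        acc = (n.label == "*") ? acc * v : acc + v;
    }
    return acc;
}

// Consumer 2: a code generator that gives the same graph a textual meaning.
std::string emit(const Graph& g, std::size_t root) {
    const Node& n = g[root];
    if (n.inputs.empty()) return n.label;
    std::string out = "(";
    for (std::size_t i = 0; i < n.inputs.size(); ++i) {
        if (i > 0) out += " " + n.label + " ";
        out += emit(g, n.inputs[i]);
    }
    return out + ")";
}
```

Given the graph `{"2"}, {"3"}, {"4"}, {"+", {0, 1}}, {"*", {3, 2}}`, `evaluate(g, 4)` yields 20 while `emit(g, 4)` yields `((2 + 3) * 4)`: one graph, two semantics.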
The procedural nature of the Mu toolchain allows extensions to be added to the compilation process. These extensions let features such as serialization or reflection be safely added to a project without requiring runtime support or other heavy overhead that is prohibitive in some domains.
The Mu grammar does not require any particular type system. This allows type systems to be added or exchanged almost as easily as ordinary libraries.
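A hedged C++ sketch of a type system as a swappable library: the graph nodes carry no types, and each implementation of a hypothetical `TypeSystem` interface decides on its own whether a graph is acceptable. `GraphNode`, `DynamicTypes`, `StaticTypes`, and `check` are assumptions made for illustration, not Mu APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical graph node: an operation name plus input indices, no types.
struct GraphNode {
    std::string op;                  // "int", "str", or "add" in this sketch
    std::vector<std::size_t> inputs; // indices of earlier nodes
};
using Graph = std::vector<GraphNode>;

// A type system is just a library that inspects a graph.
struct TypeSystem {
    virtual ~TypeSystem() = default;
    virtual bool accepts(const Graph& g) const = 0;
};

// Dynamic typing: accept every graph and leave checks to run time.
struct DynamicTypes : TypeSystem {
    bool accepts(const Graph&) const override { return true; }
};

// Static typing: infer one type per node; "add" needs two same-typed inputs.
struct StaticTypes : TypeSystem {
    bool accepts(const Graph& g) const override {
        std::vector<std::string> type(g.size());
        for (std::size_t i = 0; i < g.size(); ++i) {
            const GraphNode& n = g[i];
            for (std::size_t in : n.inputs)
                assert(in < i && "nodes must be stored in dependency order");
            if (n.op == "int" || n.op == "str") {
                type[i] = n.op;  // literals carry their own type
            } else if (n.op == "add" && n.inputs.size() == 2 &&
                       type[n.inputs[0]] == type[n.inputs[1]]) {
                type[i] = type[n.inputs[0]];
            } else {
                return false;  // unknown op or mismatched operand types
            }
        }
        return true;
    }
};

// Swapping type systems is a matter of passing a different object.
bool check(const TypeSystem& ts, const Graph& g) { return ts.accepts(g); }
```

A graph that adds an integer to a string is accepted by `DynamicTypes`, which defers checking to run time, but rejected by `StaticTypes`; the graph itself is unchanged either way.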
Since Mu can be used in both scripted and compiled contexts, the build tools that run the compilation process can themselves be written in Mu, giving compile-time computation and code generation full programming-language support.
The core Mu syntax requires very few tokens. This allows the compiler to be extended by hooking identifiers and turning them into keywords. Extension keywords can change how the AST is translated into a graph.
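Keyword hooking can be sketched as a lookup table consulted during AST-to-graph translation. In this hypothetical C++ sketch, `AstNode`, `Translator`, `KeywordHook`, and the string-based graph ops are all assumptions: an unhooked identifier falls back to a default call translation, while a hooked one hands the node to the extension, which decides what graph to emit.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node: a leading identifier and its argument tokens.
struct AstNode {
    std::string head;
    std::vector<std::string> args;
};

// Hypothetical graph output, simplified to a list of op descriptions.
using GraphOps = std::vector<std::string>;
using KeywordHook = std::function<void(const AstNode&, GraphOps&)>;

class Translator {
public:
    // Registering a hook promotes an ordinary identifier to a keyword.
    void hook(const std::string& identifier, KeywordHook fn) {
        assert(fn && "hook must be callable");
        hooks_[identifier] = std::move(fn);
    }

    // Translate one AST node, letting a registered extension take over.
    void translate(const AstNode& node, GraphOps& out) const {
        auto it = hooks_.find(node.head);
        if (it != hooks_.end()) {
            it->second(node, out);  // the extension decides the graph shape
            return;
        }
        // Default: a single call op that consumes every argument.
        std::string op = "call " + node.head;
        for (const std::string& a : node.args) op += " " + a;
        out.push_back(op);
    }

private:
    std::map<std::string, KeywordHook> hooks_;
};
```

Registering a hook named `twice`, for instance, could make `twice tick` emit two call ops instead of the default single `call twice tick`, without touching the core translator.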
Mu adopts the approach of C++11 raw string literals, in which the user can define a custom terminator for a verbatim string. This allows the user to always choose a terminator that does not appear in the string's contents.
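For reference, this is the C++11 feature being adopted, shown in C++ because this page does not give Mu's own spelling. A raw string literal has the form `R"delim(...)delim"`, where the optional delimiter (up to 16 characters, excluding spaces, parentheses, and backslashes) is chosen so that `)delim"` never occurs in the contents. The names `simple`, `tricky`, and `check_literals` are illustrative only.

```cpp
#include <cassert>
#include <string>

// Empty delimiter: R"( ... )" ends at the first )" sequence. Backslashes are
// kept verbatim, so the \t and \n below are not tab and newline.
const std::string simple = R"(C:\temp\new)";

// The body contains )" itself, so a custom delimiter "xyz" is chosen; the
// literal now ends only at )xyz".
const std::string tricky = R"xyz(C:\path\to\"file" and )" inside)xyz";

// Both raw literals match their conventionally escaped equivalents.
void check_literals() {
    assert(simple == "C:\\temp\\new");
    assert(tricky == "C:\\path\\to\\\"file\" and )\" inside");
}
```

Comparing the two spellings shows the trade-off: the escaped forms need a backslash before every backslash and quote, while the raw forms only require picking a delimiter absent from the text.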