A tokenizer for the ACE editor that provides syntax highlighting using an ANTLR4 lexer.
Use bower to install:
bower install --save antlr4-ace-ext
You can install ACE editor from bower, too:
bower install --save ace-builds
After ace is loaded
<script src="bower_components/ace-builds/src-noconflict/ace.js"></script>
add the following scripts:
<script src="bower_components/antlr4-ace-ext/src/token-type-map.js"></script>
<script src="bower_components/antlr4-ace-ext/src/tokenizer.js"></script>
They register themselves as the ACE modules ace/ext/antlr4/tokenizer and ace/ext/antlr4/token-type-map. You can require them in your mode:
ace.define(
  'ace/mode/my-mode',
  [
    "require",
    "exports",
    "module",
    "ace/ext/antlr4/tokenizer",
    "ace/ext/antlr4/token-type-map"
  ],
  function(require, exports, module) {
    // Pull the helper and the tokenizer class out of the registered modules.
    var createTokenTypeMap = require('ace/ext/antlr4/token-type-map').createTokenTypeMap;
    var Antlr4Tokenizer = require('ace/ext/antlr4/tokenizer').Antlr4Tokenizer;
    // ...
  }
);
Override the getTokenizer method of your mode class to use your custom tokenizer:
MyMode.prototype.getTokenizer = function() {
  if (!this.$tokenizer) {
    this.$tokenizer = new Antlr4Tokenizer(MyLanguageLexer, antlrTokenNameToAceTokenType);
  }
  return this.$tokenizer;
};
The Antlr4Tokenizer constructor takes a lexer class generated by ANTLR4 and a mapping of ANTLR4 token names to ACE token types. The mapping describes which ANTLR4 token name refers to which ACE token type (see common ACE tokens).
{
  "'+'": 'keyword.operator',
  "'-'": 'keyword.operator',
  "'return'": 'keyword.control',
  "ID": 'identifier',
  "INT": 'constant.numeric'
}
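To make the role of this mapping more concrete, here is a minimal sketch of a hand-written tokenizer with the shape ACE expects: a `getLineTokens(line, startState)` method that returns `{ tokens, state }`, where each token has a `type` and a `value`. `ToyTokenizer` and its regular expression are hypothetical stand-ins, not part of this package. The sketch does not use ANTLR4 at all, and the `'text'` fallback for unmapped tokens is an assumption.

```javascript
// Hypothetical stand-in for Antlr4Tokenizer. It only illustrates the
// interface ACE expects from a tokenizer, not how this package works inside.
function ToyTokenizer(tokenTypeMap) {
  this.tokenTypeMap = tokenTypeMap;
}

// ACE calls getLineTokens(line, startState) once per line and expects
// { tokens: [{ type, value }, ...], state } in return.
ToyTokenizer.prototype.getLineTokens = function (line, startState) {
  // Alternatives: whitespace, 'return', identifier, integer, +/-, anything else.
  var pattern = /(\s+)|(return\b)|([A-Za-z_]\w*)|(\d+)|([+-])|(.)/g;
  var tokens = [];
  var match;
  while ((match = pattern.exec(line)) !== null) {
    var name = null; // ANTLR4-style token name, or null if unmapped
    if (match[2] !== undefined) name = "'return'";
    else if (match[3] !== undefined) name = 'ID';
    else if (match[4] !== undefined) name = 'INT';
    else if (match[5] !== undefined) name = "'" + match[5] + "'";
    tokens.push({
      // Assumption: tokens without a mapping fall back to plain 'text'.
      type: (name && this.tokenTypeMap[name]) || 'text',
      value: match[0]
    });
  }
  return { tokens: tokens, state: startState || 'start' };
};

var toyTokenizer = new ToyTokenizer({
  "'+'": 'keyword.operator',
  "'-'": 'keyword.operator',
  "'return'": 'keyword.control',
  "ID": 'identifier',
  "INT": 'constant.numeric'
});
var toyResult = toyTokenizer.getLineTokens('return x + 1', 'start');
```

Here `toyResult.tokens` holds one entry per lexeme, including whitespace, so the concatenated `value` fields reproduce the original line.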
You can use the helper function createTokenTypeMap to create a token type map for your Antlr4Tokenizer:
var antlrTokenNameToAceTokenType = createTokenTypeMap({
  literals: {
    'keyword.operator': ['+', '-'],
    'keyword.control': 'return'
  },
  symbols: {
    'identifier': 'ID',
    'constant.numeric': 'INT'
  }
});
This way, you do not have to quote literal token names, and you can map several token names to the same ACE token type by passing them as an array.
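The relationship between the grouped form and the flat mapping can be sketched as follows. `sketchCreateTokenTypeMap` is a hypothetical re-implementation written for illustration only. It assumes that literals are stored under their single-quoted ANTLR4 name and that symbols are stored unchanged, as the two examples above suggest. The real helper may behave differently in edge cases.

```javascript
// Hypothetical re-implementation for illustration only; the real
// createTokenTypeMap shipped in token-type-map.js may differ in details.
function sketchCreateTokenTypeMap(spec) {
  var map = {};

  // Accept either a single token name or an array of names.
  function toArray(value) {
    return Array.isArray(value) ? value : [value];
  }

  // Literal tokens are stored under their quoted ANTLR4 name, e.g. "'+'".
  Object.keys(spec.literals || {}).forEach(function (aceType) {
    toArray(spec.literals[aceType]).forEach(function (literal) {
      map["'" + literal + "'"] = aceType;
    });
  });

  // Symbolic tokens keep their name unchanged, e.g. "ID".
  Object.keys(spec.symbols || {}).forEach(function (aceType) {
    toArray(spec.symbols[aceType]).forEach(function (symbol) {
      map[symbol] = aceType;
    });
  });

  return map;
}

var sketchedMap = sketchCreateTokenTypeMap({
  literals: {
    'keyword.operator': ['+', '-'],
    'keyword.control': 'return'
  },
  symbols: {
    'identifier': 'ID',
    'constant.numeric': 'INT'
  }
});
```

Under these assumptions, `sketchedMap` has the same five entries as the flat mapping written by hand earlier.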
See the browser example of the Cymbol language (Demo).
To demonstrate how to parse a programming language with syntax derived from C, we’re going to build a grammar for a language I conjured up called Cymbol. Cymbol is a simple non-object-oriented programming language that looks like C without structs.
- Install dependencies:
npm install
- Build project:
npm run build
- Run tests:
npm test