Skip to content

regular language tools - automata-based tokenizer, LL(1) parser

Notifications You must be signed in to change notification settings

zambonin/rltools

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

87 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

UFSC/CTC/INE
INE5421 - Formal Languages and Compilers (2015/2)
Gustavo Zambonin (13104307) & Matheus Ben-Hur de Melo Leite (13100765)

Practice I - Algorithms for Manipulation of Regular Languages

The language chosen to code these algorithms was Python, for it is easier to
manipulate object types dynamically. Almost all the information related to the
program's workings may be found through its manpage, with the command

    man ./rltools

where it is located (inside this folder). Invalid arguments will be rejected by
the program when possible.

A semi-automated test file was placed on the associated folder for these
operations, making it easier to claim the consistency of the I/O operations.
Powered by Bash scripting, it executes determinizations and conversions from
and to finite automata, regular grammars and regular expressions consecutively
using the same outputs it created earlier in a cyclic manner, asserting the
files' contents regularity.

    sh autotester.sh <non-deterministic finite automaton input file>

Finally, some design rules for an input automaton file are described below.
Grammar and expression files' structure is alike. Examples for those can be
generated using the above script.

* epsilon-moves must be added only to the required states;
* a state must consist of all the transitions through letters of the alphabet,
even if those are empty.

The regular expression parser accepts only unary letters alphabets. Hence, an
expression of the form "(q0|q1*)" will be parsed with the set {'q', '0', '1'}
as its symbols, and won't match a possible desired {'q0', 'q1'} set.
Parenthesis should be used whenever possible when describing regular
expressions. The code itself should be executed using Python 3.x.

                                                                    02/10/2015

Original assignment: e65b8a1b03c7be85279dcd6236d3064d892b6ffd

Practice II - Lexical Analysis

The lexical structure [1], specifically built for this assignment, is presented
below, with some remarks about its inner workings. It can be used as a guide to
write possible "programs" with this proto-language, albeit syntactically and
semantically incorrect. Remember that the input file must end with a new line,
and separators are essential to the scanning of the tokens (1+1 != 1 + 1).

    keyword      ::= 'else' | 'if' | 'while' | 'read' | 'write' | 'list' |
                     'bool' | 'str' | 'int' | boolean
    boolean      ::= 'False' | 'True'
    identifier   ::= letter (letter | digit | '_')* [2]
    letter       ::= lowercase | uppercase
    lowercase    ::= 'a' | 'b' | ... | 'z'
    uppercase    ::= 'A' | 'B' | ... | 'Z'
    digit        ::= '0' | '1' | ... | '9'
    nonzerodigit ::= '1' | ... | '9'
    char         ::= any ASCII character between 33 and 126, with the
                     exception of 34, 40, 41, 42, 92 and 124 [3]
    string       ::= '"'char*'"'
    integer      ::= nonzerodigit digit* | '0'
    arit_op      ::= '+' | '-' | '×' | '/'
    logic_op     ::= 'and' | 'or' | 'not'
    comp_op      ::= '<' | '>' | '==' | '>=' | '<=' | '!='
    attr_op      ::= '=' | '->' | ':='

[1] Loosely based on:
https://inst.eecs.berkeley.edu/~cs164/fa11/python-grammar.html

[2] It must not match a keyword.

[3] More specifically, the ASCII indexes noted are in base 10. The list
starts at ! and ends with ~, excluding ", (, ), *, \ and |. Note that a string
must be enclosed within double quotes. More information about this character
set can be found on its manpage using the command

    man ascii

on a Unix-based OS.

                                                                    08/11/2015

Original assignment: 155d45589027987b6e3b840f9799918eda49f1f5

Practice III - Parsing

An LL(1) parser was created using a grammar constructed manually through
first/follow sets. The documents describing that process are stored inside
the `docs/` folder. It derives its productions according to the resulting
parsing table, programmed as a Python executable.

                                                                    30/11/2015

Original assignment: 22d84ed0d365fff847148db1256eb4d1b5afbd38

Releases

No releases published

Packages

No packages published