-
Notifications
You must be signed in to change notification settings - Fork 0
/
DIARY
299 lines (244 loc) · 24.7 KB
/
DIARY
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
01/07/2020
TASK: Add support for structs.
* ...
01/08/2020
Today, I continued working on adding support for structs. This involves
(1) declaring the STRUCT token in the parser and adding a rule to the lexer to tokenize `struct' into STRUCT,
(2) adding productions to the `decl_specs' non-terminal in the parser's grammar file so that it recognizes `struct' type specifiers,
(3) implementing some kind of scoping mechanism for struct declaration,
(4) adding type checking functions to ensure structs aren't redefined and that they are defined when used, and finally
(5) generate code for reserving storage for structs and accessing struct members.
In the process of doing this, I started to enounter issues with
TASK: Continue adding support for structs.
TASK: Improve handling of declaration specifiers.
C declaration specifiers come in a few different types: type specifiers (`void', `char', `int', etc.), storage class specifiers (`extern', `static', etc.), and type qualifiers (`const', `volatile', etc.). Currently, zc only supports most type specifiers but no storage class specifiers or type qualifiers.
@@@@@@@@@@@@@@@
[1 hr] Set up basic testing infrastructure.
I copied the "runner.sh" script from Professor Linderman's midd-cool-osx GitHub repository and made a few changes to make it work better with zc's semant-main and cgen-main compiler drivers. I also created a new "tst/semant" directory and placed some simple tests in there. More complex ones are to come.
[1 hr] Wrote struct tests and fixed struct type-checking bugs.
More bugs that I haven't fixed: structs definitions without accompanying variable declarations (e.g. `struct coords { int x; int y; };' ) never makes it past the parser. I'll fix this bug tomorrow.
01/09/2020
[1 hr] I thought about how to approach the bug mentioned in the previous entry, i.e. when the struct definition
struct coords {
int x;
int y;
};
is made without any accompanying variable declaraiton, the parser simply drops the struct definition because there are no variables to declare. One solution that I chose not to pursue would be to
add ``bound types'', each of which would inherit from a respective ``unbound type'' and implement a ``declarable'' interface. Then, the result of a declaration would be a list of types, some of which
would be declarable. While this wouldn't help the above problem much, it feels like a more satisfactory approach than just checking if a type's ``sym'' field is null to determine whether it's an
abstract or bounded type.
Instead, I implemented more of a quick fix: if a type specifier accompanies no declarators, the declaration list simply contains one element, the type specifier. This way the parser always retains
the struct definition.
I also wrote some basic struct declaration tests for the semantic analyzer. It passes those.
[1 hr]
I implemented code generation for structs and struct member access and wrote a couple tests; it appears to be working. I also fixed some semantic analysis bugs along the way.
I expect there a few bugs related to declaration (including struct declaration) semantics currently:
[ ] If a declaration is simply an unbound type (e.g. in the above ``struct coords'' example), the compiler currently still allocates storage for it. This is due to a quirk in how bound declarations
and unbound types are (or, rather, aren't) distinguished. There's got to be a better way to distinguish bound and unbound types, but I can't haven't found one yet.
And anticipated struct codegen bugs:
[ ] compiler aborts if one struct is (validly) assigned to another of the same type. In the future, this kind of assignment should result in a call to memcpy().
Next, I'll work on ... the sizeof operator!
[2 hrs]
I added the ``sizeof'' operator. ``sizeof'' is a compile-time operator that accepts two kinds of arguments -- types and expressions -- and returns the number of bytes the argument requires
for storage. I added the syntactic rules for ``sizeof'' in the parser grammer, added the ``SIZEOF'' token to the lexer, and created a new expression class, ``SizeofExpr''.
``SizeofExpr'' has a variant member that holds an ASTType or another ASTExpr. During semantic analysis, both the ASTType and ASTExpr are type-checked (in the pedantic case, a struct may be defined
within the type!), but during codegen, the ``sizeof'' expression collapses to an immediate value (no code is generated for the possible subexpression).
Next, I'll work on ... adding the ``->'' operator.
[2 hrs]
Adding the arrow operator was relatively straightforward: all it required was adding the applicable lexer/parser rules. It didn't require any extra work on the semantic analyzer or code generator, since I had already implemented '.' semantics, and
coords->x
is semantically equivalent to
(*coords).x
Accordingly, the new rule in the parser simply takes the left-hand expression, wraps it in a dereference unary operation, and then creates a ``MembExpr'' object.
Next, I wrote a couple tests for the arrow operator, which is when I started to encounter strange behavior. When I ran a simple test on semant-main, it correctly did not report any errors; however, when I ran the same test on cgen-main, it did report errors (but only when the test was on stdin, rather than passed as a command-line argument). I figured out I was referencing the wrong variable in the StructType::EnscopeStruct function, and I fixed the issue. (This didn't explain the difference in behavior, however.)
Next, I'll work on adding arrays.
01/10/2020
[4 hrs] I added partial support for arrays. What _does_ work, confirmed by tests:
(i) declaring arrays with an explicit, contant-expression size, like ```char buf[100];```
or ```char buf[100+20-30/4%2+3^2&1];```
(ii) indexing arrays as lvalues and rvalues, like ```
int a;
a = buf[2];```
or ```buf[a] = a;```
What should work, but hasn't been tested:
(i) implicitly casting an array to pointer, like ```
char *ptr;
ptr = buf;```
(ii) indexing pointers, like ```
ptr[2] = 'c';```
What doesn't work:
(i) Variable-length arrays (this is an optional C feature, anyway)
(ii) Initializing an array (or any variable, for that matter)
(iii) Declaring an array without specifying the size, as in `int main(int argc, char *argv[])`.
[3 hrs] I added unions. This wasn't too difficult since their grammar is identical to structs. I created a new abstract type 'TaggedType', from which both StructType and UnionType inherit. I moved most of the old StructType code over to TaggedType; the only significant different between StructType and UnionType is how they assign offsets to members: the offset of the _n_th member in a struct is the offset of the (_n_-1)th member plus the size of that member in bytes (or 0 if _n_ = 0); the offset of the _n_th member in a union is just 0.
[2 hrs] Implemented GOTO.
01/11/2020
I spent today working on adding enums. Enums are a 3rd (and last) kind of ``tagged type'' (along with structs and unions). Tagged types are qualified with a 'struct', 'union', or 'enum' and followed by an optional identifier. (Tagged types without tags are called anonymous.) Because of the similarities between how structs, unions, and enums behave, I created a new base class for these three types, 'TaggedType'. The TaggedType class has a 'tag()' property and it specifies an interface for tagged types:
- TagKind tag_kind(): what kind of tag this is (TAG_STRUCT, TAG_UNION, or TAG_ENUM)
- bool is_complete(): whether this tagged type carries a complete definition
- void complete(const TaggedType *other): complete this type's definition given a complete instance of this type
An auxiliary template class 'template <typename Memb> TaggedType_aux' inherits from TaggedType and implements TaggedType's interface. This auxiliary class defines a member 'membs', an pointer to an unordered map from Symbol * (which are the member identifiers) to Memb *, the template parameter. complete()'ing the definition of a type simply involves copying the member map pointer from the given complete type.
Further, the new 'CompoundType' class inherits from 'TaggedType_aux<ASTType>', representing types that hold multiple other types, namely structs and unions. This defines its own small interface, specifically 'offset()', which finds the offset of a given member ID within the compound type.
The enum type EnumType inherits from TaggedType_aux<Enumerator>, where 'Enumerator' is a new AST node representing a single enumerator declaration in an enum.
01/12/2020
[1 hr] I fixed a bug relating to the scoped redeclaration of scoped tagged types. The compiler was overly permissive when a tagged type, e.g. struct, was redefined and then an expression of that new tagged type could be assigned to an lvalue of the previous definition of the tagged type. I fixed this by adding a unique identifier to each class of declarations of the same tagged type. That is, in the following code snippet
[1] struct coords;
[2] struct coords { int x,y; }
int main(void) {
[3] struct coords;
[4] struct coords { int x,y; }
}
structs on [1], [2], [3] share the same unique ID, while [4] has a different unique ID. This resolved the bug, and I confirmed it with tests.
01/13/2020
[6 hrs] ASTType -> {Var,Type}Decl
01/14/2020
[1 hr] I added pre/post inc/dec expressions. Instead of creating a new expression class, I added four new 'kinds' to the UnaryExpr class, UOP_{INC,DEC}_{PRE,POST}. I wrote some tests to confirm
that these operators are working. Interestingly, there is no penalty for pre- vs. post-increment/decrement on chars (single-byte values); however, there _is_ a penalty for post-inc'ing/dec'ing
ints.
[6 hrs] Added typedefs. TODO
[1 hr] Added for loop.
01/15/2020
[1 hr] Wrote tests for loop; fixed binary operation bug (invalid 'case' fallthrough in switch statement, leading to incorrect type being assigned). Added comma operator (the one that separates
expressions).
01/16/2020
[1 hr] Wrote script that searches through the TI 84+ CE C runtime (CRT) for given symbols (e.g. sprintf, imulu, etc.) and then prints out the disassembled contents. This script was tricky because the CRT functions use a trampoline (for stability across releases) -- that is, the static address `__sprintf' points to a single jump instruction that jumps to the real implemenation of _sprintf.
[1 hr] I wrote a new C runtime function 'cdivu' that divides two unsigned chars. (The CRT already on the calculator did not have this, and fast division is non-trivial). I unit-tested this, and it works.
[2 hr] Wrote program 'memcpy.c' to demonstrate current state of compiler. It contains three functions:
```
int main(int argc, char **argv); /* main */
void *mymemset(void *dst, char c, size_t bytes); /* re-implementation of C memset function */
void *mymemcpy(void *dst, void *src, size_t bytes); /* re-implementation of C memcpy function */
```
At first, the program wouldn't compile; later, it wouldn't produce correct output and/or crash. I went back and fixed those bugs. This program behave correctly now.
memcpy.c compiles to approximately 170 lines of code (!!!). This should be a very easy to target for optimization.
[2 hr] Added new pass to compiler: AST optimization. This pass currently looks for constant expressions/statements and replaces them with literals or simplifies statements. For example, consider the
following for loop:
for (i = 0; 1; ++i)
if (i % 7 == 1 && i % 13 == 2)
break;
The loop condition `1` (i.e. true) is constant, so this for loop is transformed into a basic loop like so:
i = 0;
loop:
if (i % 7 == 1 && i % 13 == 2)
break;
++i;
goto loop;
01/17/2020
Didn't work much on compiler -- was reading papers as a post-interview 'homework assignment' for University of Michigan (had to send a very specific research question after reading 20 or so papers).
01/18/2020
Read more papers and brainstormed for my research question to send to U Mich professor (Daniel Genkin).
[2 hr] Tried to write code for peephole optimization; ended up revising it later. It is challenging to find a good way to represent instructions patterns vs. instruction instances. For example,
one peephole optimization that I would like to implement is to replace
```
lea rr1,ix+*
pop rr2
ld (rr1),rr2
```
with
```
pop rr2
ld (ix+*),rr2
```
(The asterisk '*' represents a particular 8-bit signed integer; 'rr1' and 'rr2' represent two distinct multibyte registers.)
Two instances of this instruction pattern might be
```
lea hl,ix+3
pop de
ld (hl),de
```
and
```
pop de
ld (ix+3),de
```
To make instruction pattern matching more challenging, instruction operands are not represented as a flat hierarchy; more complex operands contain suboperands.
The basic kinds of operands (or `Value`, as the class is named in `asm/asm-val.hpp`) are immediate values (`ImmediateValue`), register values (`RegisterValue`), and label values (`LabelValue`).
The complex operands are indexed register values (`IndexedRegisterValue`), which contain a register value and 8-bit index; and memory values (`MemoryValue`), which have an address that is in turn
a kind of value. In the above example, `(ix+3)` is a memory value whose address is an indexed register value, whose base value is the register value `ix`.
Pattern matching complex values becomes non-trivial: the most obvious approach would be to `dynamic_cast` each operand to the derived class that the peephole optimization requires it to be, and if
the downcast fails, move on to the next window. Otherwise, downcast any children contained within the operand (if it is a complex operand), until all required operands are verified.
Any free registers (e.g. the first occurrence of `rr1` in the above example) are bound to whatever register is found in the instruction instance, and further references to `rr1` must be replaced
with the matched register.
The logic for this approach becomes exceedingly complex, however: my original implementation for the aforementioned peephole optimization took at least 100 lines (but it worked).
01/19/2020
[2 hrs] More work on peephole optimization. Made a couple attempts that I later discarded.
01/20/2020
[5 hrs] Finally came up with an infrastructure for peephole optimization that I found satisfactory (and slightly beautiful). First, I created a new templated container type called a `zc::portal` (I know; it's a stupid name -- I just didn't want to spend any more time on what to name it...). It behaves much like `std::optional`, however instead of containing a value `T val` or nothing, a `zc::portal` contains a value `T val` or a pointer to a value `T *ptr`. I then wrapped all existing member fields of the `Value` class and all of its derived classes with the `zc::portal` container. This effectively allowed for the possibility of creating _instruction patterns_. Regular instructions would still be constructed with direct values (e.g. an `ImmediateValue`'s `zc::portal<intmax_t> imm_` field would contain an `intmax_t` integer. Instruction patterns, in contrast, would be constructed with value pointers, which tell any matching function _where_ to store the corresponding matched value.
For example, to encode the instruction operand `ix+3`, one would write `IndexedRegisterValue(rv_ix, 3)`. In contrast, to encode the instruction pattern `ix+*`, one would write `IndexedRegisterValue(rv_ix, &index)`.
Next, I wrote an instruction matching function `Instruction::Match`, which matches instruction patterns to concrete instructions. Two instructions match iff they are of the same type (e.g. both are a `ld` instruction) and have matching operands. Two operands match iff (1) they are of the same type (e.g. both are `IndexedRegisterValue`s) and (2) their members match. Their members match iff (i) the pattern's member is a pointer, in which case the instance's member value is assigned through that pointer, or (ii) the pattern's member is a direct value and is equal to the instance's member.
This made the instruction pattern matching required for peephole optimizations an order of magnitude easier, since now I don't have to traverse the operand hierarchy to extract a single register value.
01/21/2020
I spent most of the day preparing for a grad school interview with U Michigan, so I didn't get much work done.
[2 hrs] I worked on the high-level design register allocation algorithm. For the pseudocode, see `RALLOC.txt`.
This algorithm uses two ways to spill variables: on the frame (called *frame-spilling*) or on the stack (called *stack-spilling*).
Frame-spilling involves reserving extra space in the function's activation record, in the same area where it stores locals declared in the C program.
Frame-spilled variables are accessed the same way as arguments and locals, indirectly through the frame pointer (which is an index register).
Stack-spilling involves pushing the value onto the stack when it's generated and popping it of when it is used.
Stack spilling has better space- and time-performance over frame spilling. With respect to space, loading/storing a frame-spilled value requires 3-4 bytes, while loading/storing a stack-spilled value requires 1 byte.
Stack spilling has more restrictions with frame-spilling, however. It is hard to stack-spill single byte values (except for the accumulator, `a`), while values of any size can be easily frame-spilled. Furthermore, stack-spills must be fully nested: a value can only be stack-spilled if all other stack-spilled variables during its lifetime are generated _before_ and used _after_ the one in question.
01/22/2020
Since the register allocation is mostly complete, I moved on to building up necessary infrastructure to required for implementing the algorithm.
[1 hr] I wrote a new register allocator class. It currently allocates registers for a single basic block at a time. In the future, register allocation could be performed more globally, which would likely produce better results. However, for now, I'll stick with basic block register allocation because it's simpler.
[1 hr] I wrote a few other new structures: `struct RallocInterval`, which stores a variable lifetime in terms of instruction offsets within the working basic block; `class RallocVarInfo`, which stores information about a variable that needs to be allocated a register; `class RegFreeIntervals`, which keeps track of which registers are free at which instructions.
[1 hr] I implemented the `RegisterAllocator::ComputeRegFree()` function, which computes the intervals on which particular registers are free.
[1 hr] I implemented the `RegisterAllocator::ComputeVarLifetime()` function, which computes the lifetime interval of all variables present in the basic block.
[1 hr] Next, I started working on modifying the existing stack-machine code generator to produce unbound variable values as necessary instead of pushing intermediate results onto the stack.
01/23/2020
Today I worked on more register allocation infrastructure.
[1 hr] I added a basic algorithm to compute register free intervals, on a block-by-block basis. How it works: for each instruction, if the source is a register, then mark that register as ``free''. If the destination is a register, then mark the register as ``used''. This is a very flawed algorithm, which is why I replaced it in a couple days: (1) it wrongly assumes that a value in a register will be used exactly once, and (2) it doesn't account for accumulator-style instructions that use both registers (e.g. `add hl,de`).
[1 hr] I wrote a basic algorithm to compute variable lifetime intervals, on a block-by-block basic. How it works: for each instruction, if the source is a variable, mark as a `use`. If the destination is a variable, mark as a `gen` (generated). The variable's lifetime is then from the `gen` to the last `use`. This algorithm is less flawed than the one above, but still flawed.
[4 hrs] The next step was to modify the code generator to produce code that uses intermediate variables. Currently, it was a stack-machine -- it pushed intermediate values onto the stack and popped them off whenever they needed to be used. While simple, this is mutually exclusive with register allocation (stack machines don't use intermediate values).
Transitioning the code generator to the new variable scheme involved modifying many functions within the 1600-line `cgen.cpp`, so it took quite a while. I added one paramater to `ASTExpr`s' code generation method: a `VariableValue` that specifies what variable the result should be assigned to. This essentially glues expressions together. One example of the changes I made to the code generator is in the assignment expression: first, the left-hand side is generated into address variable 1; then, the right-hand side is generated into rvalue variable 2. Address variable 1 is then loaded into the multibyte accumulator register `hl`, which can indirectly store variable 2 using the instruction `ld (hl),<variable 2>`.
01/24/2020
More register allocation stuff. I had an interview with UCSB in the morning, so I only worked in the afternoon.
I rewrote existing code for computing intervals for register free times and variable lifetimes. This was partially built upon the previous algorithms I wrote yesterday, but is much more rigorous.
[1 hr] First, I added a new method to the polymorphic `Value` class: `Gen()` and `Use()`. These methods precisely specify which variables or registers are `gen`'ed or `use`d in the value. For example,
there are two values in the expression `ld (hl),v1` -- a memory value whose address is a register value, `(hl)`, and a variable value, `v1`.
`(hl)::Gen()` returns nothing, since no new values are being generated into a variable or register (only into memory). `(hl)::Use()` returns `hl`, since `hl` is used as the address of the store.
`v1::Gen()` is also empty, but `v1::Use()` is `v1` itself.
[3 hrs] The new algorithm for copmuting intervals is based on the `Value::Gen()` and `Value::Use()` methods.
- Register free intervals: These are computed backwards. A register is _free_ (i) from its last `use` to the end of the block, (ii) from its last `use` before a `gen` within a block, and (iii) from its first `gen` to the beginning of the block.
- Variable lifetime intervals: These are computed forwards. A variable's lifetime spans from the first `gen` to the last `use`. There is other informtation that is also captured in a `VariableRallocInfo` struct that will be used later on.
01/25/2020
[2 hrs] Continued working on register allocaiton subroutines. Currently checks tries to assign all variables to registers; doesn't do much beyond that.
Confirmed success with `simple.c` -- optimally assigns registers.
Definitely doesn't work with `memcpy.c`, though.
[3 hrs] Register-only allocation mostly working. Haven't tested stack-spilling yet.
[1 hr] Added Resolve() for instructions. <EXPLAIN>.
[2 hrs] Debugging; debugging the debugger.
01/26/2020
[? hrs] Added frame-spilling.
[? hrs] Fix stack-spilling bug when values are used multiple times.
[1 hrs] Transition to use `FrameValue`s.
[2 hrs] Reconfigured FrameValue until it worked.
01/27/2020
[1 hr] Fixed peephole optimization bug (would skip duplicate matches, e.g. `ld a,a \ ld a,a` and only remove first.
[2 hrs] Fixed bugs with `chars.c` program -- values were being used across blocks, which caused ralloc assertions to fail.
Fixed bug with `ReturnStat`: it didn't cast the return result if it wasn't of equal type. Added explicit casts to fix this, and now `chars.c` runs correctly.
[1 hr] Add ByteValue type.
[? hrs] CASTING.
01/28/2020
[1 hr] Fixed casting bug. Finished adding ByteValue type.
01/29/2020
[2 hr] Debugged segfault in CastExpr with `chars.c` test program.
[1 hr] Added optimization flags.
Ideas: add `bool` type? bool would be stored as flag.
[ ] IfStat -- cast cond to bool
[ ] WhileStat -- cast cond to bool
[ ] ForStat
OPTIMIZATION IDEAS
[ ] Defragmenting during register allocation
[ ] Pushing casts down AST exprs, when possible (e.g. addition/subtraction/multiplication,
but not division)
PEEPHOLE OPTIMIZATIONS
Done
[X] push hl/de \ pop de/hl -> ex de,hl
[X] push rr \ pop rr -> (delete)
[X] lea rr1,ix+* \ pop rr2 \ ld (rr1), rr2 | ld rr2,(rr1) -> pop rr2 \ ld (ix+*),rr2 | ld rr2,(ix+*)
Todo
[ ] lea rr,ix+* \ push rr -> pea ix+*
[ ] ld rr,LL \ push rr \ pop iy \ call _indcall -> call LL
FUNCTION-LEVEL RALLOC
- Serialize blocks into single instruction stream.
- Perform ralloc on that.
- All variable splitting should occur BEFORE serializing block diagram to instruction stream.