-
Notifications
You must be signed in to change notification settings - Fork 0
/
TODO
190 lines (161 loc) · 9.41 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
=========Core functionality==========
----------STAGE 1
x Flesh out lexer - actually spit out tokens that aren't just strings
x Refactor tokenizing special chars (implement/use a map? just an assoc list)
x Flesh out ast structure. Is a program just a list of function definitions?
- Did this a bit. Need to clarify how I want to handle typing, variable declarations
- Figure out how to parse function definitions, particularly the parameters and how functions should
be stored.
x Implement expect function where I can expect a token and error out if it's bad
- Add a comma token which I feel like is needed for parameter parsing
x Ability to introspect on an AST. Print outs?
- This sort of works
- make code not a giant mess
x Read program from string into file
- Remove unnecessary pointers from typedefs
x Generate Assembly for first pass
x add list_concat(list_t *l1, list_t *l2) function that will append l2 to the end of l1
- UNIT TESTING!!! for containers (list, string) and tokenizing/parsing code.
- I'd like to be able to spit out useful errors as well bc that would be easier to debug
- I am very paranoid about seg faulting and I'm sure there are a ton of bugs here
x Refactor, especially very hacked together asm generation code.
- also python script which calls gcc. find path to compiler in better way
---------STAGE 2
wow 2ez done
---------STAGE 3
x Figure out a good AST structure including binary operations.
x Figure out parsing strategy for binary operations - I don't like the whole factor/term thing, even
though that probably works. Sounds like it would be confusing when I try to add assignment or other
binops. Maybe I can define a precedence level somehow that would inform the parsing strategy.
x Bake the precedence into the parsing strategy!
---------STAGE 4
x I wonder if I can generalize all the parsing code...
x Refactor asm generation code. Simplify construction of instructions.
- add the other bin ops.
- Can I simplify my bin op generation code? its in one giant function and it works but...
---------STAGE 5
x implement a map, just container on top of a list
- Think about assignment parsing more, there has to be a way to integrate it with current parsing
sttrategy and make it a binary op.
- I have to figure out the precedence level of an assignment. I'd like to think that it's at the
top, but I'm not sure. It's also right associative.
- How do I parse the LHS? How do I know if I'm actually in an assignment stmt?
- Or maybe just make it a separate expression and cave in
- Will probably want to introduce the IR from essentials of compilation here since I now have
variables that can conflict with each other.
- Split code generation pass: one that generates IR (x86 with variables), one that takes that IR
and allocates space for each variable and returns a map, and one (or two?) more that takes the map
and the IR instruction list and outputs valid x86 complete with patched instructions. I'm like
starting to worry about how much memory compiling a large file will take.
- Like in EoC, the register allocation step can happen in the pass that takes the IR and
produces a map. For now, it will only output offsets from EBP, but later it can also do
registers and do the whole interference graph stuff.
- i really need to write unit tests lol, like for map and list and when I probably have a graph data
structure
- Need to handle environments like in lisp interpreter because variable shadowing is allowed (or
uniquification pass?)
e.g.
int main(void) {
int x = 1;
if (x) {
int x = 2;
return x;
}
return x;
}
Returns 2.
I feel like the interference graph can be created using scopes. Things need to be popped off the
stack after they leave scope (ie, their home on the register must be freed once they leave scope).
This can be combined
Is this why 9cc's IR is x86 with infinite registers? Because you can do interference graph analysis
and allocate registers, then map registers -> registers, memory locations because you know that if
they are allocated to the same register by this point they don't interfere.
Maybe i'll do it in 3 passes? ast -> x86 with vars -> x86 with infinite regs -> actual x86.
I'll use x86 with vars to construct the interference graph (which means I'll need to uniquify)
Or can I get away without uniquification? I feel like going for that feels like more optimization.
Maybe I should aim to stay true to variables being in scope within the braces they're declared.
I like the idea of parsing variables with nested maps, but how do I leverage them for code
generation? If each scoped block in the AST has its own map, then at code generation I should be
able to assume that all the variables in that scope are uniquely named and that any variable that I
find in that scope refers to the variable with the same name in the closest outer environment.
I feel like whichever way I go, I don't need to uniquify and instead I can use the nested
environment structure for liveness analysis.
OR uniquification can occur by prepending the identifier in a declaration with a unique count (which
is an invalid variable name), and looking it up in the map can start at the end instead of the
beginning of the identifier.
The liveness analysis from EoC was pretty fine grained - the exact lifetime of variables were known.
The inefficient part was spilling, so my goal is to have the same kind of register allocation with
better spilling (ie, don't globally allocate homes for the variables - free them up when the
variable dies). That would be a pretty cool optimization to have.
-Restructuring/introducing an IR
what a pain man
Have an intermediate pass in ir.c that spits out x86 with (unique? scoped?) variables. Then a second
pass in x86.c that spits out x86 list of instructions. THEN a third pass that prints out the
instructions.
BinOp instructions need to handle args differently. Or maybe not? I can just reserve eax and ecx for
general purpose use. So the statement 2 + 2; becomes:
mov 2, rax
push rax
mov 2 rax
pop rcx
add rcx, rax
and x = 2 + 2; becomes:
mov 2, rax
push rax
mov 2 rax
pop rcx
add rcx, rax
mov rax, (var x)
Ok I think that's fine. I can basically use the Essentials of Compilation IR/graph coloring strategy
too, but instead I'll need to reserve rax and rcx both and prevent them from being used as homes.
So rax and rcx can only be used for unary/binary expressions. Other registers can be used for
assignment locations. So the expr code should be fine? What changes is assignment statements, which
is totally new.
How does essentials of compilation handle sub expressions? It didn't! At least not by the time we
were generating assembly. It removed complex subexpressions before that.
Will need to figure out variable scoping asap.
Put map/string/list in utils folder. Maybe implement an environment too?
Haha memory usage is maybe hopefully okay. Worst case I can split up files.
I'm starting on outputting the first pass. It's pretty messy but I can handle declare statements
now. I'm also pretty sure I'm doing envs wrong. Each function should have a locals list and from
that I should be able to determine how to allocate homes. Maybe refactor that part so that instead
of an env, each fn_def gets a list of var_t's.
if binops all push on the stack then there's no reason to even do register allocation...
I'll put off register allocation, but do the stack allocation stuff per function/scope.
So it'll be parse -> home alloc using envs from parse -> asm generation (which should be clever-ish
about how to properly load/store data from the stack so I don't need another pass)
Ok, need to refactor asm.c since that's a major pain point. I think parsing/tokenizing is totally
fine.
Here's what I'm going to do. No IR step - instead just allocate homes right after parsing, and use
that to generate proper x86. Have an instr_m2m/instr_r2m/instr_m2r function that basically does what
patch instructions did in EoC. Hardest part rn is figuring out how to allocate homes properly based
on scope. I think when I allocate homes, it'll probably do something intrusive to the AST and like
mutate the env in each function block.
Uh oh fucked up list. says it has a length of 5 but does it really?
IMPLEMENTING PRE/POSTINC:
how am I going to tell if its a postinc operator?
is it unary? it's also an assign. Can I expand it out to be an assignment expression?
REWRITE ENV
Things to do:
- Rewrite env
- Define a context that I'll pass around
- Make functions have blocks instead of lists of statements
==========Nice to have==========
- Less questionable organization of code
- Decent error messages
- Properly reclaim used memory
==========Notes==========
What's an expression in C? I'm thinking it's one of:
- A literal
- A function call
- An identifier/variable
- A unary expression (e.g., x++, --x, *x, x->y, x.y, etc.), which is composed of an expression
and a unary op
- (3+2)++ is NOT a valid C expression - compiler says that for pre/postfix inc/dec the
expression must be an lval
- however, *(int*)(3+2) is valid, so maybe pointer deref and pre/postfix inc/dec are
fundamentally different?
- A binary operation (+, -, /, *, <<, etc), which is composed of an expression, an operator, and
another expression
How the fuck are pointers implemented? I guess at some point I'll want to do an IR, probably similar
to the one in the Racket book I did at RC which was just x86 assembly but with variable names/types.