-
Notifications
You must be signed in to change notification settings - Fork 24
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Add post on weakrefs * . * . * . * . * . * . * . * Include copy_to_newspace function * . * . * Nearly finish post * . * . * . * .
- Loading branch information
1 parent
60120b9
commit eb76381
Showing
4 changed files
with
238 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,238 @@ | ||
--- | ||
title: Weak references and garbage collectors | ||
layout: post | ||
--- | ||
|
||
From 2018 to 2021, I worked on a greenfield Python runtime called Skybison. One | ||
of its major differences from CPython was that it used a moving garbage | ||
collector (GC). This I understood in theory---I knew that it ran when the heap | ||
filled up, knew we needed handles to update pointers in the runtime's code, had | ||
read the [Moon paper](/assets/img/moon-gc.pdf) (PDF)---but the other day, I | ||
wanted to implement weak references and couldn't immediately figure it out. | ||
Skybison thankfully has a reasonably clear implementation. So now I'm writing | ||
this post, mostly for myself, but maybe it will be useful to you as well. | ||
|
||
In this post I'll give a brief overview of a garbage collector, a sample | ||
"normal" object, and then show the special handling for weak references. I've | ||
taken inspiration from the Skybison code, but it's possible other projects have | ||
different approaches. | ||
|
||
## Garbage collectors | ||
|
||
While this post talks mostly about moving garbage collectors, I think the | ||
weakref handling applies pretty cleanly to mark-sweep and other types of | ||
stop-the-world GC as well. I don't know about reference counting or concurrent | ||
GC, though. | ||
|
||
> As an aside, if you have not been in the internals of a moving garbage | ||
> collector before, I wholeheartedly recommend [Andy Wingo's post][wingo-gc]. | ||
> He's distilled the core ideas so well. Ever read a post that's so well | ||
> written and concise that it takes your breath away? It's what brought the | ||
> Moon paper from dream land into the real world for me. | ||
[wingo-gc]: https://wingolog.org/archives/2022/12/10/a-simple-semi-space-collector | ||
|
||
The important things to know about a garbage collector for this post are: | ||
|
||
* It collects objects when they are dead | ||
* Dead means that they have no pointers pointing to them | ||
* "No pointers" means no non-weakref pointers | ||
* A weakref is itself a type of object | ||
* The weakref object might still be alive | ||
* When a weakref's referent (thing it points to) dies, the `ref.referent` field | ||
should be cleared | ||
|
||
Here is the core of Scrapscript's garbage collector. It's a very slightly | ||
modified version of Andy's semispace GC. There are two phases, marked by | ||
comments in the code: 1) scan the roots 2) incrementally copy over the object | ||
graph (indirectly) pointed to by the roots. Everything not indirectly pointed | ||
to is dead. | ||
|
||
```c | ||
void collect(struct gc_heap* heap) { | ||
flip(heap); | ||
// Scan the roots and copy them into newspace | ||
uintptr_t scan = heap->hp; | ||
for_each_root(heap, copy_to_newspace); | ||
// Now go copy the rest of the graph | ||
while (scan < heap->hp) { | ||
struct gc_obj* obj = (struct gc_obj*)scan; | ||
for_each_field(obj, heap, copy_to_newspace); | ||
scan += heap_object_size(obj); | ||
} | ||
} | ||
|
||
void for_each_field(struct gc_obj* obj, struct gc_heap* heap, VisitFn visit) { | ||
switch (obj_tag(obj)) { | ||
case TAG_LIST: | ||
visit(&((struct list*)obj)->first, heap); | ||
visit(&((struct list*)obj)->rest, heap); | ||
break; | ||
// ... | ||
default: | ||
fprintf(stderr, "unknown tag: %u\n", obj_tag(obj)); | ||
abort(); | ||
} | ||
} | ||
|
||
void copy_to_newspace(struct gc_obj** pointer, struct gc_heap* heap) { | ||
struct gc_obj* from = *pointer; | ||
*pointer = is_forwarded(from) ? forwarded(from) : copy(heap, from); | ||
} | ||
``` | ||
> As an aside, using newspace as a queue like this (and, implict here, using | ||
> forwarding pointers) is called Cheney copying after a technique described by | ||
> CJ Cheney in his [1970 paper](/assets/img/cheney.pdf) (PDF). It's not | ||
> important for this blog post, but it's worth knowing about. | ||
Now, here's the problem. Say we were to add an empty case for weakrefs in | ||
`for_each_field`. | ||
```c | ||
// ... | ||
case TAG_WEAKREF: | ||
// Don't visit the referent. | ||
break; | ||
// ... | ||
``` | ||
|
||
That's great and all---the weakref won't keep its referent alive---but there | ||
are two problems: | ||
|
||
1. If the referent dies, we have a dangling pointer into oldspace. We should | ||
instead clear the referent and have it point to some NULL-like sentinel | ||
object. | ||
1. If the referent ends up staying alive, we have a dangling pointer to a | ||
forwarded object and need to fix it up. | ||
|
||
Let's add one more step to fix both problems. | ||
|
||
## Weak references | ||
|
||
At a high level, we want to find all the still-alive weakref objects and fix up | ||
their referent pointers. If the referent is still alive, we want to update it | ||
to the updated (forwarded) pointer in newspace. If the referent is dead, we | ||
want to set the referent field to `NULL` or something. This has to happen after | ||
the main collection, since that heap root and heap traversal determines what is | ||
still alive and what is dead. | ||
|
||
We could do another full heap traversal to find all the weakrefs, but that | ||
might be slow: the heap could be arbitrarily large. In that case, one weakref | ||
at the end might incur a second full heap scan. Not great. Instead, we make | ||
weakrefs pay-as-you-go: each weakref contains a `link` field so that we can put | ||
it in a linked list in the first heap scan. Then, we'll traverse only the | ||
linked list of weakrefs to update their referents. | ||
|
||
```c | ||
struct weakref { | ||
struct gc_obj HEAD; | ||
struct gc_obj* referent; | ||
struct weakref* link; | ||
}; | ||
|
||
struct weakref* delayed_references = NULL; | ||
|
||
void enqueue_weakref(struct weakref* ref) { | ||
ref->link = delayed_references; | ||
delayed_references = ref; | ||
} | ||
|
||
struct weakref* dequeue_weakref() { | ||
struct weakref* result = delayed_references; | ||
delayed_references = result->link; | ||
result->link = NULL; | ||
return result; | ||
} | ||
|
||
bool is_weakref(struct gc_obj*); | ||
|
||
void collect(struct gc_heap* heap) { | ||
flip(heap); | ||
// Scan the roots and copy them into newspace | ||
uintptr_t scan = heap->hp; | ||
for_each_root(heap, copy_to_newspace); | ||
// Now go copy the rest of the graph | ||
while (scan < heap->hp) { | ||
struct gc_obj* obj = (struct gc_obj*)scan; | ||
if (is_weakref(obj)) { | ||
// Enqueue to linked list (new!) | ||
enqueue_weakref((struct weakref*)obj); | ||
} else { | ||
for_each_field(obj, heap, copy_to_newspace); | ||
} | ||
scan += heap_object_size(obj); | ||
} | ||
// Update or clear all referents (new!) | ||
while (delayed_references != NULL) { | ||
struct weakref* ref = dequeue_weakref(); | ||
struct gc_obj* referent = ref->referent; | ||
ref->referent = is_forwarded(referent) ? forwarded(referent) : NULL; | ||
} | ||
} | ||
``` | ||
The `link` field is otherwise completely unused during normal program | ||
operations. Its sole purpose is to be GC metadata. | ||
Let's see what this looks like with two examples: 1) a weakref whose referent | ||
dies 2) a weakref whose referent lives across a collection. | ||
We use this thing called "handles" (or shadow stack, or ...) to mark pointers | ||
as needed by C code that the garbage collector does not know about. To learn | ||
more about that, check out [my post on the Scrapscript baseline | ||
compiler](/blog/scrapscript-baseline/). Search for "handles". | ||
```c | ||
int main() { | ||
HANDLES(); | ||
struct gc_heap *heap = make_heap(1024); | ||
// left4dead_num has no handle; it will die at the call to collect(). | ||
struct gc_obj* left4dead_num = mknum(heap, 3); | ||
// keptalive_num is kept alive and the pointer is updated because we have | ||
// used a handle. | ||
GC_HANDLE(struct gc_obj*, keptalive_num, mknum(heap, 4)); | ||
// Both weakref *objects* will be kept alive beacuse of the handles. | ||
GC_HANDLE(struct gc_obj*, ref0, mkweakref(heap, left4dead_num)); | ||
GC_HANDLE(struct gc_obj*, ref1, mkweakref(heap, keptalive_num)); | ||
fprintf(stderr, "ref0 %p with referent %p\n", | ||
ref0, ((struct weakref*)ref0)->referent); | ||
fprintf(stderr, "ref1 %p with referent %p\n", | ||
ref1, ((struct weakref*)ref1)->referent); | ||
fprintf(stderr, "COLLECTING\n"); | ||
collect(heap); | ||
fprintf(stderr, "ref0 %p with referent %p\n", | ||
ref0, ((struct weakref*)ref0)->referent); | ||
fprintf(stderr, "ref1 %p with referent %p (keptalive is %p)\n", | ||
ref1, ((struct weakref*)ref1)->referent, keptalive_num); | ||
return 0; | ||
} | ||
``` | ||
|
||
Lo, ref0's referent is cleared while ref1's referent gets updated to the new | ||
pointer: | ||
|
||
```console?prompt=$ | ||
$ ./main | ||
ref0 0x771baf155020 with referent 0x771baf155000 | ||
ref1 0x771baf155038 with referent 0x771baf155010 | ||
COLLECTING | ||
ref0 0x771baf155810 with referent (nil) | ||
ref1 0x771baf155828 with referent 0x771baf155800 (keptalive is 0x771baf155800) | ||
$ | ||
``` | ||
|
||
Nice. | ||
|
||
## Wrapping up | ||
|
||
See the [full code snippet](https://github.com/tekknolagi/wingogc) (including | ||
the entire GC). | ||
|
||
## Other approaches | ||
|
||
The T3 compiler used oldspace for data structures. See [Clark's 1976 | ||
paper](/assets/img/clark.pdf) (PDF). Thanks, Taylor, for linking to this in a | ||
comment on [Andy Wingo's post][wingo-iterate]. | ||
|
||
[wingo-iterate]: https://wingolog.org/archives/2022/12/11/we-iterate-so-that-you-can-recurse |
Binary file not shown.
Binary file not shown.
Binary file not shown.