Skip to content

Commit

Permalink
moar edits
Browse files Browse the repository at this point in the history
  • Loading branch information
bpcreech committed May 27, 2024
1 parent ee8c109 commit 176fa4c
Showing 1 changed file with 62 additions and 48 deletions.
110 changes: 62 additions & 48 deletions content/post/mini-racer-v0.12.2.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,13 +18,14 @@ The dynamic language embedding
[turducken](https://en.wikipedia.org/wiki/Turducken) is complete: you can now
call Python from JavaScript from Python!

By writing logic in Python and exposing it to V8's JavaScript sandbox, you can
now in theory, while writing _only Python and JavaScript_, create your own JS
extension library similar to
[NodeJS's standard library](https://nodejs.org/docs/latest/api/).Obviously,
As of [PyMiniRacer](https://github.com/bpcreech/PyMiniRacer) `v0.12.2`, you can
write logic in Python and expose it to V8's JavaScript sandbox. You can now in
theory, while writing _only Python and JavaScript_, create your own JS extension
library similar to
[NodeJS's standard library](https://nodejs.org/docs/latest/api/). Obviously,
anything created this way this will almost certainly be less extensive, less
standardized, and less efficient than the NodeJS standard library; but it will
be _more tailored to your use case_.
standardized, and less efficient than the NodeJS standard library; but it _will_
be _more tailored to your needs_.

This post follows up on prior content related to
[reviving PyMiniRacer](https://github.com/bpcreech/PyMiniRacer) and then
Expand All @@ -42,9 +43,11 @@ by default, both simplicity and
[security](https://bpcreech.com/PyMiniRacer/architecture/#security-goals).

Thus, until now, you couldn't, say, go and grab a random URL off the Internet,
or even log to stdout (e.g., using `console.log`). Neither is bundled with V8.
or even log to stdout (e.g., using `console.log`). No APIs for either operation
are bundled with V8.

Well, now you can extend PyMiniRacer to add that functionality yourself!
Well, now you can use PyMiniRacer to extend the JavaScript API, thereby adding
that functionality yourself!

```python
import aiohttp
Expand Down Expand Up @@ -120,17 +123,17 @@ V8 runs JavaScript in a single-threaded fashion, and doing anything synchronous
in a callback to Python would block the entire V8 isolate. Worse, things would
likely deadlock very quickly—if your Python extension function tries to call
back into V8 it will freeze. The only thing we can reasonably do, when
JavaScript calls outside the V8 sandbox, is create a `Promise` and do the work
to fulfill that `Promise` out of band.
JavaScript calls outside the V8 sandbox, is create and return a `Promise`, and
then do the work actually needed to _fulfill_ that `Promise` out of band.

## Internal changes to PyMiniRacer

### Implementing `wrap_py_function`

I put some general design ideas about `wrap_py_function`
[here](https://github.com/bpcreech/PyMiniRacer/issues/39). What landed in
PyMiniRacer is basically
["Alternate implementation idea 3"](https://github.com/bpcreech/PyMiniRacer/issues/39#issuecomment-2043984826).
PyMiniRacer is basically "Alternate implementation idea 3"
[on the GitHub issue outlining this feature](https://github.com/bpcreech/PyMiniRacer/issues/39#issuecomment-2043984826).
Generally speaking:

#### Generalizing PyMiniRacer callbacks and sharing allocated objects with V8
Expand All @@ -140,7 +143,7 @@ await `Promise` objects. We just had to generalize it!

... Which specifically means allowing V8 to _reuse_ a callback. With `Promises`,
callbacks are only used 0.5 times on average (either the `resolve` or `reject`
is used exactly once). PyMiniRacer handled cleanup of these on its own; the
is used, exactly once). PyMiniRacer handled cleanup of these on its own; the
first time either `resolve` or `reject` was called, PyMiniRacer could destroy
both callbacks. This technique was borrowed from
[V8's d8 tool](https://github.com/v8/v8/blob/0f719663da59ee690f3b72c520f8f9ea2328071a/src/d8/d8.cc#L1493).
Expand All @@ -152,29 +155,30 @@ and hand that off to V8.

Unfortunately,
[V8 is pretty apathetic about telling us when it's _done with_ an external C++ object we give it](https://stackoverflow.com/questions/24107280/v8-weakcallback-never-gets-called).
I couldn't make it work at all, and all commentary I can find on the matter says
trying to get V8 `MakeWeak` and finalizers work reliably is a fool's errand.
This creates a memory management conundrum: we need to create a small bit of
per-callback C++ state and hand it to V8, but V8 won't reliably tell us when it
has finally dropped all references to that state.
I couldn't make the V8 finalizer callback work at all, and all commentary I can
find on the matter says trying to get V8 `MakeWeak` and finalizers work reliably
is a fool's errand. This creates a memory management conundrum: we need to
create a small bit of per-callback C++ state and hand it to V8, but V8 won't
reliably tell us when it has finally dropped all references to that state.

So I moved to a model of creating _one such callback object per MiniRacer
context_. We can tear _that_ object down when tearing down the whole MiniRacer
context, rather than relying on V8 to tell us when it's done with the data we
gave it. In order to multiplex many callbacks into that object, we can just give
JavaScript an ID number (because V8 _does_ manage to reliably destruct
numbers!). This new logic lives in
context_. This object outlives the `v8::Isolate`, so we're no longer relying on
V8 to tell us when it's done with the pointer we gave it. In order to multiplex
many callbacks through that one object, we can just give JavaScript an ID number
(because V8 _does_ manage to reliably destruct numbers!). This new logic lives
in
[`MiniRacer::JSCallbackMaker`](https://github.com/bpcreech/PyMiniRacer/blob/release/v0.12.2/src/v8_py_frontend/js_callback_maker.h).

Meanwhile the Python side of things can manage its own map of ID number to
callback, and remove entries from that map when it wants to. (This is what the
`wrap_py_function` context manager does on `__exit__`). Since Python is tearing
down the callback and its ID on its own schedule, in total ignorance of
JavaScript's references to that callback ID, it's possible for JS to keep trying
to call the callback after Python already tore it down. This is working as
designed; such calls can be easily spotted because the `callback_caller_id` or
`callback_id` don't reference an active `CallbackCaller` or callback (see
diagram below), and they can thus be easily ignored.
JavaScript's references to that same callback ID, it's possible for JS to keep
trying to call the callback after Python already tore it down. This is working
as designed; such calls can be easily spotted when the either
`callback_caller_id` or `callback_id` doesn't reference an active
`CallbackCaller` or callback, respectively (see diagram below). Calls to
invalidated ID numbers can be easily and safely ignored.

I think this could be turned into a generalized strategy for safely sharing
allocated C++ memory with V8:
Expand All @@ -184,14 +188,17 @@ allocated C++ memory with V8:
2. If you want to be able to delete any objects _before_ the `v8::Isolate`
exits, don't share _raw_ pointers and references. Instead:
1. On the C++ side, create an ID-number-to-pointer map, and give V8 only ID
numbers.
2. Create an API, whether in C++ or JavaScript (or Python which calls C++ in
PyMiniRacer's case) which authoratively tears down the object and the
numbers, as the `data` argument of `v8::Function::New`.
2. Your `v8::Function` callback implementation can read the ID number and
safely convert it to a C++ pointer by looking it up in the map.
3. Design an API, whether in C++ or JavaScript (or Python which calls C++ in
PyMiniRacer's case) which authoratively tears down the C++ object and the
ID-number-to-pointer map entry. (Don't rely on V8 to tell you when all
references to the object have dropped; it won't do this reliably.)
3. Be prepared for JavaScript code to try and use the ID after you've torn
down the object. The C++ code can detect this (because the ID is not in
the map), and safely reject such attempts.
4. Because we aren't tracking dangling IDs on the JavaScript side (we
can't!), be prepared for JavaScript code to try and use the ID after
you've torn down the object. The C++ code can easily detect this (because
the ID is not in the map), and safely reject such attempts.

#### System diagram

Expand Down Expand Up @@ -340,15 +347,19 @@ manage object lifecycles. As of PyMiniRacer `v0.11.1` we were using a DAG of
`std::shared_ptr` references to manage lifecycle of a dozen different classes.

I discovered [a bug](https://github.com/bpcreech/PyMiniRacer/issues/62) that
this logic left behind. The problem with the laissez-faire
`std::shared_ptr`-all-the-things memory management pattern is that we leak
memory when we have reference cycles, and unfortunately we do have reference
cycles in PyMiniRacer. In particular, we create and throw function closures onto
the `v8::Isolate` message loop which contain `shared_ptr` references to
objects... which eventually contain `shared_ptr` references to the `v8::Isolate`
itself: a reference cycle! (We also had some bare references _not_ managed by a
`shared_ptr`, which would result in a use-after-free in hard-to-reach situations
during teardown of PyMiniRacer.)
this logic left behind. If JavaScript code launched never-ending background
work, (e.g., `setTimeout(() => { while (1) {} }, 100)`), the C++
`MiniRacer::Context` and its `v8::Isolate` wouldn't actually shut down when told
to by Python side.

The problem with the laissez-faire `std::shared_ptr`-all-the-things memory
management pattern is that we leak memory when we have reference cycles.
Unfortunately we _do_ have reference cycles in PyMiniRacer. In particular, all
over PyMiniRacer's C++ codebase, we create and throw function closures onto the
`v8::Isolate` message queue which contain `shared_ptr` references to objects...
which transitively contain `shared_ptr` references to the `v8::Isolate`
itself... which contains references to everything in the message queue: a
reference cycle!

A simple way to resolve that in _another_ system might be to explicitly clear
out the `v8::Isolate`'s message queue on shutdown. Basically, by wiping out all
Expand All @@ -358,18 +369,18 @@ message queue even for teardown: we put tasks on the message queue which delete
C++ objects, because it's the easiest way to ensure they're deleted under the
`v8::Isolate` lock (discussion of that
[here](https://groups.google.com/g/v8-users/c/glG3-3pufCo)). So PyMiniRacer
can't simply clear out the message, or it will leak C++ objects on exit.
can't simply clear out the message queue, or it will leak C++ objects on exit.

After playing with this some and failing to get the lifecycles just right, and
struggling to even understand the effective teardown order of dozens of
different reference-counted objects, I realized we could switch to a different,
easier-to-think-about pattern: every object putting tasks onto the `v8::Isolate`
simply needs to ensure those tasks complete before it _anything it puts into
those tasks_ is torn down.
simply needs to ensure those tasks complete before _anything it puts into those
tasks_ is torn down.

I'll restate that rule: _if you call `MiniRacer::IsolateManager::Run(xyz)`, you
had better ensure that task is done **before** you destruct any objects you
bound into the function closure `xyz`._
are reponsible for ensuring that task is done **before** any objects you bound
into the function closure `xyz` are destroyed._

This rule seems obvious in restrospect! But it's hard to implement. I:

Expand All @@ -392,3 +403,6 @@ This rule seems obvious in restrospect! But it's hard to implement. I:
[`IsolateObjectCollector`](https://github.com/bpcreech/PyMiniRacer/blob/99591c962b219251d309e0f5899b3cf2a676ab9e/src/v8_py_frontend/isolate_object_collector.h#L13)
to ensure all the garbage is gone before we move on to tearing down the
`IsolateManager` and its message pump loop.

This is a little more code than we had before, but things are much simpler to
reason about now!

0 comments on commit 176fa4c

Please sign in to comment.